
Why Won’t Google Index My Site?

We get the question “why won’t Google index my site?” from prospects all the time. This can be a hard question to answer, because there are many reasons why a website won’t get content indexed by search engine crawlers.

Based on our experiences with clients, we’ve come up with six causes that are pretty common. These aren’t the only reasons why a site may not show up in search results, but in dealing with the ones listed below, the solutions are well known.

These are the six indexing issues we see most often:

  • Blocked resources
  • Content blocked by robots.txt/noindex
  • Multiple redirect chains
  • HTTP instead of HTTPS
  • Hacked site/malware injection
  • Content behind the firewall

Let’s review each one separately, and the solutions that’ll get the search engine robots back to crawling and indexing your site content and resources.

Blocked Resources

This means certain website resources are listed in the robots.txt file with instructions that they are not to be crawled. Resources you should not block include JavaScript, CSS, and images, because Googlebot needs them to see your pages the same way a human visitor does. A good rule of thumb: if a blocked resource impacts the user experience, unblock it!

Your robots.txt file is a great place to start searching for these blocked resources. If you can edit the file, update the command line to fix the issue.
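To make this concrete, here is a generic before-and-after sketch of a robots.txt file (the directory paths are hypothetical; substitute the ones flagged for your site):

```
# Before: these rules block resources Googlebot needs to render pages
User-agent: *
Disallow: /assets/js/
Disallow: /assets/css/
Disallow: /images/

# After: delete those Disallow lines (or explicitly allow the paths)
User-agent: *
Allow: /assets/js/
Allow: /assets/css/
Allow: /images/
```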

Google Search Console also lists blocked resources. Go to Crawl > Fetch as Google and enter a URL. After the analysis, you’ll get a list of blocked resources. The report looks like this:

You can see there’s blocked JavaScript, an image, and AJAX. They are all listed in the robots.txt file, and the issue can be resolved by editing that file.

If you’re not comfortable editing the robots.txt file, have your web developer do this task for you.

By the way, did you know you can see your robots.txt file in a web browser? You don’t need to be in your website’s edit mode to see it. Just go to your site’s home page and add robots.txt after the trailing “/” in the URL. Here’s one:

There are a lot of blocked items for this site that runs on WordPress.

Content Blocked By Robots/No Index

This goes along with the previous issue. It’s a good idea to review your robots.txt file and discuss it with your web development team to determine what content needs to be blocked. If you do find blocked content, ask: should it be blocked, or was this done accidentally? Since your content is what converts prospects into customers, it probably shouldn’t be hidden from the search engine crawlers if you want it to rank high in search results.
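Besides robots.txt, pages can also be kept out of the index with a noindex directive. As a generic reference (not taken from any particular site), the page-level tag looks like this:

```
<!-- Placed in the page's <head>: allows crawling but tells
     search engines not to index this page -->
<meta name="robots" content="noindex">
```

For non-HTML files such as PDFs, the same directive can be sent as an `X-Robots-Tag: noindex` HTTP header. If you find either of these on pages you want ranking, remove them.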

Multiple Redirect Chains

If your site has accumulated multiple 301 redirects, long redirect chains can cause the crawlers to stop before they reach the most recent version of a page. It depends on whom you ask, but the general consensus is that if a chain contains three or more redirects, the intermediate hops should be removed so the crawlers can reach the latest version of your URLs in fewer steps.

Your web developer can remove these 301 redirect commands. You can find long redirect chains by using a website crawler tool like Screaming Frog. It will tell you how many redirect chains exist, and it’ll show you every URL in the redirect path!
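The fix itself is conceptually simple: every old URL should redirect straight to its final destination instead of to the next link in the chain. As a minimal sketch (the URLs and the redirect map are hypothetical), here is how chained redirects can be flattened in Python:

```python
def flatten_redirects(redirects):
    """Collapse redirect chains so each source URL maps directly
    to its final destination (one hop instead of many)."""
    flattened = {}
    for src in redirects:
        seen = set()
        target = src
        # Follow the chain until we reach a URL with no further redirect.
        while target in redirects:
            if target in seen:  # guard against redirect loops
                raise ValueError(f"Redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flattened[src] = target
    return flattened

# Hypothetical chain: /old-page -> /old-page-2 -> /new-page
chain = {
    "http://example.com/old-page": "http://example.com/old-page-2",
    "http://example.com/old-page-2": "https://example.com/new-page",
}
print(flatten_redirects(chain))
```

After flattening, both old URLs point directly at the final HTTPS page, which is exactly the one-hop setup you want your developer to implement in the server’s redirect rules.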

Three More Reasons Why Google Won’t Index A Site

HTTP Instead Of HTTPS

This is becoming very common, since almost all sites are now being secured to protect visitors’ browsing. It’s important to double-check that once your site has gone secure (HTTPS), the proper 301 redirect is in place to send the spiders to the right version of your site. If both versions (HTTP and HTTPS) are being indexed, this creates a duplicate content issue for your domain. While there’s no duplicate content penalty, Google tends to ignore or push down duplicate URLs in its search index.

Again, a tool like Screaming Frog or Moz will show you whether old HTTP pages are being indexed in lieu of, or in addition to, the HTTPS versions. Your web developer can then do the necessary work to make sure the HTTPS version of your domain is the only one being crawled and indexed.
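How that redirect is implemented depends on your server. On Apache, for example, a site-wide HTTP-to-HTTPS 301 is commonly set up in the .htaccess file with rules like these (a generic sketch; your developer should verify it against your actual server configuration):

```
# .htaccess – permanently redirect all HTTP traffic to HTTPS
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [L,R=301]
```

The `R=301` flag is what tells crawlers the move is permanent, so they update their index to the HTTPS URLs.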

Hacked Site/Malware Injection

Unfortunately, this problem is also becoming common, as hackers randomly select sites to practice their skills on. This is where having Google Search Console set up will help you and your development team quickly find and diagnose problems. If Google determines your site is too dangerous to show to users, it will flag it in search results with a warning such as “This site may harm your computer” or “This site may be hacked,” and it will also send you a security-issue message in Google Search Console.

Your web developer or a cybersecurity firm will need to clean your site up. Once all the dangerous files have been removed, you’ll need to go back to your Search Console dashboard and tell Google what actions were taken to remove the malware/viruses/spyware from your site. If Google sees that all necessary cleanup has been done, it will send its crawlers back to your site soon afterwards to get your content back in the index.

Content Behind The Firewall

This one is common: you may have premium, valuable content that requires someone to fill out a form for access, or to have login credentials for a secure part of your site. If so, that content can’t be seen by the search engine spiders, so make sure that blocking crawler access to this content is really what you want to do.

We hope this little primer on common search engine crawler blocking issues is helpful. Got a problem with content visibility in search results? Let Kraus Marketing show you how to get better search engine rankings!