Google has announced that, effective September 1, 2019, it will retire all code that handles unsupported and unpublished robots.txt rules, such as the noindex directive. The announcement came via social media and the Google Webmaster Blog.
“In the interest of maintaining a healthy ecosystem and preparing for potential future open source releases, we’re retiring all code that handles unsupported and unpublished rules (such as noindex) on September 1, 2019. For those of you who relied on the noindex indexing directive in the robots.txt file, which controls crawling, there are a number of alternative options,” the company said.
Read the official Google tweet here: https://twitter.com/googlewmc/status/1145950977067016192
Why is Google doing this?
The noindex robots.txt directive was never an official part of the protocol, which is why Google is dropping it. Google has unofficially honored it in the past, but that will no longer be the case.
Didn’t Google obey the noindex directive in the past?
StoneTemple, which is now a part of Perficient Digital, published an article back in 2015 noting that Google didn’t obey the robots.txt noindex directive 100% of the time.
The takeaway from that research was:
“Ultimately, the NoIndex directive in Robots.txt is pretty effective. It worked in 11 out of 12 cases we tested. It might work for your site, and because of how it’s implemented it gives you a path to prevent crawling of a page AND also have it removed from the index. That’s pretty useful in concept. However, our tests didn’t show 100 percent success, so it does not always work.”
Unfortunately for many SEOs, even that partial support is now gone: Google has made it very clear that it will not honor the noindex robots.txt directive at all.
Why is Google changing now?
It is well known in the SEO community that Google has wanted to make this change for at least several years. With the tech giant now pushing to standardize the Robots Exclusion Protocol, it can finally move this part of its agenda forward.
Google said it had “analyzed the usage of robots.txt rules” to determine this course of action, focusing on unsupported uses of the internet draft such as crawl-delay, nofollow, and noindex. “Since these rules were never documented by Google, naturally, their usage in relation to Googlebot is very low,” Google said. “These mistakes hurt websites’ presence in Google’s search results in ways we don’t think webmasters intended.”
What should I use instead of the noindex directive?
Google listed five alternatives to the noindex directive on its blog, which you should have been using anyway:
- Noindex in robots meta tags: Supported both in the HTTP response headers and in HTML, the noindex directive is the most effective way to remove URLs from the index when crawling is allowed.
- 404 and 410 HTTP status codes: Both status codes mean that the page does not exist, which will drop such URLs from Google’s index once they’re crawled and processed.
- Password protection: Unless markup is used to indicate subscription or paywalled content, hiding a page behind a login will generally remove it from Google’s index.
- Disallow in robots.txt: Search engines can only index pages that they know about, so blocking the page from being crawled often means its content won’t be indexed. While the search engine may also index a URL based on links from other pages, without seeing the content itself, we aim to make such pages less visible in the future.
- Search Console Remove URL tool: The tool is a quick and easy method to remove a URL temporarily from Google’s search results.
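For the first option on the list, the supported forms of noindex are a robots meta tag in the page’s HTML or, for non-HTML resources such as PDFs, an X-Robots-Tag HTTP response header. A minimal illustration of the meta tag form:

```html
<!-- Place in the <head> of any page you want dropped from Google's index.
     The page must remain crawlable for Googlebot to see this tag. -->
<meta name="robots" content="noindex">
```

The header equivalent is `X-Robots-Tag: noindex`, sent in the HTTP response; how you configure that depends on your web server.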
Is Google developing a new standard?
Thankfully, Google also announced that it is working on making the Robots Exclusion Protocol an official standard, and this is likely the first of several changes. In fact, Google released its robots.txt parser as an open source project alongside yesterday’s announcement.
Why should people care?
The most important thing is to comply with Google’s guidelines on this matter: make sure you are not using the noindex directive in your robots.txt file. If you are, make one of the suggested changes above before September 1, 2019. Also check whether you are using the nofollow or crawl-delay rules in robots.txt, and if so, migrate to a supported mechanism for those directives going forward.
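To audit a site before the deadline, one option is to scan the robots.txt body for the rules Google stopped honoring. A minimal sketch in Python (the rule names come from Google’s announcement; the sample robots.txt content is hypothetical):

```python
# Scan a robots.txt body for rules Google no longer supports as of
# September 1, 2019: noindex, nofollow, and crawl-delay.
UNSUPPORTED_RULES = ("noindex", "nofollow", "crawl-delay")

def find_unsupported_rules(robots_txt: str) -> list:
    """Return (line_number, line) pairs whose directive is unsupported."""
    hits = []
    for number, line in enumerate(robots_txt.splitlines(), start=1):
        # A robots.txt rule has the form "Directive: value"; names are
        # case-insensitive, so normalize before comparing.
        directive = line.split(":", 1)[0].strip().lower()
        if directive in UNSUPPORTED_RULES:
            hits.append((number, line.strip()))
    return hits

# Hypothetical robots.txt that still relies on retired rules.
robots = """User-agent: *
Disallow: /private/
Noindex: /drafts/
Crawl-delay: 10
"""

for number, line in find_unsupported_rules(robots):
    print(f"line {number}: {line}")  # flags the Noindex and Crawl-delay lines
```

Any line this flags should be removed and replaced with one of the five supported alternatives Google listed.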