Use the new “indexifembedded” robots meta tag to control indexing of embedded content
A user asked how to block embedded videos from being indexed separately. John recommended using the new “indexifembedded” robots tag (in conjunction with a standard noindex robots tag) to control which versions of the embedded videos get indexed.
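In practice, the standalone page that serves the embeddable video would combine the two directives. A minimal sketch (the player URL is hypothetical):

```html
<!-- On the standalone player page (e.g. /embed/video123, a hypothetical URL):
     "noindex" keeps this URL out of search results on its own, while
     "indexifembedded" allows the content to be indexed when it is embedded
     via iframe on another page -->
<meta name="robots" content="noindex, indexifembedded" />
```

The same combination can also be sent as an `X-Robots-Tag: noindex, indexifembedded` HTTP response header, which is useful for media files that have no HTML.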
If URLs that are blocked by robots.txt are getting indexed by Google, it may point to insufficient content on the site’s accessible pages
Why might an eCommerce site’s faceted or filtered URLs that are blocked by robots.txt (and have a canonical in place) still get indexed by Google? Would adding a noindex tag help? John replied that a noindex tag would not help in this situation: because of the robots.txt block, Google cannot crawl those pages and so would never see the tag.
He pointed out that URLs might get indexed without content in this situation (as Google cannot crawl them with the block in robots.txt), but they would be unlikely to show up for users in the SERPs, so should not cause issues. He went on to mention that, if you do see these blocked URLs being returned for practical queries, then it can be a sign that the rest of your website is hard for Google to understand. It could mean that the visible content on your website is not sufficient for Google to understand that the normal (and accessible) pages are relevant for those queries. So he would first recommend looking into whether or not searchers are actually finding those URLs that are blocked by robots.txt. If not, then it should be fine. Otherwise, you may need to look at other parts of the website to understand why Google might be struggling to understand it.
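As a sketch of the situation (the faceted URL pattern is hypothetical), the robots.txt block itself is what prevents Google from ever seeing an on-page noindex:

```text
# robots.txt — blocks crawling of filtered URLs (hypothetical pattern)
User-agent: *
Disallow: /products/filter

# A <meta name="robots" content="noindex"> on /products/filter?color=red
# is never fetched, because crawling is disallowed — so the URL can still
# be indexed (without content) if other pages link to it.
```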
Noindexing pages with geo IP redirection is not ideal
One user asked about the use of geo IP redirection in conjunction with noindex tags. The example was having separate pages targeted at users in multiple locations, but using noindex tags to ensure just one is indexed.
John raised the point that Google typically crawls from one location (mostly using a Californian IP address). If the geo IP redirect sends Googlebot to one of the URLs you have set to noindex, it could result in none of those pages being indexed at all. This approach, therefore, isn’t recommended. Instead, you should focus on making location-specific content easy to find once the user has landed on the site.
Showing less content to search engines than to users isn’t necessarily a cloaking issue
John was asked about a website that had a lot of noindexed pages that had HTTP errors. They asked whether it’s considered ‘cloaking’ to show an empty HTML page to bots to get those URLs de-indexed, while still showing users the page.
John mentioned that the part of ‘cloaking’ that is an issue is when search engines get more or vastly different content than users. Google wants to avoid promising users something they can’t find when they go to a page from a query. However, showing an empty page with a noindex tag will cause Google to drop those URLs, and Google will not care whether users see something different, because the page will no longer appear in search results.
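A minimal sketch of such a page served to crawlers, assuming the empty-page approach described above:

```html
<!DOCTYPE html>
<html>
<head>
  <!-- The noindex directive is what triggers removal from the index -->
  <meta name="robots" content="noindex">
</head>
<body>
  <!-- Empty body served to bots; users continue to see the full page -->
</body>
</html>
```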
Having a high ratio of ‘noindex’ vs indexable URLs could affect website crawlability
Having noindex URLs normally does not affect how Google crawls the rest of your website—unless you have a large number of noindexed pages that need to be crawled in order to reach a small number of indexable pages.
John gave the example of a website with millions of pages, 90% of them noindexed. As Google needs to crawl a page first in order to see the noindex, it could get bogged down crawling millions of pages just to find the comparatively few indexable ones. If you have a normal ratio of indexable to noindexed URLs and the indexable ones can be discovered quickly, he doesn’t see this as a crawlability issue. This is not due to quality reasons, but is a technical issue caused by the high number of URLs that need to be crawled to see what is there.
Speed up re-crawling of previously noindexed pages by temporarily linking to them on important pages
Temporarily internally linking to previously noindexed URLs on important pages (such as the homepage) can speed up recrawling of those URLs if crawling has slowed down due to the earlier presence of a noindex tag. The example given was of previously noindexed product pages and John’s suggestion was to link to them for a couple of weeks via a special product section on the homepage. Google will see the internal linking changes and then go and crawl those linked-to URLs. It helps to show they are important pages relative to the website. However, he also stated that if significant changes are made to internal linking, it can cause other parts of your site which are barely indexed to drop out of the index—this is why he suggests using these links as a temporary measure to get them recrawled at the regular rate, before changing it back.
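For example, a temporary homepage section could surface the previously noindexed product pages until they are recrawled (markup and URLs are hypothetical):

```html
<!-- Temporary section on the homepage, removed once the pages have been
     recrawled. URLs are hypothetical. -->
<section>
  <h2>Featured products</h2>
  <a href="/products/widget-a">Widget A</a>
  <a href="/products/widget-b">Widget B</a>
</section>
```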
If a page is noindexed for a long period of time, crawling will slow down
Having a page set to noindex for a long time will cause Google’s crawling of it to slow down. Once the page is indexable again, crawling will pick back up, but that initial recrawling can take time. He also mentioned that Search Console reports can make the situation look worse than it actually is, and that you can use things like sitemaps and internal linking to speed up recrawling.
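One way to nudge recrawling is to list the now-indexable URLs in an XML sitemap with a fresh lastmod date. A sketch (URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- Hypothetical URL; lastmod reflects when the noindex was removed -->
    <loc>https://www.example.com/products/widget-a</loc>
    <lastmod>2022-02-01</lastmod>
  </url>
</urlset>
```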
To better control page indexing, use ‘noindex’ on pages rather than ‘nofollow’ tags on internal links
Adding rel="nofollow" to internal links is not recommended as a way to control indexing. Instead, John suggests adding noindex tags to the pages that you don’t want indexed, or removing the internal links to them altogether.
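To illustrate the difference (hypothetical URL): the nofollow sits on the link, while the noindex belongs on the page itself.

```html
<!-- Not recommended for controlling indexing: nofollow on an internal link -->
<a href="/filters?color=red" rel="nofollow">Red widgets</a>

<!-- Recommended instead: a noindex in the <head> of the page to keep out -->
<meta name="robots" content="noindex">
```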
Noindexed pages generally do not count towards content quality algorithms
Google focuses on the quality of the content they have indexed. If it’s not shown in search, it’s generally not taken into account.