APIs & Crawl Budget: Don’t block API requests if they load important content
An attendee asked whether a website should disallow subdomains that are sending API requests, as they seemed to be taking up a lot of crawl budget. They also asked how API endpoints are discovered or used by Google.
You could help avoid crawl budget issues here by making sure the API results are cached well and don’t contain timestamps in the URL. If you don’t care about the content being returned to Google, you could block the API subdomains from being crawled, but you should test this out first to make sure it doesn’t stop critical content from being rendered.
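As a rough illustration of the timestamp point, the sketch below strips cache-busting query parameters from an API URL so that repeated requests resolve to one stable, cacheable address. The parameter names in VOLATILE_PARAMS and the api.example.com URL are assumptions made for the example; a real site would need to substitute whatever its own front end appends.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of cache-busting parameters; adjust to match your own API calls.
VOLATILE_PARAMS = {"_", "t", "ts", "timestamp"}


def stable_api_url(url: str) -> str:
    """Return the URL with volatile (timestamp-like) query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in VOLATILE_PARAMS
    ]
    # Rebuild the URL from the remaining parameters, preserving their order.
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical API request that changes on every page load because of "ts".
print(stable_api_url("https://api.example.com/v1/products?id=42&ts=1700000000"))
```

If every page load requests the same stable URL, the response can be cached and there are fewer distinct URLs to fetch.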
John suggested making a test page that doesn’t call the API, or uses a broken URL for it, and checking how the page renders in the browser (and for Google).
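Before disallowing an API subdomain, it can also help to confirm which API URLs a proposed robots.txt rule would actually block. The following is a minimal sketch using Python's standard urllib.robotparser; the robots.txt content, the api.example.com host, and the URL paths are all hypothetical. The standard-library parser is only an approximation of how Google interprets robots.txt, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that would be served from the API subdomain.
ROBOTS_TXT = """\
User-agent: *
Disallow: /v1/
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules disallow this URL for the user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Assumed API URLs that pages on the main site request while rendering.
api_urls = [
    "https://api.example.com/v1/products?id=42",
    "https://api.example.com/health",
]

for url in api_urls:
    print(url, "blocked" if is_blocked(ROBOTS_TXT, url) else "allowed")
```

Any URL reported as blocked here is one whose content Google could not fetch when rendering the page, which is why the rendering test matters before rolling out the rule.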
Legal and age-verification interstitials can affect crawling
Interstitials (for example, those that require users to verify their age before browsing) can have a negative impact on crawling and indexing if not implemented properly. Googlebot doesn’t click buttons or fill in forms, so any interstitial that requires those actions before the page is loaded may prevent Googlebot from crawling the content itself. The recommended approach is to use JS/CSS to display the interstitial on top of content that has already been loaded. Because the content is still being loaded and users can access it after getting past the interstitial, this would not be seen as cloaking.
Not all hidden text is considered bad
One user asked whether all hidden text goes against Google’s Webmaster guidelines. John explained that hidden text becomes problematic when it’s there to deceive search engines. Google is generally pretty good at identifying when this is the case, so hidden text that’s not deceptive is generally not a problem. One example given is around accessibility. Content that’s there to aid screen readers (but isn’t visible on the page) is just one way that hidden text can serve a very valid purpose.
Showing less content to search engines than to users isn’t necessarily a cloaking issue
John was asked about a website that had a lot of noindexed pages that had HTTP errors. The site owner asked whether it’s considered ‘cloaking’ to show an empty HTML page to bots to get those URLs de-indexed, while still showing users the page.
John mentioned that the part of ‘cloaking’ that is an issue is when search engines get more or vastly different content than users. Google wants to avoid promising users something they can’t find when they go to a page from a query. However, showing an empty page with a noindex will cause Google to drop those URLs, and Google will not mind if users see something different, because the page will not appear in search results.
Google ignores content in ‘noscript’ tags
A question was asked about whether using noscript tags could be a workaround for getting content seen by Google. John said that Google generally ignores content in noscript tags, so placing content that you want indexed inside noscript would not work as a workaround.
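To audit a page for this, one option is to list the text that appears only inside noscript tags, since that is the text Google would likely ignore. The sketch below uses Python's standard html.parser; the NoscriptAudit class and the sample page are hypothetical, and the script is a simple audit aid, not a model of how Google processes pages.

```python
from html.parser import HTMLParser


class NoscriptAudit(HTMLParser):
    """Separate text found inside <noscript> from text found elsewhere."""

    def __init__(self):
        super().__init__()
        self._depth = 0            # how many <noscript> elements we are inside
        self.noscript_text = []    # text Google would likely ignore
        self.other_text = []       # text outside <noscript>

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "noscript" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:
            target = self.noscript_text if self._depth else self.other_text
            target.append(text)


# Hypothetical page that relies on noscript for its main description.
PAGE = """<html><body>
<h1>Product list</h1>
<noscript><p>Full product description for crawlers</p></noscript>
</body></html>"""

audit = NoscriptAudit()
audit.feed(PAGE)
audit.close()
print("Only in noscript:", audit.noscript_text)
```

Anything listed under noscript_text that matters for search should be moved into the regular page content.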
Google can only index what Googlebot sees
In response to a question about whether there are cloaking issues around showing Google different content vs. what a user would see on a more personalized page, John clarified that only what Googlebot sees is indexed. Googlebot usually crawls from the US and crawls without cookies, so whatever content is there would be what is indexed for the website. So, on personalized pages, make sure that you’re only changing things for users that are not critical to how you want to be seen in search.
Use View Source or Inspect Element to Ensure Hidden Content is Readily Accessible in the HTML
If you have content hidden behind a tab or accordion, John recommends using the view source or inspect element tool to ensure the content is in the HTML by default. Content that is already in the HTML when the page loads will be treated as normal content on the page. However, if it requires an interaction to load, Google will not be able to crawl or index it.
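The same check can be scripted across many pages. The sketch below, which assumes you already have the raw HTML as a string (for example, saved from view source), looks for a phrase in the visible-text portions of the markup using Python's standard html.parser. The TextCollector class, the helper name, and both sample snippets are hypothetical.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect text nodes from raw HTML, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def text_in_raw_html(html: str, phrase: str) -> bool:
    """Return True if the phrase appears in the text of the unrendered HTML."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return phrase.lower() in " ".join(collector.chunks).lower()


# Accordion whose answer is already in the HTML, just hidden from view.
PRELOADED = '<div class="accordion"><button>Shipping</button><div hidden>Orders ship within two days.</div></div>'
# Accordion whose answer is fetched only after a click (assumed data-src pattern).
LAZY = '<div class="accordion"><button data-src="/api/shipping">Shipping</button><div hidden></div></div>'

print(text_in_raw_html(PRELOADED, "ship within two days"))
print(text_in_raw_html(LAZY, "ship within two days"))
```

A phrase that is missing from the raw HTML is a sign that the content only loads after an interaction, which is the case described above as not crawlable.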
Ensure Accordion FAQs Are Set up Correctly When Using Structured Markup
An accordion style format can be used with FAQ structured markup to expand the answers when clicked on, as long as the question is visible by default.
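For reference, FAQ structured markup is commonly expressed as schema.org FAQPage JSON-LD. The sketch below builds that JSON from a list of question and answer pairs; the faq_jsonld helper and the sample question are hypothetical, and the output would still need to be embedded in the page in a script tag of type application/ld+json alongside a visible accordion that shows the same questions.

```python
import json


def faq_jsonld(pairs):
    """Build schema.org FAQPage JSON-LD from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)


# Hypothetical FAQ entry; the question text should match what is visible on the page.
markup = faq_jsonld([("How long does shipping take?", "Orders ship within two days.")])
print(markup)
```

Generating the markup from the same data that renders the accordion helps keep the visible questions and the structured data in sync.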
Above the Fold Content is Not Always Prioritized For Rankings
Google doesn’t necessarily prioritize the content above the fold when ranking a page, but will also take into account other elements on the site.