APIs & Crawl Budget: Don’t block API requests if they load important content
An attendee asked whether a website should disallow subdomains that are sending API requests, as they seemed to be taking up a lot of crawl budget. They also asked how API endpoints are discovered or used by Google.
You can help avoid crawl budget issues here by making sure the API responses are cached well and that the API URLs don’t contain timestamps or other constantly changing parameters. If you don’t care about the content the API returns being seen by Google, you could block the API subdomains from being crawled, but you should test this first to make sure it doesn’t stop critical content from being rendered.
John suggested making a test page that doesn’t call the API, or that points at a deliberately broken API URL, and then checking how the page renders both in the browser and in Google’s testing tools.
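As a rough sketch of the blocking approach, a disallow rule would live in the API subdomain’s own robots.txt, since robots.txt rules apply per host. The subdomain name here is hypothetical:

```
# robots.txt served at https://api.example.com/robots.txt
# (hypothetical subdomain — test rendering first, since blocking
# the API may stop critical page content from being rendered)
User-agent: *
Disallow: /
```

Note that this only stops crawling of the API endpoints themselves; pages on the main domain that depend on those endpoints at render time are the ones to verify with the test-page approach above.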
Do Not Rely on 3rd Party Cookies to Render Content
Chrome is going to block third-party cookies, and Google uses Chrome to render pages. If your site depends on third-party cookies to render a page’s content, that content won’t be seen by Google.
Different Rendering Processes are Used When Rendering a Page For Indexing & for Users
Googlebot doesn’t take the rendered DOM snapshot used for indexing at a specific, fixed point in time. This is because Google uses different rendering processes for indexing than for users accessing a page. As a result, elements on the site can be processed differently, and rendering the page for indexing purposes may take longer.
Use Chrome DevTools and Google Testing Tools to Review a Page’s Shadow DOM
There are two ways to inspect a page’s shadow DOM in order to compare it to what Googlebot sees. The easiest is Chrome DevTools: in the Elements inspector you will see a #shadow-root node, which you can expand to display the shadow DOM’s contents. You can also use any of Google’s testing tools and review the rendered DOM, which should contain what was originally inside the shadow DOM.
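To make this concrete, here is a minimal hypothetical example using declarative shadow DOM. The markup inside the template is what DevTools shows under the #shadow-root node, and it is what should appear in the rendered DOM reported by Google’s testing tools:

```html
<!-- Hypothetical custom element with a declarative shadow root -->
<product-card>
  <template shadowrootmode="open">
    <!-- Shadow DOM content: DevTools shows this under #shadow-root,
         and it should appear in the rendered DOM for indexing -->
    <h2>Product name</h2>
    <slot></slot> <!-- light-DOM children are projected here -->
  </template>
  <p>Description passed in from the light DOM.</p>
</product-card>
```

For an open shadow root like this one, you can also read its contents programmatically in the DevTools console, e.g. document.querySelector('product-card').shadowRoot, which is another quick way to check what should end up in the rendered DOM.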