A sitemap is a list of the live URLs on a site. It tells search engine crawlers which pages are most important, and therefore which ones should be crawled and indexed. There are several things to consider when creating sitemaps, and it helps to understand how search engines treat them. We cover a range of these topics in our SEO Office Hours Notes below, along with best-practice recommendations and Google’s advice on sitemaps.

It’s okay if the same URL appears on multiple sitemap files

July 21, 2022 Source

It’s fine to have the same URL included in multiple sitemap files. The only caveat is ensuring that there is no conflicting information being provided across the different sitemaps. For example, having a URL in a ‘regular’ sitemap and an hreflang-specific sitemap (for different language versions of your site) is perfectly acceptable, as long as any hreflang annotations given to that page are consistent across both sitemaps.
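As a rough illustration of the consistency check described above, the sketch below compares the hreflang annotations that two sitemap files give for the same URL. The function names are hypothetical, and it assumes that an entry with no hreflang annotations at all (as in a ‘regular’ sitemap) does not count as a conflict.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def hreflang_map(sitemap_xml: str) -> dict[str, dict[str, str]]:
    """Map each <loc> in a sitemap to its {hreflang: href} annotations."""
    root = ET.fromstring(sitemap_xml)
    result = {}
    for url in root.findall(f"{{{SITEMAP_NS}}}url"):
        loc = url.findtext(f"{{{SITEMAP_NS}}}loc", default="").strip()
        result[loc] = {
            link.get("hreflang"): link.get("href")
            for link in url.findall(f"{{{XHTML_NS}}}link")
            if link.get("rel") == "alternate"
        }
    return result


def conflicting_urls(sitemap_a: str, sitemap_b: str) -> list[str]:
    """Return URLs listed in both sitemaps whose hreflang annotations differ.

    Assumption: an entry with no annotations is treated as "no opinion",
    so only two non-empty, unequal annotation sets count as a conflict.
    """
    a, b = hreflang_map(sitemap_a), hreflang_map(sitemap_b)
    return sorted(
        url for url in a.keys() & b.keys()
        if a[url] and b[url] and a[url] != b[url]
    )
```

Running `conflicting_urls` over the contents of each pair of sitemap files gives a list of URLs to review by hand.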

It’s possible to host sitemap files on a separate domain

March 17, 2022 Source

One user asked whether they could host their sitemap files externally (perhaps on a separate server or a staging site). John explained that this is possible as long as the sitemaps are handled correctly: either both domains are verified in Google Search Console (GSC), or the site’s robots.txt file links to the sitemap file. Redirecting the old sitemap URL to the new location is also a best practice here. Note that some reporting issues may occur in GSC if the sitemaps are on a different domain, but this shouldn’t affect how the sitemap file itself works.
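To see which sitemaps a robots.txt file points to, and which of those live on a different host, something like the following minimal sketch could be used. The function names and the example hosts are assumptions made for illustration, and the parsing only handles simple `Sitemap:` lines.

```python
from urllib.parse import urlparse


def sitemap_urls(robots_txt: str) -> list[str]:
    """Collect the URLs from every 'Sitemap:' line in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def external_sitemaps(robots_txt: str, site_host: str) -> list[str]:
    """Return the sitemap URLs that are hosted somewhere other than site_host."""
    return [u for u in sitemap_urls(robots_txt) if urlparse(u).hostname != site_host]


# Hypothetical robots.txt for www.example.com with one externally hosted sitemap.
ROBOTS = """\
User-agent: *
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://sitemaps.example.net/sitemap-index.xml
"""

print(external_sitemaps(ROBOTS, "www.example.com"))
```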

Robots.txt file size doesn’t impact SEO, but smaller files are recommended

January 27, 2022 Source

John confirmed that the size of a website’s robots.txt file has no direct impact on SEO. He does, however, point out that larger files can be more difficult to maintain, which may in turn make it harder to spot errors when they arise.

Keeping your robots.txt file to a manageable size is therefore recommended where possible. John also stated that there’s no SEO benefit to linking to sitemaps from robots.txt: as long as Google can find them, it’s perfectly fine to just submit your sitemaps in GSC. That said, linking to sitemaps from robots.txt is a good way to ensure that other search engines and crawlers can find them.

Image sitemaps can be useful for sites that use lazy loading

November 1, 2021 Source

When images are “lazy loaded” in a way that doesn’t include defined image elements, it’s recommended to provide a backup in the form of structured data or an image sitemap. That way, Google will know to associate those images with the page even before they’re loaded.
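For the image sitemap option, a minimal sketch of generating one might look like this. It uses Google’s image sitemap namespace with `image:image` and `image:loc` elements; the function name and the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages: dict[str, list[str]]) -> str:
    """Build sitemap XML that lists, for each page URL, the images shown on it."""
    # Serialize with a default namespace plus an "image" prefix.
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical gallery page whose images are lazy loaded.
print(build_image_sitemap({
    "https://example.com/gallery": [
        "https://example.com/img/a.jpg",
        "https://example.com/img/b.jpg",
    ],
}))
```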

Use Sitemaps Ping, Last Modified and Separate Sitemaps to Index Updated Content

March 20, 2020 Source

To help Google index updated content more quickly, ping Google when a sitemap has been updated, use last modified dates in sitemaps, and put updated content in a separate sitemap so it can be crawled more frequently.

Specify Timezone Formats Consistently Across Site & Sitemaps

February 18, 2020 Source

Google is able to understand different timezone formats, for example, UTC vs GMT. However, it’s important to use one timezone format consistently across a site and its sitemaps to avoid confusing Google.
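One way to keep timestamps consistent is to route every date through a single formatting helper. The sketch below is an assumption about how a site might do this, not something taken from Google’s advice: it normalizes all timezone-aware datetimes to UTC and writes them in the W3C datetime form used for sitemap `lastmod` values. The helper name `to_lastmod` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_lastmod(dt: datetime) -> str:
    """Format a timezone-aware datetime as a UTC timestamp with an explicit offset."""
    if dt.tzinfo is None:
        # A naive datetime has no known timezone, so refuse to guess.
        raise ValueError("naive datetime: timezone is unknown")
    return dt.astimezone(timezone.utc).isoformat(timespec="seconds")


# The same instant, recorded in two different local timezones,
# comes out as one consistent string.
cet = datetime(2020, 2, 18, 9, 30, tzinfo=timezone(timedelta(hours=1)))
utc = datetime(2020, 2, 18, 8, 30, tzinfo=timezone.utc)
print(to_lastmod(cet))
print(to_lastmod(utc))
```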

Include Most Recently Changed Content in Separate Sitemap

February 18, 2020 Source

Rather than submitting all of your sitemaps regularly to get Googlebot to find and crawl newly updated pages, John recommends adding recently changed pages into a separate sitemap which can be submitted more frequently, while leaving more stable, unchanged pages in existing sitemaps.
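The split described above could be sketched roughly as follows. The seven-day window, the function names, and the example URLs are all assumptions made for illustration; the right threshold depends on how often a site actually changes.

```python
from datetime import datetime, timedelta, timezone
from xml.sax.saxutils import escape


def split_by_recency(pages: dict[str, datetime], now: datetime,
                     window: timedelta = timedelta(days=7)):
    """Split {url: last_modified} into (recently changed, stable) dictionaries."""
    cutoff = now - window
    recent = {url: mod for url, mod in pages.items() if mod >= cutoff}
    stable = {url: mod for url, mod in pages.items() if mod < cutoff}
    return recent, stable


def render_sitemap(pages: dict[str, datetime]) -> str:
    """Render a sitemap with a <lastmod> value for every URL."""
    entries = "".join(
        f"  <url><loc>{escape(url)}</loc>"
        f"<lastmod>{mod.isoformat(timespec='seconds')}</lastmod></url>\n"
        for url, mod in pages.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}</urlset>\n"
    )


now = datetime(2020, 2, 18, tzinfo=timezone.utc)
pages = {
    "https://example.com/news": now - timedelta(days=1),
    "https://example.com/about": now - timedelta(days=400),
}
recent, stable = split_by_recency(pages, now)
print(render_sitemap(recent))  # the sitemap to resubmit frequently
print(render_sitemap(stable))  # the sitemap that rarely changes
```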

Use the Last Modified Date to Provide a Hierarchy of Changes Made to a Site

February 7, 2020 Source

John recommends using the last modified date in sitemaps in a reasonable way to provide a clear hierarchy of the changes that have been made on a site. This helps Google understand which pages are important and ensures that it focuses on crawling those first.

“Discovered Not Indexed” Pages May Show in GSC When Only Linked in Sitemap

October 29, 2019 Source

Pages may show as “Discovered Not Indexed” in GSC if they have been submitted in a sitemap but aren’t linked to within the site itself.
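To find pages that are at risk of this, one option is to compare the URLs listed in a sitemap against the set of URLs that a crawl found through internal links. The sketch below assumes that set is already available (for example, exported from a crawler); the function name and URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_only_urls(sitemap_xml: str, internally_linked: set[str]) -> list[str]:
    """Return URLs that appear in the sitemap but were not found via internal links."""
    root = ET.fromstring(sitemap_xml)
    listed = {
        loc.text.strip()
        for loc in root.iter(f"{{{SITEMAP_NS}}}loc")
        if loc.text
    }
    return sorted(listed - internally_linked)


SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url>"
    "<url><loc>https://example.com/orphan</loc></url>"
    "</urlset>"
)
# Assumed output of a site crawl: every URL reachable through internal links.
linked = {"https://example.com/"}
print(sitemap_only_urls(SITEMAP, linked))
```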
