How to Audit & Improve Your Sitemaps for Better Website Crawlability
While SEO best practices are constantly evolving with new search engine algorithm updates, new SERP features, and new technologies like voice search, it remains important to consider the longstanding elements of search engine optimization that still offer ample opportunities to help boost your page rankings.
In this article, we’ll look at how sitemaps can help boost your website’s crawlability so that your pages will be easily indexed by search engines and, ultimately, have a better chance at ranking in the SERPs and driving organic traffic.
This post is part of Deepcrawl’s series on Website Health. Our recent “SEO Revenue Funnel” article is a great starting point and guide for digital marketers who want to make sure their SEO strategy is touching on all of the core aspects that contribute to a high-performing website.
Here, we’ll dive deep into sitemaps, an SEO element that relates to your site’s overall Crawlability.
What are sitemaps?
Google led the charge on promoting the use of sitemaps all the way back in 2005. Over at Google Search Central, they are described as “a file where you provide information about the pages, videos, and other files on your site, and the relationships between them.”
Essentially, sitemaps help search engines crawl your website more efficiently.
Improving your sitemaps makes your content more visible to search engines. Do you have video content on your site that you’d like to rank for a certain keyphrase? Adding it to your sitemap increases the chances that search engines will find it. Have valuable pages on your site that don’t currently have many links pointing to them? Listing them in your sitemaps gives search engines another way to discover them, even if few other pages link to them (though inclusion alone doesn’t guarantee a page will be crawled or indexed).
Sitemaps are particularly important for larger sites. If your site is very small or has just a few pages that are all clearly linked to from the homepage, a sitemap might not be a priority for you. But if you have lots of important content across a high volume of pages, a sitemap can help ensure these pages aren’t being missed by search engines.
Note: While sitemaps are the tried-and-true, longstanding method to help signal structural website changes to search engines like Google, it’s worth keeping an eye on this space. Recently, you may have heard that Bing has introduced “IndexNow,” a protocol that lets websites notify Bing directly when pages are added, updated, or deleted. Google also has a less-discussed Indexing API that is intended only for short-lived content, such as job postings or live broadcasts. Per their guidance for the Google Indexing API:
“Currently, the Indexing API can only be used to crawl pages with either JobPosting or BroadcastEvent embedded in a VideoObject. For websites with many short-lived pages like job postings or livestream videos, the Indexing API keeps content fresh in search results because it allows updates to be pushed individually.”
Again, this Google API is meant only for specific types of short-lived content — their guidance on the API page itself clearly recommends that you still submit a sitemap for your website as a whole.
How to find & audit your website’s existing sitemap
Unsure whether or not you have a sitemap already in place? There are several methods you can use to search a website for its sitemap.
As a starting point, here are several options to help you identify whether or not your website has an active sitemap:
1. Manual search within your domain, robots.txt, or Google
- Sitemaps are often located at the root of the domain, though this isn’t a requirement. Try typing your domain followed by /sitemap or /sitemap.xml (e.g., examplesite.com/sitemap.xml). If nothing comes up, however, that doesn’t mean you don’t have a sitemap.
- You can also check your robots.txt file for a link to your sitemap. Oftentimes, a robots.txt file will include a line for “Sitemap:” with the URL to your sitemap’s XML file.
- Another manual method is to search Google for XML files on your website by entering “site:[your domain name] filetype:xml” into the search bar. For example, for mcdonalds.com you would search for “site:mcdonalds.com filetype:xml”.
- However, not all sitemaps use XML as their file format. Search engines accept various file formats for sitemaps, including XML, RSS, and text files. Additionally, not all sitemaps are indexed by Google, so even if your search returns no results, it doesn’t necessarily mean your website lacks an active sitemap.
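If you want to check robots.txt for Sitemap lines programmatically, Python’s standard library can extract them. This is a minimal sketch that assumes you already have the robots.txt contents as a string; examplesite.com and the helper name sitemaps_from_robots are illustrative, and RobotFileParser.site_maps() requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the URLs declared on 'Sitemap:' lines of a robots.txt body."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap lines.
    return parser.site_maps() or []


# Hypothetical robots.txt contents.
example_robots = """User-agent: *
Disallow: /checkout/
Sitemap: https://www.examplesite.com/sitemap.xml
"""

print(sitemaps_from_robots(example_robots))
# ['https://www.examplesite.com/sitemap.xml']
```

To check a live site instead, you could call parser.set_url("https://www.examplesite.com/robots.txt") followed by parser.read() before calling site_maps(). An empty result only means robots.txt doesn’t declare a sitemap, not that none exists.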
2. Search for sitemaps in your CMS
- Some content management systems (CMS), such as Shopify and Squarespace, generate sitemaps automatically. When you’re logged in, you can find the sitemap within your CMS.
- Other CMSs don’t generate sitemaps automatically, but users frequently use plugins such as Yoast or XML Sitemaps to generate them from within the CMS. If you use WordPress, for example, log in and check your plugins for any that generate sitemaps, and see whether an existing sitemap is set up there. (WordPress 5.5 and later also produces a basic sitemap by default at /wp-sitemap.xml.)
3. Use a tool to find your sitemaps
- If you have a Google Search Console account set up for your website, you can also use it to find your sitemaps by opening the Sitemaps report, which lists the sitemaps that have been submitted for your property.
- Additionally, other online tools exist that will allow you to search for a website’s sitemap. Depending on the feature set of the tool you choose, these can usually tell you which URLs are currently in your sitemap and which aren’t, as well as flag orphaned URLs: pages that are listed in your sitemap but aren’t linked to from anywhere else on your site.
- Of course, you can also use a more robust SEO platform like Deepcrawl to find and monitor your sitemaps.
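A simple audit you can run yourself is to compare the URLs a sitemap lists against the URLs your internal links actually reach; URLs that appear only in the sitemap are candidates for what SEO tools call orphaned pages. This sketch assumes you have the sitemap XML as a string and a set of internally linked URLs from a crawl (both are hypothetical sample data here). The function names are illustrative, not part of any tool’s API, and it only handles a regular urlset sitemap, not a sitemap index.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap Protocol (sitemaps.org).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Collect every non-empty <loc> value from a <urlset> sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text}


def sitemap_only_urls(in_sitemap: set[str], internally_linked: set[str]) -> set[str]:
    """URLs listed in the sitemap that no crawled page links to."""
    return in_sitemap - internally_linked


# Hypothetical sample data: a tiny sitemap plus URLs found by following internal links.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.examplesite.com/</loc></url>
  <url><loc>https://www.examplesite.com/guides/sitemaps</loc></url>
  <url><loc>https://www.examplesite.com/old-landing-page</loc></url>
</urlset>"""
linked_from_crawl = {
    "https://www.examplesite.com/",
    "https://www.examplesite.com/guides/sitemaps",
}

print(sitemap_only_urls(sitemap_urls(sample_sitemap), linked_from_crawl))
# {'https://www.examplesite.com/old-landing-page'}
```

Swapping the operands (internally_linked - in_sitemap) gives the opposite view: linked pages that are missing from the sitemap.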
How to create a sitemap from scratch
If you don’t already have a sitemap, don’t worry. It’s easy to create a sitemap for your website and there are plenty of ways to go about it!
As mentioned above, your CMS may be able to generate a sitemap automatically. Or you could use an SEO plugin from within your CMS, such as Yoast or XML Sitemaps. But as a quick Google search will show you, there are plenty of other sitemap generators available as well.
It is also possible to generate a sitemap manually, but it’s important to remember that Google only recommends this for sites with a few dozen URLs at most. The most obvious way to do this manually would be to write the sitemap in XML format in a text editor before submitting it to search engines.
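Even for a small site, a short script can be less error-prone than typing XML by hand. The sketch below uses Python’s built-in xml.etree.ElementTree to produce a urlset file in the Sitemap Protocol format; the page list and dates are hypothetical, the optional lastmod tag is included for illustration, and it assumes Python 3.9+ (for ET.indent).

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, str]]) -> str:
    """Build a <urlset> sitemap from (absolute URL, W3C date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in text automatically,
        # which the protocol requires for URLs with query strings.
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    ET.indent(urlset)  # pretty-print for human readers (Python 3.9+)
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True).decode("utf-8")


# Hypothetical pages and last-modified dates.
pages = [
    ("https://www.examplesite.com/", "2022-01-15"),
    ("https://www.examplesite.com/products?category=shoes&sort=new", "2022-01-10"),
]
print(build_sitemap(pages))
```

Saving the returned string as sitemap.xml at your site root would give you a file you can then submit to search engines; regenerating it from your page list whenever content changes is the basic idea behind the dynamic sitemaps described below.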
It’s easy to see how generating sitemaps manually could become very time-consuming, particularly for large websites or sites that add URLs frequently. It’s also best practice to keep your sitemap up to date as your website grows, so for these sites a dynamic sitemap that updates automatically as pages are added is clearly the way to go.
Note: In addition to general website sitemaps, Google also recommends that you create video sitemaps, image sitemaps, and news sitemaps to help search engines crawl these alternative forms of content.
What to include in your sitemap & sitemap best practices
A key thing to remember when building out your sitemaps is prioritization. As we wrote in our eBook on Site Architecture: “Sitemaps that endeavor to serve everything are a waste of resources and should be rectified. They should only serve HTTP 200 indexable, non-canonicalized URLs.”
For XML sitemaps, sitemaps.org provides an overview of the XML schema for the Sitemap Protocol.
For extra-large websites, such as those belonging to enterprise companies or eCommerce businesses, you can also split your sitemap into multiple files and use a sitemap index to keep things organized. According to sitemaps.org, each sitemap file can contain a maximum of 50,000 URLs and must be no larger than 50MB when uncompressed. Splitting a large sitemap keeps every file within these limits.
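Splitting a large URL list into sitemap files of at most 50,000 URLs, then pointing a sitemap index at those files, can be sketched as follows. The product URLs and the file naming scheme (sitemap-1.xml, sitemap-2.xml, and so on) are hypothetical. This sketch only enforces the URL-count limit; you would still need to confirm that each file stays under the 50MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit from the Sitemap Protocol


def chunk_urls(urls: list[str], size: int = MAX_URLS_PER_SITEMAP) -> list[list[str]]:
    """Split a URL list into consecutive chunks of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_locations: list[str]) -> str:
    """Build a <sitemapindex> document that lists each child sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_locations:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = loc
    return ET.tostring(index, encoding="unicode")


# Hypothetical catalogue of 120,000 product URLs.
urls = [f"https://www.examplesite.com/product/{n}" for n in range(120_000)]
chunks = chunk_urls(urls)
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20000]

# Hypothetical naming scheme for the child sitemap files.
child_sitemaps = [
    f"https://www.examplesite.com/sitemap-{i}.xml" for i in range(1, len(chunks) + 1)
]
print(build_sitemap_index(child_sitemaps))
```

Each chunk would then be written out as its own urlset file at the matching URL, and only the index file needs to be submitted to search engines.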
Other best practices to keep in mind when building your sitemaps:
- Only include URLs that return a 200 status code (redirecting URLs are okay to include only temporarily, but they should not be left in a sitemap as they can cause unnecessary crawling on your site that might negatively impact your crawl budget and crawl efficiency).
- Do not include non-indexable pages (i.e., those with a noindex tag or those that are canonicalized to another URL).
- Do not include more than 50,000 URLs in a single sitemap. Use a sitemap index if you have more than 50,000 URLs on your website.
- Whether you manually create your sitemap or use your CMS, a plugin, or another tool to create it, make sure it’s using the standard sitemap protocol if you’re using an XML format for your sitemap.
- For sites with multiple language or regional variations, you can also list the alternate versions of each page in your sitemaps using hreflang annotations (xhtml:link elements).
- Depending on the content of your website, you should also consider including video sitemaps, image sitemaps, and Google News sitemaps to assist with crawlability for these specific types of content.
- Consider using sitemap generators that will automatically update your sitemaps when a new page is created (or when there are changes to existing URLs) to ensure your information stays up-to-date and all your important pages are signaled to search engines.
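The first two rules above amount to a filter over your crawl data. This is a sketch that assumes a hypothetical CrawledPage record holding each page’s status code, a noindex flag, and its declared canonical URL; in practice these values would come from your crawler or SEO platform export. Canonicals are compared as exact strings here, so real data may need URL normalization (for example, trailing slashes) first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawledPage:
    """Hypothetical crawl record: one row per URL from a site crawl export."""
    url: str
    status_code: int
    noindex: bool
    canonical: Optional[str] = None  # canonical URL the page declares, if any


def sitemap_eligible(page: CrawledPage) -> bool:
    """True for pages that return 200, are indexable, and are not canonicalized elsewhere."""
    if page.status_code != 200:
        return False  # excludes redirects (3xx) and errors (4xx/5xx)
    if page.noindex:
        return False  # excludes pages carrying a noindex directive
    # No canonical, or a canonical pointing at the page itself, counts as self-canonical.
    return page.canonical is None or page.canonical == page.url


crawl = [
    CrawledPage("https://www.examplesite.com/", 200, False),
    CrawledPage("https://www.examplesite.com/old-page", 301, False),
    CrawledPage("https://www.examplesite.com/thank-you", 200, True),
    CrawledPage("https://www.examplesite.com/shoes?color=red", 200, False,
                canonical="https://www.examplesite.com/shoes"),
    CrawledPage("https://www.examplesite.com/shoes", 200, False,
                canonical="https://www.examplesite.com/shoes"),
]

print([page.url for page in crawl if sitemap_eligible(page)])
# ['https://www.examplesite.com/', 'https://www.examplesite.com/shoes']
```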
HTML sitemaps for user experience
We’ve covered a lot of ground with sitemaps in the context of search engine crawlability. But you’ve probably also seen sitemaps that are visible to users. Sometimes they appear in the website footer, or the site has its own dedicated sitemap page.
These HTML sitemaps can be good for UX, making it easier for users to navigate your site from a single page, without needing to click through numerous internal links or use the site search function to find what they are looking for. These user-facing sitemaps can also help with SEO by providing internal links and helping to distribute page rank throughout your website.
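An HTML sitemap is ultimately just a page of ordinary links. As a rough sketch, assuming you keep a hypothetical mapping of section names to (page title, URL) pairs, the markup could be generated like this; the nav wrapper and h2 headings are illustrative choices, not requirements.

```python
from html import escape


def html_sitemap(sections: dict[str, list[tuple[str, str]]]) -> str:
    """Render grouped (page title, URL) pairs as a simple HTML sitemap fragment."""
    parts = ['<nav aria-label="Sitemap">']
    for heading, links in sections.items():
        parts.append(f"  <h2>{escape(heading)}</h2>")
        parts.append("  <ul>")
        for title, url in links:
            # escape() also encodes quotes, so values are safe inside href="..."
            parts.append(f'    <li><a href="{escape(url)}">{escape(title)}</a></li>')
        parts.append("  </ul>")
    parts.append("</nav>")
    return "\n".join(parts)


# Hypothetical site sections and pages.
site = {
    "Guides": [
        ("Sitemaps & crawlability", "/guides/sitemaps"),
        ("Site architecture", "/guides/site-architecture"),
    ],
    "Company": [("About us", "/about")],
}
print(html_sitemap(site))
```

Because every entry is a plain anchor link, the same page that helps users navigate also gives crawlers a set of internal links to follow.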
How to submit your sitemap to search engines
So you’ve generated your sitemap, but how do you make sure the search engines know about it?
There are a few ways you can let Google know about your sitemap. The first way would be to use the Sitemaps report tool in Google Search Console.
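You can also ping Google directly from your browser by visiting a URL in this format: https://www.google.com/ping?sitemap=https://www.examplesite.com/sitemap.xml (Note: Google has since deprecated this ping endpoint, announced in June 2023, so Search Console and robots.txt are the more reliable options.)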
Or you could add a line like the following to your robots.txt file, using your own sitemap’s full URL:
Sitemap: https://www.examplesite.com/sitemap.xml
This lets Google discover your sitemap the next time it crawls your robots.txt file.
Similar to Google, it is possible to let Bing know about your sitemap by pinging them via an HTTP request or by adding the sitemap text line to your robots.txt file.
You can also submit your sitemap directly to Bing through the Bing Webmaster Tools dashboard.
Don’t overlook submitting your sitemap to Bing. Several other search engines, including Yahoo, DuckDuckGo, and Ecosia, draw on Bing (along with other sources) for their results, so if Bing knows about your sitemap, these search engines are likely to benefit as well.
SEO is always evolving, but sitemaps are still a critical tool to help your website get crawled and indexed and, ultimately, to make your content more visible in the SERPs. They are also something of a “low-hanging fruit” for on-site optimization. There are four simple steps to get started with sitemap improvements:
- Check whether or not you have an existing sitemap in place (via a manual check, through your CMS, or using an SEO tool).
- If you don’t have an existing sitemap, generate one with an automated tool or plugin (unless your site is very small and you can write one manually). If you do have a sitemap in place, make sure it is up to date, and commit to monitoring it regularly so it stays accurate as your site grows and changes. (Note: Deepcrawl and other SEO tools can help you monitor and audit your sitemaps, particularly for large websites with frequently added or modified URLs.)
- Consider incorporating a sitemap into the user experience. An HTML sitemap in the footer or on a dedicated page can help users (and crawlers) navigate your site, improving UX (and, indirectly, SEO) and giving you more control over how users move through the website.
- Submit your sitemap to Google and Bing.
About Deepcrawl’s Website Health Series
This educational series on Website Health is designed to help digital marketers and SEOs better understand the disparate elements that contribute to a healthy, high-performing website—one that ranks highly in the SERPs. Crawlability (which can be influenced by sitemaps) is an important part of your website’s overall health. Remember: if search engines cannot crawl your website, then they won’t index it (or show it to users in response to their queries!). Over the next few months, we’ll be providing overviews and deep dives into the fundamentals of website health. Whether you’re just getting started with SEO or looking for a refresher on key concepts (or articles to share with the less-techy members of your team), we hope you’ll join us in making website health a priority in 2022 and beyond.