Sitemap Audits and Advanced Configuration

Tristan Pirouz

On 10th August 2015 • 13 min read

While Google and other search engines are getting better at finding pages on their own, Sitemaps can still help by giving them extra information about your pages and helping them crawl your site more efficiently.

In this post we’ll cover general Sitemap rules and advanced configuration. For a full guide to Sitemap implementation, please see Google Webmaster support.

 

Who needs Sitemaps?

 

Large sites:

If you have a large number of pages that constantly churn, with old pages expiring and new ones being created every day, search engines might have to crawl through thousands of existing pages to find the few hundred new pages created. Sitemaps can help them find the new content quickly.
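As a minimal sketch of the idea, the Python below builds a Sitemap containing only recently created pages, each with a `<lastmod>` date, so crawlers can go straight to them. The `(url, created_at)` input, the example.com URLs and the seven-day window are assumptions for illustration; in practice the data would come from your CMS or database.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_new_pages_sitemap(pages, now, max_age_days=7):
    """Return Sitemap XML listing only pages created in the last max_age_days.

    `pages` is an iterable of (url, created_at) pairs with timezone-aware
    datetimes; the seven-day default window is an arbitrary assumption.
    """
    cutoff = now - timedelta(days=max_age_days)
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # Newest first, skipping anything older than the cutoff
    for url, created_at in sorted(pages, key=lambda page: page[1], reverse=True):
        if created_at < cutoff:
            continue
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # W3C Datetime format, e.g. 2015-08-09T00:00:00+00:00
        ET.SubElement(entry, "lastmod").text = created_at.isoformat(timespec="seconds")
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical example data
now = datetime(2015, 8, 10, tzinfo=timezone.utc)
pages = [
    ("https://www.example.com/p/new-widget", now - timedelta(days=1)),
    ("https://www.example.com/p/old-widget", now - timedelta(days=400)),
]
print(build_new_pages_sitemap(pages, now))
```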

 

Publishing sites:

If your site is set up to be indexed in Google News, an XML Sitemap listing content less than 48 hours old, with additional news metadata, can significantly improve the indexing of that content, even if search engine crawlers have trouble finding it through links.
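A Google News Sitemap adds a `news:` namespace to each entry, carrying the publication name, language, publication date and title. The sketch below builds one and drops anything older than 48 hours; the article dictionaries and the "Example Times" publication name are hypothetical, and Google's News Sitemap documentation remains the reference for the full set of supported tags.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"
ET.register_namespace("", SITEMAP_NS)   # serialize as the default namespace
ET.register_namespace("news", NEWS_NS)  # serialize as news:...

def sm(tag):
    return f"{{{SITEMAP_NS}}}{tag}"

def news(tag):
    return f"{{{NEWS_NS}}}{tag}"

def build_news_sitemap(articles, now, publication, language="en"):
    """Build a Google News Sitemap containing only articles under 48 hours old."""
    cutoff = now - timedelta(hours=48)
    urlset = ET.Element(sm("urlset"))
    for article in articles:
        if article["published"] < cutoff:
            continue  # older articles stay in your regular Sitemaps
        url = ET.SubElement(urlset, sm("url"))
        ET.SubElement(url, sm("loc")).text = article["url"]
        item = ET.SubElement(url, news("news"))
        pub = ET.SubElement(item, news("publication"))
        ET.SubElement(pub, news("name")).text = publication
        ET.SubElement(pub, news("language")).text = language
        published = article["published"].isoformat(timespec="seconds")
        ET.SubElement(item, news("publication_date")).text = published
        ET.SubElement(item, news("title")).text = article["title"]
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical articles
now = datetime(2015, 8, 10, 12, 0, tzinfo=timezone.utc)
articles = [
    {"url": "https://www.example.com/news/fresh", "title": "Fresh story",
     "published": now - timedelta(hours=3)},
    {"url": "https://www.example.com/news/stale", "title": "Stale story",
     "published": now - timedelta(hours=72)},
]
print(build_news_sitemap(articles, now, publication="Example Times"))
```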

 

Uncrawlable Sites:

In the early days of the web, many websites made content accessible only through forms, which search engines could not crawl. Sitemaps were one way to work around this problem. Since then, most of these websites have been completely rebuilt, and the problem is now usually solved with good internal linking.

 

Everyone?

A Sitemap might not be required if you have a small or well-optimized site, but having one gives you extra Webmaster Tools reports with useful feedback on indexing problems. Consider implementing one as a way to get more information on how your site is performing.

 

Creating Sitemaps: A Quick Guide

 

What to Include

 

Formatting

 

Thresholds

 

Internal Linking & Sitemaps: Identifying Gaps

Pages can exist in your Sitemaps without being linked internally, or they can be linked internally without being included in your Sitemaps. Whether accidental or deliberate, both scenarios are a problem and should be fixed: either improve your internal linking structure so that it covers every page in your Sitemaps, or update your Sitemap(s) to include every page that is linked within the site.
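Both gaps are plain set differences. The sketch below assumes you already have two lists: the URLs linked internally (for example, from a crawl) and the URLs listed in your Sitemaps. The light normalization is an assumption; real sites may also need to handle trailing slashes or tracking parameters.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url):
    """Lowercase scheme and host and drop #fragments so equivalent URLs compare equal."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))

def find_gaps(linked_urls, sitemap_urls):
    linked = {normalize(u) for u in linked_urls}
    in_sitemaps = {normalize(u) for u in sitemap_urls}
    return {
        "only_in_sitemaps": sorted(in_sitemaps - linked),       # listed, but never linked
        "missing_from_sitemaps": sorted(linked - in_sitemaps),  # linked, but not listed
    }

# Hypothetical crawl and Sitemap data
linked = ["https://www.example.com/", "https://www.example.com/about",
          "https://WWW.example.com/shop#top"]
listed = ["https://www.example.com/", "https://www.example.com/shop",
          "https://www.example.com/landing-2014"]
print(find_gaps(linked, listed))
```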

If you have linked URLs or URLs in Sitemaps that don’t generate traffic or are no longer required, disallow or delete them to minimize your crawl space.
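If you go the disallow route, a quick consistency check is that those URLs don’t also remain in a Sitemap, since listing a URL you’ve blocked from crawling is contradictory. This sketch uses Python’s standard `urllib.robotparser`; note that it does simple prefix matching and does not implement the `*` and `$` wildcards Google supports, so treat it as an approximation. The robots.txt rules and URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def disallowed_but_listed(robots_txt, sitemap_urls, user_agent="Googlebot"):
    """Return Sitemap URLs that robots.txt blocks for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if not parser.can_fetch(user_agent, url)]

# Hypothetical robots.txt and Sitemap entries
robots_txt = """User-agent: *
Disallow: /checkout/
Disallow: /search
"""
listed = [
    "https://www.example.com/",
    "https://www.example.com/checkout/basket",
    "https://www.example.com/search?q=shoes",
]
print(disallowed_but_listed(robots_txt, listed))
```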

 

Other Considerations

For more information on creating Sitemaps, please visit Google Webmaster support.


 

Naming Your Sitemaps

How you name your Sitemaps depends on how public you want them to be: some sites choose to keep them private so that competitors can’t access data about their site’s structure.

 

Public:

If you want to make your Sitemap or index Sitemap accessible to everyone, name it sitemap.xml. List all of your index Sitemaps, or individual Sitemaps, in your robots.txt file using the Sitemap: directive so that Google can find them.
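A `Sitemap:` line in robots.txt takes an absolute URL and applies regardless of which `User-agent` group it sits near. To confirm what a robots.txt actually declares, the sketch below reads it with Python’s standard `urllib.robotparser` (`site_maps()` needs Python 3.8 or later); the file contents are a hypothetical example.

```python
from urllib.robotparser import RobotFileParser

def declared_sitemaps(robots_txt):
    """Return the Sitemap URLs declared in a robots.txt file (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when there are none

# Hypothetical robots.txt
robots_txt = """User-agent: *
Disallow: /checkout/

Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/sitemap-products.xml
"""
print(declared_sitemaps(robots_txt))
```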

 

PRIVATE TO ANYONE WITHOUT A LINK:

To hide your Sitemaps from competitors, give them names that can’t be guessed. Don’t list them in robots.txt; instead, submit your Sitemap(s) manually in Webmaster Tools so that Google can find them.
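One way to get a name that can’t realistically be guessed is to append a random token. A minimal sketch using Python’s `secrets` module follows; the `sitemap-` prefix is an arbitrary choice. Keep in mind this is obscurity rather than access control: anyone who has the URL can still fetch the file.

```python
import secrets

def private_sitemap_name(prefix="sitemap"):
    """Return a hard-to-guess Sitemap filename, e.g. sitemap-<32 hex chars>.xml."""
    # token_hex(16) draws 16 random bytes from the OS CSPRNG -> 32 hex characters
    return f"{prefix}-{secrets.token_hex(16)}.xml"

print(private_sitemap_name())
```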

 

ONLY ACCESSIBLE BY SEARCH ENGINES (ADVANCED):

Do a reverse DNS lookup on the requesting IP address to confirm that the request comes from a search engine crawler, and block access for everyone else. Submit your Sitemap(s) manually.
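The check Google documents for verifying Googlebot is forward-confirmed reverse DNS: resolve the IP to a hostname, check the hostname’s domain, then resolve that hostname and confirm it maps back to the same IP. Without the forward step, anyone who controls the reverse DNS for their own IP could claim a Googlebot hostname. The sketch below is IPv4-only (`gethostbyname_ex`), takes the domain list as an assumption to check against Google’s current documentation, and injects the resolvers so it can be demonstrated offline. In a real server you would serve the Sitemap only when it returns `True`, and you would probably want to cache results.

```python
import socket

# Domains Google documents for Googlebot hostnames; check current docs for others.
GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com")

def is_verified_crawler(ip, domains=GOOGLE_CRAWLER_DOMAINS,
                        reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS: IP -> hostname -> IP must round-trip."""
    try:
        hostname = reverse(ip)[0].rstrip(".").lower()
        if not hostname.endswith(domains):
            return False  # PTR record points outside the crawler's domains
        return ip in forward(hostname)[2]  # forward lookup must give back the same IP
    except OSError:  # socket.herror / socket.gaierror: no PTR or no A record
        return False

# Offline demo with fake resolvers; 203.0.113.9 is a spoofer whose PTR claims Googlebot.
FAKE_PTR = {"66.249.66.1": "crawl-66-249-66-1.googlebot.com",
            "203.0.113.9": "crawl-66-249-66-1.googlebot.com"}

def fake_reverse(ip):
    if ip not in FAKE_PTR:
        raise socket.herror("no PTR record")
    return FAKE_PTR[ip], [], [ip]

def fake_forward(hostname):
    return hostname, [], ["66.249.66.1"]

for ip in ("66.249.66.1", "203.0.113.9", "198.51.100.7"):
    print(ip, is_verified_crawler(ip, reverse=fake_reverse, forward=fake_forward))
```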

 

MULTIPLE SITEMAPS FOR THE SAME SITE

Using a single Sitemap for a very large site can be unwieldy and hard to manage: it increases the risk of errors and means you’ll waste time sifting through a large amount of data. Splitting it into multiple Sitemaps can help.
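Very large sites have to split anyway, because the sitemaps.org protocol caps a single Sitemap at 50,000 URLs. The sketch below only chunks a URL list by count; the `sitemap-N.xml` naming is a hypothetical convention, and the protocol’s separate maximum file size is not checked here.

```python
MAX_URLS_PER_SITEMAP = 50_000  # sitemaps.org per-file URL limit

def split_into_sitemaps(urls, max_urls=MAX_URLS_PER_SITEMAP, name="sitemap-{}.xml"):
    """Yield (filename, urls) pairs, one per Sitemap file."""
    urls = list(urls)
    for number, start in enumerate(range(0, len(urls), max_urls), start=1):
        yield name.format(number), urls[start:start + max_urls]

# Hypothetical site with 120,001 URLs
all_urls = [f"https://www.example.com/p/{i}" for i in range(120_001)]
for filename, chunk in split_into_sitemaps(all_urls):
    print(filename, len(chunk))
```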

 

USE A DIFFERENT SITEMAP FOR DIFFERENT TYPES OF CONTENT:

Generally it’s useful to create as many Sitemaps as is practical, each covering a different type of page. For example, one for product pages, one for new product pages and one for category pages.
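A minimal sketch of that breakdown: assign each URL to a group by page type, then write each group to its own Sitemap. The `/p/` and `/c/` path prefixes and the set of "new" products are hypothetical; how you identify new products (for example, by creation date) depends on your platform. Here every URL lands in exactly one group; the multi-dimensional approach described later in this post deliberately relaxes that.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical URL structure: /p/ for products, /c/ for categories
TYPE_PREFIXES = {"/p/": "products", "/c/": "categories"}

def group_by_page_type(urls, new_products=frozenset()):
    """Assign each URL to exactly one Sitemap group, keyed by page type."""
    groups = defaultdict(list)
    for url in urls:
        path = urlsplit(url).path
        group = next((name for prefix, name in TYPE_PREFIXES.items()
                      if path.startswith(prefix)), "other")
        if group == "products" and url in new_products:
            group = "new-products"
        groups[group].append(url)
    return dict(groups)

urls = ["https://www.example.com/p/red-shoe", "https://www.example.com/p/blue-shoe",
        "https://www.example.com/c/shoes", "https://www.example.com/about"]
print(group_by_page_type(urls, new_products={"https://www.example.com/p/blue-shoe"}))
```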

You can also use an extra Sitemap for different purposes, such as:

 

INDEX SITEMAPS:

Index Sitemaps allow you to build multiple Sitemaps and submit them to Google together. The structure of index Sitemaps should be two levels deep: don’t nest index Sitemaps within other index Sitemaps.
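An index Sitemap is a `<sitemapindex>` document whose `<sitemap>` entries each hold the `<loc>` of one child Sitemap. A minimal sketch that generates one follows; the child URLs are hypothetical, and the children are assumed to be ordinary `<urlset>` Sitemaps rather than further indexes.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap_index(sitemap_urls):
    """Return a <sitemapindex> that lists child Sitemaps.

    The children should be ordinary <urlset> Sitemaps: index Sitemaps
    should not be nested inside other index Sitemaps.
    """
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(index, encoding="unicode")

print(build_sitemap_index([
    "https://www.example.com/sitemap-products.xml",
    "https://www.example.com/sitemap-categories.xml",
]))
```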

 

MULTI-DIMENSIONAL SITEMAPS (ADVANCED):

Multi-dimensional Sitemaps include the same URL in more than one Sitemap. For example, an ecommerce site could use one set of Sitemaps with its products broken down by main category, plus an additional set with the same products grouped into those in stock and those out of stock.

Using this method, you might be able to identify a pattern of pages out of stock that are not being indexed, which you wouldn’t necessarily spot from the category Sitemaps.
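The pattern shows up once you compare the share of indexed URLs per Sitemap, the kind of per-Sitemap figure Webmaster Tools reports for submitted Sitemaps. The sketch below computes that share from hypothetical product and indexation data (URLs shortened to paths for readability).

```python
def indexed_share(sitemaps, indexed_urls):
    """Return {sitemap name: fraction of its URLs that are indexed}.

    A URL can appear in several Sitemaps; that overlap is what makes the
    view "multi-dimensional".
    """
    indexed = set(indexed_urls)
    return {
        name: sum(url in indexed for url in urls) / len(urls) if urls else 0.0
        for name, urls in sitemaps.items()
    }

# Hypothetical products and indexation data
sitemaps = {
    "category-shoes": ["/p/1", "/p/2", "/p/3", "/p/4"],
    "category-bags": ["/p/5", "/p/6"],
    "in-stock": ["/p/1", "/p/2", "/p/5"],
    "out-of-stock": ["/p/3", "/p/4", "/p/6"],
}
indexed = {"/p/1", "/p/2", "/p/5"}
for name, share in indexed_share(sitemaps, indexed).items():
    print(f"{name}: {share:.0%} indexed")
```

Both category Sitemaps come out at 50% indexed, which looks unremarkable; the stock dimension shows 100% for in-stock products and 0% for out-of-stock ones.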

 

SITEMAP AUDITS WITH DEEPCRAWL: USEFUL REPORTS

 

CRAWL TYPE: UNIVERSAL CRAWL

Run a Universal Crawl to check all the URLs in your Sitemaps and compare them to the rest of your site. Once the crawl has finished, you will be able to drill down to each URL (select your report and click the URL you want to analyze).

[Screenshot: universal sitemaps in DeepCrawl]

A Universal Crawl, which includes a full crawl of the website, will also reveal gaps in your Sitemaps or internal linking structure by showing you where the two don’t match: you can see which URLs are in the Sitemap but not linked, and which are linked but not contained in your Sitemap.

DeepCrawl will automatically detect your Sitemaps. If you need to manually identify the Sitemaps for DeepCrawl, then the same is probably true for Google.

Remember that, like Google, DeepCrawl will only crawl your Sitemaps two levels deep: an index Sitemap and the Sitemaps it lists.
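To illustrate what "two levels deep" means in practice, the sketch below collects page URLs from an index Sitemap and the Sitemaps it lists, but skips any index nested inside the index, since following it would be a third level. The `fetch` hook is hypothetical so the traversal can run offline; against a live site you might pass something like `lambda u: urllib.request.urlopen(u).read()`. Gzipped Sitemaps are not handled.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
URLSET = "{%s}urlset" % NS["sm"]

def collect_urls(start_url, fetch):
    """Collect page URLs from a Sitemap or index Sitemap, at most two levels deep."""
    root = ET.fromstring(fetch(start_url))
    if root.tag == URLSET:  # level 1 is already a plain Sitemap
        return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]
    urls = []
    for loc in root.findall("sm:sitemap/sm:loc", NS):  # level 2: the listed Sitemaps
        child = ET.fromstring(fetch(loc.text.strip()))
        if child.tag != URLSET:
            continue  # a nested index would be a third level, so it is skipped
        urls.extend(u.text.strip() for u in child.findall("sm:url/sm:loc", NS))
    return urls

SM = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'
fake_site = {  # hypothetical files
    "https://www.example.com/sitemap.xml":
        f"<sitemapindex {SM}><sitemap><loc>https://www.example.com/sitemap-products.xml</loc></sitemap>"
        f"<sitemap><loc>https://www.example.com/nested-index.xml</loc></sitemap></sitemapindex>",
    "https://www.example.com/sitemap-products.xml":
        f"<urlset {SM}><url><loc>https://www.example.com/p/1</loc></url></urlset>",
    "https://www.example.com/nested-index.xml":
        f"<sitemapindex {SM}><sitemap><loc>https://www.example.com/sitemap-deep.xml</loc></sitemap></sitemapindex>",
}
print(collect_urls("https://www.example.com/sitemap.xml", fake_site.__getitem__))
```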

 

1. XML SITEMAPS

Navigate to Universal > XML Sitemaps in your report to view the HTTP status, type, errors and the number of URLs for each discoverable Sitemap on your site.
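If you want to spot-check a single file outside the report, the sketch below returns the same kind of summary for one Sitemap’s contents: its type (`urlset` or `sitemapindex`), the number of entries, and any error. The error checks shown (unparseable XML, an unexpected root element such as one missing the sitemaps.org namespace, and more than the protocol’s 50,000 entries) are an illustrative subset, not a full validator.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
# Root element -> (reported type, child element that is counted)
KINDS = {NS + "urlset": ("urlset", NS + "url"),
         NS + "sitemapindex": ("sitemapindex", NS + "sitemap")}

def summarize_sitemap(xml_text):
    """Return the type, entry count and any error for one Sitemap's contents."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return {"type": "unparseable", "count": 0, "error": str(exc)}
    if root.tag not in KINDS:  # e.g. a missing or wrong xmlns
        return {"type": "unknown", "count": 0, "error": f"unexpected root element {root.tag}"}
    kind, child = KINDS[root.tag]
    count = len(root.findall(child))
    error = "more than 50,000 entries" if count > 50_000 else None
    return {"type": kind, "count": count, "error": error}

for sample in ('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
               '<url><loc>https://www.example.com/</loc></url></urlset>',
               "<urlset><url/></urlset>",
               "<urlset>"):
    print(summarize_sitemap(sample))
```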

 

2. ALL URLS IN XML SITEMAPS

Check that your Sitemaps contain all the intended URLs by going to Universal > All URLs in XML Sitemaps in your report. Click a URL to see all the information about it in one dashboard:

[Screenshot: Universal Crawl]
 

3. BROKEN XML SITEMAPS

Check for XML Sitemaps that return a 4XX or 5XX error using Universal > Broken XML Sitemaps in your report.
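The same check can be scripted for a quick spot test. The sketch below uses only Python’s standard library: `urlopen` follows redirects, so the status is the final one, and 4XX/5XX responses arrive as `HTTPError`. The User-Agent string is a hypothetical placeholder, and treating "no response at all" as broken is a design choice of this sketch.

```python
import urllib.error
import urllib.request

def sitemap_status(url, timeout=10):
    """Return the final HTTP status for a Sitemap URL, or None if no response arrived."""
    request = urllib.request.Request(url, headers={"User-Agent": "sitemap-audit-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises 4XX/5XX responses as HTTPError
    except OSError:
        return None      # DNS failure, refused connection, timeout, etc.

def is_broken(status):
    """Treat 4XX/5XX responses, and no response at all, as broken."""
    return status is None or status >= 400
```

Usage would be along the lines of `is_broken(sitemap_status("https://www.example.com/sitemap.xml"))`, with the example.com URL standing in for your own Sitemap.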

 

4. MISSING FROM SITEMAPS

Use Universal > Missing In Sitemaps to find URLs that are linked internally but that aren’t in your Sitemaps. Add these URLs to maximise your indexable space.

 

5. ONLY IN SITEMAPS

Use Universal > Only In Sitemaps within your report to find pages that are included in your Sitemaps, but that aren’t linked internally. If this is a mistake, you can use this information to link these pages from within your site.

Click a URL to see all the information about that page; from here you can view the Sitemaps In tab to see all the Sitemaps the URL is in:

[Screenshot: Sitemaps]
 

6. REDIRECTING SITEMAP URLS

The report at Universal > Redirecting sitemap URLs shows all URLs that are included in your Sitemaps but return a 3XX status. These should be removed from the Sitemap, or replaced with the URL they redirect to if that URL isn’t already included.
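A sketch of both halves of that fix in Python: `redirect_target` asks for a URL without following redirects (a custom `HTTPRedirectHandler` returns `None`, so urllib raises the 3XX as an `HTTPError`) and reads its `Location` header. `fix_redirecting_entries` then swaps each redirecting entry for its target and drops the duplicate if the target was already listed. The redirect data in the example is hypothetical, and redirect chains (a target that itself redirects) are not resolved here.

```python
import urllib.error
import urllib.parse
import urllib.request

class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # don't follow: urllib then raises HTTPError carrying the 3XX code

_opener = urllib.request.build_opener(NoRedirect)

def redirect_target(url, timeout=10):
    """Return the absolute URL a Sitemap entry redirects to, or None if it doesn't redirect."""
    try:
        with _opener.open(url, timeout=timeout):
            return None
    except urllib.error.HTTPError as err:
        if 300 <= err.code < 400:
            return urllib.parse.urljoin(url, err.headers.get("Location", ""))
        raise  # 4XX/5XX: a broken URL rather than a redirect

def fix_redirecting_entries(sitemap_urls, redirects):
    """Swap each redirecting URL for its target, dropping duplicates.

    `redirects` maps redirecting URL -> target, e.g. built with redirect_target().
    """
    fixed, seen = [], set()
    for url in sitemap_urls:
        target = redirects.get(url, url)
        if target not in seen:
            seen.add(target)
            fixed.append(target)
    return fixed

# Offline example with hypothetical redirect data
entries = ["https://www.example.com/a", "https://www.example.com/old-b",
           "https://www.example.com/b", "https://www.example.com/old-c"]
redirects = {"https://www.example.com/old-b": "https://www.example.com/b",
             "https://www.example.com/old-c": "https://www.example.com/c"}
print(fix_redirecting_entries(entries, redirects))
```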

 

7. HREFLANG IN SITEMAPS

View all the hreflang tags contained in Sitemaps for a particular URL, along with any conflicts.

To get to this information, select any report under Universal and click a URL you wish to examine. Click the HREFLANG tab in the subsequent screen to see the hreflang configuration for that page:

[Screenshot: Sitemaps HREFLANG]
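In a Sitemap, hreflang annotations are `xhtml:link rel="alternate"` elements inside each `<url>` entry. One common conflict is a missing return link: page A lists page B as an alternate, but B’s entry doesn’t list A, and Google may ignore hreflang pairs that aren’t confirmed in both directions. The sketch below checks for that one issue only; the sample Sitemap is hypothetical, and an alternate with no entry of its own in the file is also reported, since its return link can’t be confirmed from this Sitemap alone.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
      "xhtml": "http://www.w3.org/1999/xhtml"}

def hreflang_alternates(xml_text):
    """Map each <loc> to {hreflang: href} from its xhtml:link rel="alternate" entries."""
    root = ET.fromstring(xml_text)
    alternates = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS).strip()
        alternates[loc] = {link.get("hreflang"): link.get("href")
                           for link in url.findall("xhtml:link", NS)
                           if link.get("rel") == "alternate"}
    return alternates

def missing_return_links(alternates):
    """(page, alternate) pairs where the alternate's entry doesn't point back to page."""
    return [(page, href)
            for page, links in alternates.items()
            for href in links.values()
            if href != page and page not in alternates.get(href, {}).values()]

SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url><loc>https://www.example.com/en/</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com/en/"/>
    <xhtml:link rel="alternate" hreflang="de" href="https://www.example.com/de/"/></url>
  <url><loc>https://www.example.com/de/</loc>
    <xhtml:link rel="alternate" hreflang="de" href="https://www.example.com/de/"/></url>
</urlset>"""
print(missing_return_links(hreflang_alternates(SAMPLE)))
```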

Author

Tristan Pirouz

Tristan is a Former Head of Marketing at Deepcrawl.

 
