Restricting a Crawl to Certain Pages

Tristan Pirouz

On 11th October 2016 • 2 min read

You may want to check or analyse a specific section of your website, instead of crawling your full site.

This can be useful after adding a new channel to your website, when you want to filter out script-based URLs and subdomains, or when you want to make sure your URL credits are spent only on specific sections of a website.

It’s also handy for international websites, where you may want to analyse the pages for one specific country only.

You can restrict a crawl to any set of pages using a mixture of inclusion and exclusion rules. These are found in Advanced Settings, in step 4 of the crawl setup.

Deepcrawl - Restricting a crawl to certain pages - Crawl settings
 

Include Only URLs (Positive Restriction)

Use the ‘Include Only URLs’ field in Advanced Settings to limit a crawl to one or more specific paths.

Deepcrawl - Restricting a crawl to certain pages - Include only

Add your URL paths on separate lines to limit your crawl to only those URLs that include one of your specified paths.
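To illustrate the idea, here is a minimal Python sketch of an include-only rule. It is not Deepcrawl's implementation: the function name `is_included`, the example paths and the `example.com` URLs are all hypothetical, and it assumes a plain substring match, which may differ from how the crawler actually evaluates the field.

```python
def is_included(url, include_paths):
    """Return True if the URL contains at least one of the include paths.

    Assumption: plain substring matching; the real crawler may match differently.
    """
    # Assumption: an empty rule list means the crawl is not restricted.
    if not include_paths:
        return True
    return any(path in url for path in include_paths)


# One path per line in the settings field, modelled here as a list (hypothetical values).
include_only = ["/blog/", "/products/"]

discovered_urls = [
    "https://www.example.com/",
    "https://www.example.com/blog/crawl-tips",
    "https://www.example.com/products/widget",
    "https://www.example.com/about",
]

# Only URLs that contain one of the listed paths stay in the crawl queue.
crawl_queue = [u for u in discovered_urls if is_included(u, include_only)]
print(crawl_queue)
```

In this sketch the home page and the about page are dropped because neither contains `/blog/` or `/products/`.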

Please Note:

 

Exclude URLs (Negative Restriction)

Use the ‘Exclude URLs’ field in Advanced Settings to exclude pages or channels you don’t need to see in your reports.

Deepcrawl - Restricting a crawl to certain pages - Exclude URLs

The include/exclude filters also work on full URLs, including hostnames and protocols.

For example, here’s how you would prevent your HTTPS site from being crawled.

Deepcrawl - Restricting a crawl to certain pages - Exclude https
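The matching logic behind an exclusion on full URLs can be sketched in Python as follows. This is an illustration under assumptions, not the crawler's actual behaviour: the helper `is_excluded`, the rule value `https://` and the `example.com` URLs are hypothetical, and a plain substring match is assumed.

```python
def is_excluded(url, exclude_patterns):
    """Return True if the full URL (protocol and hostname included) contains any pattern.

    Assumption: plain substring matching against the whole URL string.
    """
    return any(pattern in url for pattern in exclude_patterns)


# Hypothetical rule: exclude anything served over HTTPS.
exclude = ["https://"]

discovered_urls = [
    "http://www.example.com/",
    "https://www.example.com/",
    "http://www.example.com/pricing",
    "https://shop.example.com/basket",
]

# Keep only the URLs that no exclusion rule matches.
allowed = [u for u in discovered_urls if not is_excluded(u, exclude)]
print(allowed)
```

Because this sketch matches anywhere in the URL, an HTTP URL that carries an `https://` address in its query string would be excluded as well, so a rule like this is worth checking against a few sample URLs first.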

Please note:

 

Page Sampling

The Page Sampling fields let you limit the percentage of URLs crawled for groups of pages, based on their URL patterns.

For each group, add a name, a regular expression or directory, and the maximum percentage of matching URLs to crawl.

Any URLs matching the group will be counted. When the limit is reached, any new matching URLs will not be crawled, but will be included in the Page Group Restricted URLs report.

Deepcrawl - Restricting a crawl to certain pages - Page sampling
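The counting behaviour described above can be modelled with a short Python sketch. It is a simplification under stated assumptions: the `PageGroup` class and `sample_group` function are hypothetical names, the full list of discovered URLs is assumed to be known up front (a real crawler discovers URLs as it goes), and the percentage is converted to a fixed count of matching URLs.

```python
import re
from dataclasses import dataclass


@dataclass
class PageGroup:
    name: str           # label for the group
    pattern: str        # regular expression matched against the URL
    max_percent: float  # maximum percentage of matching URLs to crawl


def sample_group(urls, group):
    """Split URLs into (crawled, restricted) for a single page group."""
    regex = re.compile(group.pattern)
    matching = [u for u in urls if regex.search(u)]
    # Assumption: the percentage is applied to the total number of matching URLs.
    limit = int(len(matching) * group.max_percent / 100)

    crawled, restricted = [], []
    matched_so_far = 0
    for url in urls:
        if not regex.search(url):
            crawled.append(url)       # URLs outside the group are unaffected
        elif matched_so_far < limit:
            crawled.append(url)       # still under the group's limit
            matched_so_far += 1
        else:
            restricted.append(url)    # limit reached: reported, not crawled
    return crawled, restricted


products = PageGroup(name="Products", pattern=r"/products/", max_percent=50)
urls = [
    "https://www.example.com/products/1",
    "https://www.example.com/products/2",
    "https://www.example.com/about",
    "https://www.example.com/products/3",
    "https://www.example.com/products/4",
]
crawled, restricted = sample_group(urls, products)
print(crawled)
print(restricted)
```

Here the `restricted` list plays the role of the Page Group Restricted URLs report: the URLs matched the group but were skipped once the limit was reached.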

Author

Tristan Pirouz

Tristan is a former Head of Marketing at Deepcrawl.
