Duplicate Content

What is duplicate content in SEO? Duplicate content occurs when the same (or very similar) content appears in multiple places on a website.
Duplicate content can cause several SEO issues, including crawl budget issues, search engine indexing issues, index bloat, keyword cannibalization, and canonical tag issues.
Our SEO Office Hours recaps below compile best practices Google has recommended for websites dealing with duplicate content issues.
(See our full guide to duplicate content for even more actionable tips on how SEOs can address duplicate content issues.)

Unless locations have unique content offerings, separate pages are not recommended

December 6, 2021

When asked whether to canonicalize so-called ‘doorway pages’, John stressed that no one solution fits every situation. The example given was a site with separate pages for ‘piano lessons birmingham’ and ‘piano lessons london’. If there’s something unique about the offering in each city, it’s generally fine to have separate URLs. If the information on both is the same, consider folding them into one ‘stronger’ page rather than diluting signals across multiple near-identical ones. A mix of the two approaches can also work if one of those locations has a stand-out, unique element.

Make sure important content is not found only on canonicalized pages

November 17, 2021

John answered a question about whether content that appears in some form on both a canonicalized page and its canonical page needs to match exactly. He replied that the pages don’t need to have identical content. With a canonical tag, Google will try to index the canonical page that was specified, so any content that is unique to the non-canonical pages won’t be indexed. Make sure that any critical content on canonicalized pages also appears on the canonical page.
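
If you want to check this before consolidating pages, one rough approach is to compare the text of each canonicalized page with its canonical and list anything that only exists on the non-canonical version. The sketch below is a hypothetical audit helper, not a Google tool: it assumes you have already extracted each page's main text (for example from a crawl export), and it uses naive sentence splitting, so treat its output as a starting point for manual review.

```python
import re


def split_sentences(text: str) -> set[str]:
    """Naively split text into sentences, lowercased with whitespace collapsed."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return {" ".join(p.lower().split()) for p in parts if p.strip()}


def missing_from_canonical(canonical_text: str, duplicate_text: str) -> list[str]:
    """Return sentences found on the canonicalized page but not on the canonical page.

    Since Google tries to index the canonical URL, these sentences are the ones
    at risk of not being indexed at all.
    """
    canonical_sentences = split_sentences(canonical_text)
    return sorted(split_sentences(duplicate_text) - canonical_sentences)


# Hypothetical page texts for illustration.
canonical_page = "Piano lessons for all ages. Book a free trial lesson."
variant_page = (
    "Piano lessons for all ages. Book a free trial lesson. "
    "Group classes run on Saturdays."
)
print(missing_from_canonical(canonical_page, variant_page))
```

Anything the helper returns is a candidate to copy onto the canonical page before relying on the canonical tag.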

The URL parameter tool does not prevent pages from being crawled

October 30, 2021

John explained that URLs set to be ignored within the URL Parameter tool may still be crawled, albeit at a much slower rate. Parameter rules set in the tool can also help Google decide which canonical tags should be followed.
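
The tool itself was configured in Search Console rather than in code, but you can get a rough picture of how many crawled URLs are parameter variants of the same page by normalizing your own crawl data. This is a minimal sketch, not a description of how Google applies parameter rules: the names in IGNORED_PARAMS are hypothetical examples of parameters assumed not to change page content, so substitute the ones that apply to your site.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters assumed NOT to change page content
# (tracking, session IDs, sort order). Adjust for your own site.
IGNORED_PARAMS = {"utm_source", "utm_medium", "sessionid", "sort"}


def normalize_url(url: str) -> str:
    """Drop ignored query parameters and the fragment, and sort the remaining
    parameters so that equivalent URLs compare equal."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_parameter_duplicates(urls: list[str]) -> dict[str, list[str]]:
    """Group crawled URLs by normalized form; keep only groups with several URLs."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[normalize_url(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Hypothetical crawl export.
crawled = [
    "https://example.com/shoes?color=red",
    "https://example.com/shoes?color=red&utm_source=newsletter",
    "https://example.com/shoes?sort=price&color=red",
    "https://example.com/shoes?color=blue",
]
print(group_parameter_duplicates(crawled))
```

Each extra member of a group is another URL Googlebot may still spend time crawling, even when the parameter is treated as ignorable.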

FAQ Content Should be Specific to Each Page

March 20, 2020

Content you provide in an FAQ section should be specific to each individual page rather than copied across multiple pages.
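
On a larger site, a simple way to find FAQs that have been copied between pages is to group the extracted questions by their text. This is a minimal sketch, assuming you have already pulled each page's FAQ questions into a dictionary keyed by URL (a hypothetical input format); it only catches exact repeats after lowercasing, not reworded copies.

```python
from collections import defaultdict


def find_shared_faqs(pages: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each FAQ question that appears on more than one page to those page URLs.

    `pages` maps a page URL to the list of FAQ questions extracted from it.
    Questions are compared after trimming whitespace and lowercasing.
    """
    seen: dict[str, list[str]] = defaultdict(list)
    for url, questions in pages.items():
        # Use a set so a question repeated on the same page is counted once.
        for question in {q.strip().lower() for q in questions}:
            seen[question].append(url)
    return {q: sorted(urls) for q, urls in seen.items() if len(urls) > 1}


# Hypothetical extracted FAQs.
pages = {
    "/piano-lessons-london": ["How long is a lesson?", "Do you teach beginners?"],
    "/piano-lessons-birmingham": ["How long is a lesson?", "Where is the studio?"],
}
print(find_shared_faqs(pages))
```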

Duplicated Same Language Content for Different Countries May Not be Indexed but Can Show in Search Results

February 21, 2020

If you have same-language content for different countries, Google will see the pages as duplicates and fold them together for indexing, but can unfold them again in search results.

Technical Issues Can Cause Content to be Indexed on Scraper Sites Before Original Site

January 7, 2020

If content from scraper sites appears in the index before the original site’s version, this could be due to technical issues on the original site. For example, Googlebot might be unable to find the main hub or category pages, or it may be getting stuck in crawl traps by following excessive parameter URLs.

Google’s Algorithms Should be Able to Detect & Prioritize Original Content From Near Duplicate Versions

October 29, 2019

Ideally, Google’s algorithms will be able to detect spun content that has been rewritten from another source and treat the original content as more valuable.
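
Google hasn’t published how its algorithms recognise spun or near-duplicate text, so the code below is not its method. It illustrates a common textbook technique instead: break each document into overlapping word sequences (“shingles”) and measure the Jaccard similarity of the two sets, so a lightly reworded copy still scores well above unrelated text. The shingle size of 3 and the example sentences are assumptions chosen for illustration.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams ("shingles") in lowercased text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Shared shingles divided by all distinct shingles of both texts (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # Treat two texts too short to shingle as identical.
    return len(sa & sb) / len(sa | sb)


# Hypothetical example texts.
original = "Our guide explains how to tune a piano at home in five steps."
spun = "Our guide explains how to tune a piano at home in 5 easy steps."
unrelated = "Book a free trial piano lesson with a local teacher."

print(round(jaccard_similarity(original, spun), 2))       # 0.64
print(round(jaccard_similarity(original, unrelated), 2))  # 0.0
```

Real systems typically add hashing tricks such as MinHash to compare millions of documents efficiently, but the underlying similarity measure is the same.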

Having Multiple Pages for Different Product Variations Isn’t a Problem

July 26, 2019

John recommends two approaches for products with multiple variations: either make sure each individual variation page is indexed, or have one main product page with all of the variation options available on it. The best method depends on the size of the site and how unique each variation is.

GSC Data Across Duplicate Language Versions Will Only be Shown for Selected Canonical

July 23, 2019

Even if you have hreflang set up correctly, Google can fold together similar-language versions of a page and choose one to index. As a result, Google Search Console data will only be shown for the one selected canonical page.
