How to find 404 errors on your website — and what to do with them
It goes without saying that linking to broken pages on your domain is not recommended. Not only are broken links annoying for users, but they can also slow down search engine crawlers and contribute to issues like higher bounce rates, or less time spent by visitors on your site.
But 404 errors are not an inherently bad thing. In fact, returning a 404 error page when a user navigates to a URL that doesn’t exist is often considered best practice. Google has stated many times that 404s alone don’t harm your overall website’s indexing or ranking. So what impact are 404s actually having, and is there any real benefit to cleaning up 404 errors on your site?
This post is part of Deepcrawl’s series on Website Health. We are diving deep into each of the 7 categories of the SEO Funnel to help SEOs and marketers learn more about the many elements of search engine optimization that contribute to a high-performing, healthy website. Here, we’re diving into HTTP status codes — specifically 4xx errors — and discussing how they affect your SEO.
What are 404 errors? (404 HTTP status codes explained)
Every time a user or search engine requests a URL on your site, an HTTP request is made and your server responds with an HTTP status code that indicates whether the request was successful.
HTTP status codes fall into one of five categories, which determine how they’re treated by Google:
- 1xx – Informational response
- 2xx – Success
- 3xx – Redirection
- 4xx – Client errors
- 5xx – Server errors
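As a quick illustration of the five categories above, here is a minimal Python sketch that maps a status code to its category. The function name `status_category` is our own and is only for illustration.

```python
def status_category(code: int) -> str:
    """Map an HTTP status code to one of the five standard categories."""
    categories = {
        1: "Informational response",
        2: "Success",
        3: "Redirection",
        4: "Client error",
        5: "Server error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    # The first digit of the code decides the category.
    return categories[code // 100]
```

Calling `status_category(404)` returns "Client error", while `status_category(301)` returns "Redirection".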
Returning a 404 error code signals that a page has not been found. Perhaps the content on that URL has been removed, or perhaps there was never anything there to begin with. All the web browser knows is that the requested content cannot be located at that address.
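If you want to spot-check a single URL yourself, a rough sketch using only Python's standard library could look like the one below. The helper name `fetch_status` and the example URL are assumptions made for illustration, and some servers respond differently to HEAD requests than to GET requests, so treat the result as a hint rather than a verdict.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code for a URL.

    urllib follows redirects by default, so a redirect that ends
    on a working page is reported as 200 here.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx and 5xx responses; the code is on the exception.
        return error.code


# Hypothetical usage:
# fetch_status("https://www.example.com/old-page")
```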
How to find 404 errors on your website
Finding 404 errors using Deepcrawl
Using Deepcrawl, finding pages that return 404 status codes is as simple as navigating to the “All Pages” report and filtering by “HTTP Status Code > Equals > 404”. You can also use the “Broken Pages” report for a full list of 4xx errors. Note that this report includes other 4xx responses, such as 403s and 401s, alongside 404s; you can use the same filtering to remove those from the list.
An overview of all non-200 pages and their status codes is also located in the main dashboard. Simply click the bar next to “Broken Pages (4xx Errors)” to be directed to the relevant report.
Alongside reports that identify all of your site’s 404 errors, there’s the option to filter this down further by source. For instance, finding all of the 404 pages that have backlinks pointing to them is as simple as navigating to the “Broken Pages with Backlinks” report. Or, if you wanted to see all of the 404 pages being linked to internally, the “Broken Links” report is the one to use.
Our “Unique Broken Links” report can also be useful when prioritizing URLs that need urgent attention. Here you’ll find all of the broken pages that are linked to internally on your site, handily sorted by URL and anchor text. It’s an easy way to see which 404 pages are linked to most commonly, and from where.
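The same kind of prioritization can be approximated on any list of internal links you have exported. The sketch below assumes each link is a tuple of (source URL, target URL, anchor text, target status code). That shape is hypothetical and is not Deepcrawl's export format.

```python
from collections import Counter


def most_linked_broken_pages(links, top_n=10):
    """Rank 404 targets by how many internal links point at them."""
    counts = Counter(
        target
        for _source, target, _anchor, status in links
        if status == 404
    )
    return counts.most_common(top_n)


# Hypothetical data: (source URL, target URL, anchor text, target status)
example_links = [
    ("/blog/post-a", "/old-pricing", "see pricing", 404),
    ("/blog/post-b", "/old-pricing", "our plans", 404),
    ("/about", "/summer-sale", "summer sale", 404),
    ("/about", "/contact", "contact us", 200),
]
ranked = most_linked_broken_pages(example_links)
```

Pages at the top of `ranked` are the broken URLs your own site links to most often, which usually makes them the first candidates for a fix.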
Finding 404 errors with Google Search Console
If you’re not using Deepcrawl, then Google Search Console is a good starting place for finding URLs that return a 404 error code. The Coverage report in GSC contains a list of URLs that have been submitted to Google and returned a 404 status code when they were last crawled.
A note on “soft 404s”
If you’re using GSC to locate 404 error pages, you might also notice a report on something called a “soft 404”. Soft 404s are pages that tell a user that a page does not exist, but still return a valid 200 response code. A soft 404 is an indication that Google has found no content of value on that page, or is otherwise struggling to make sense of why the page exists.
Soft 404s are different from standard 404s in that they’re not truly returning a broken page response code. However, the label of a soft 404 can be enough for Google to drop a page from its index.
Google often sees empty pages as soft 404s. If you’re using Deepcrawl, try setting up a custom extraction for pages with a word count below a certain threshold. If you see URLs of any value appearing here or in GSC’s soft 404 report, it’s worth taking the time to review these separately and make any relevant on-page improvements.
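To make the idea concrete, here is a rough heuristic for flagging possible soft 404s. It is not how Google classifies pages; the phrase list, the 50-word threshold, and the function name `looks_like_soft_404` are all assumptions you would tune for your own site.

```python
import re

# Assumed wording; adjust to match the error messages your templates use.
NOT_FOUND_PHRASES = (
    "page not found",
    "no longer exists",
    "could not be found",
)


def looks_like_soft_404(status_code: int, body_text: str, min_words: int = 50) -> bool:
    """Flag pages that return 200 but read like an error or an empty page."""
    if status_code != 200:
        # A real 404 or 410 is not a soft 404.
        return False
    text = body_text.lower()
    word_count = len(re.findall(r"\w+", text))
    has_error_wording = any(phrase in text for phrase in NOT_FOUND_PHRASES)
    return has_error_wording or word_count < min_words
```

Anything this flags still needs a manual review, since thin but legitimate pages (a contact page, for example) can fall below the word threshold.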
Finding 404 errors with Google Analytics
Google Analytics doesn’t provide a dedicated report for 404 pages. However, it’s possible to find them if you know the standard page title given to 404 pages on your domain. Simply head to “Behavior > Site Content > All Pages” and set the primary dimension to “Page Title”. From there, you should be able to filter results by entering the 404 page title into the search box.
When 404s become a problem for SEO
Once you’ve identified your 404 pages, it’s time to determine whether or not they need fixing.
As mentioned, having pages that return a 404 error isn’t necessarily cause for concern. Diagnosing whether or not a broken page needs fixing is more about understanding how and when users might encounter that page.
Returning a 404 error page when you’re certain that a URL should not exist on your site is widely accepted. Google’s own documentation confirms that having some 404 errors alone will not harm your site’s search performance. However, there are some instances where 404s may require some extra attention, including:
- Submitted URLs that return a 404 status code
- Content that has been moved to another location (this should result in a 3xx redirect rather than a 404)
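For the first case, one way to catch submitted URLs that 404 is to compare your XML sitemap against crawl results. The sketch below assumes you already have a dictionary of status codes keyed by URL (from a crawl export, for example). The function names and that dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Extract every <loc> value from a standard XML sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def submitted_404s(sitemap_xml: str, status_by_url: dict[str, int]) -> list[str]:
    """Return sitemap URLs whose recorded status code is 404."""
    return [
        url
        for url in sitemap_urls(sitemap_xml)
        if status_by_url.get(url) == 404
    ]
```

Any URL this returns should either be restored, redirected, or removed from the sitemap.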
When should a page be a 404?
No two sites are exactly the same, but there are some general rules to follow when deciding which action to take around removed or relocated pages.
If the page should still exist…
While a natural part of the web, 404 errors can still occur where they’re not supposed to. Restore any content that’s been accidentally removed and wait for the page to be re-indexed by search engines.
If the page has been temporarily removed…
A 404 status code isn’t recommended for a page that’s only been removed temporarily. A 302 redirect is a better choice. Consider further steps like removing internal links while the 302 is in place, then restoring them when the content is reinstated. This gives search engines the best chance of finding those pages quickly.
If the page has been removed but still has value…
A page that no longer exists and has no replacement can usually be allowed to 404. Returning this status code generally results in Google slowing down its crawling of the page, until eventually it gets dropped from the index altogether (this usually takes about a month).
Even if a page is gone for good, however, there are some extra considerations to be aware of:
- Internal links – Does the page in question have internal links pointing to it? If so, it’s worth removing or replacing these links to prevent users from clicking through to a broken page. Linking to 404s internally can also lead to unwanted crawl bloat and negatively impact the time it takes search engines to discover and crawl the pages that really matter.
- Backlinks – Are there any links coming from external sources? If so, allowing the page to 404 could result in wasted link equity. Check the URL’s referring domains before letting a page 404. If there’s anything of value, you may want to consider a 301 redirect instead. The ‘Broken Pages with Backlinks’ report comes in useful here. It’s also possible to narrow down and prioritize pages to redirect using the ‘Broken Pages with Traffic’ report, as this highlights any 404s that users are finding organically.
If the page has been permanently removed and has no link value…
Pages that have permanently been removed and have no link value can be given a 410 status code. This indicates that the page has gone completely and has been intentionally removed. Google currently views 404 and 410 pages in the same way, but a 410 is a good option if you know for certain that the content will not be reinstated.
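The rules of thumb in this section can be summarized as a small decision function. This is a simplified sketch of the guidance above rather than a definitive rulebook, and the parameter names are our own.

```python
def recommended_status(
    *,
    should_exist: bool,
    temporarily_removed: bool,
    has_valuable_backlinks: bool,
    certain_gone_for_good: bool,
) -> int:
    """Suggest a status code for a URL that currently returns a 404."""
    if should_exist:
        return 200  # restore the content that was removed by accident
    if temporarily_removed:
        return 302  # temporary redirect until the content is reinstated
    if has_valuable_backlinks:
        return 301  # permanent redirect so link equity is not wasted
    if certain_gone_for_good:
        return 410  # intentionally and permanently removed
    return 404  # no replacement; let the page 404
```

For example, a retired page with strong referring domains would come back as a 301, while a retired page with no links that will never return would come back as a 410.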
Handling valid 404 pages
We’ve discussed all the reasons 404 errors are a natural, and often helpful, part of the web. You could therefore be forgiven for thinking no further action is necessary when a page is left to 404, but that’s not strictly the case.
404 pages that occur naturally should return a proper 404 HTTP response code. They should also not be blocked via robots.txt, as this can make it harder for Google to understand how you want the page to be treated.
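To check whether any of your 404 URLs are blocked by robots.txt, a small sketch with Python's built-in `urllib.robotparser` might look like this. Python's parser does not match Google's robots.txt handling in every detail (wildcard rules, for instance), so treat the result as a first pass. The function name and sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls, user_agent: str = "Googlebot"):
    """Return the URLs that the given robots.txt rules disallow."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules and URLs:
example_rules = "User-agent: *\nDisallow: /old/\n"
example_urls = [
    "https://www.example.com/old/page",
    "https://www.example.com/blog/post",
]
blocked = blocked_urls(example_rules, example_urls)
```

Any 404 URL that shows up in `blocked` is one that crawlers cannot request, so they never get to see the 404 response.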
You may also need to work on refining your 404 error page, ensuring that it’s user-friendly and informative.
What makes a good 404 error page?
Hitting upon a 404 response code can be frustrating for users. As webmasters, it’s our job to ease that frustration and direct users to the content they’re looking for (or at least the next best thing). That responsibility falls to your 404 error page, so it’s worthwhile to spend some time getting it right.
There’s no hard and fast rule as to what constitutes an effective 404 error page. Often, it depends on the type of site and the nature of the content that’s being searched for.
However, there are some firm recommendations for 404 pages that all webmasters should follow:
- Clearly display the error code and explain that the page can’t be found
- Follow the branding on the rest of the site
- Include clear navigational links
Exactly how you meet these recommendations is up to you. Some sites bring the wow factor with stunning visuals, while others use humor in a bid to keep users engaged.
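As a bare-bones starting point, the sketch below builds a simple 404 page and serves it with a real 404 status code using Python's standard library. The navigation links, page copy, and names (`build_404_page`, `NotFoundHandler`) are placeholders; branding and styling are left out and would come from your own templates.

```python
from html import escape
from http.server import BaseHTTPRequestHandler


def build_404_page(requested_path: str) -> str:
    """Return HTML that states the error clearly and offers a way onward."""
    safe_path = escape(requested_path)  # never echo raw user input into HTML
    return f"""<!doctype html>
<html lang="en">
<head><title>404: Page not found</title></head>
<body>
  <h1>404: Page not found</h1>
  <p>We couldn't find anything at <code>{safe_path}</code>.</p>
  <nav>
    <a href="/">Home</a>
    <a href="/blog/">Blog</a>
    <a href="/contact/">Contact</a>
  </nav>
</body>
</html>"""


class NotFoundHandler(BaseHTTPRequestHandler):
    """Answers every GET with the custom page and a genuine 404 status."""

    def do_GET(self):
        body = build_404_page(self.path).encode("utf-8")
        self.send_response(404)  # a real 404, not a 200 "soft 404"
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

In a real site the handler would only be used for URLs that don't match any content, but the key point is the same: the friendly page and the 404 status code are sent together.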
The best approach is to view the page through the eyes of a brand new user. If you were landing on the page for the first time, would you know how to get back on track? Is it clear that the page you’ve requested has not been found, but that there could be other content of interest on the same domain? If the answer is anything but a clear and resounding “yes,” it’s time to improve that 404 error page.