Site Crawl, Day 1: Where Do You Start?
When you’re faced with the many thousands of potential issues a large site can have, where do you start? This is the question we tried to tackle when we rebuilt Site Crawl. The answer depends almost entirely on your site and can require deep knowledge of its history and goals, but I’d like to outline a process that can help you cut through the noise and get started.
Simplistic can be dangerous
Previously, we at Moz tried to label every issue as either high, medium, or low priority. This simplistic approach can be appealing, even comforting, and you may be wondering why we moved away from it. This was a very conscious decision, and it boils down to a couple of problems.
First, prioritization depends a lot on your intent. Misinterpreting your intent can lead to bad advice that ranges from confusing to outright catastrophic. Let’s say, for example, that we hired a brand-new SEO at Moz and they saw the following issue count pop up:
Almost 35,000 NOINDEX tags?! WHAT ABOUT THE CHILDREN?!!
If that new SEO then rushed to remove those tags, they’d be doing a lot of damage, not realizing that the vast majority of those directives are intentional. We can make our systems smarter, but they can’t read your mind, so we want to be cautious about false alarms.
Second, bucketing issues by priority doesn’t do much to help you understand the nature of those problems or how to go about fixing them. We now categorize Site Crawl issues into one of five descriptive types:
- Critical Crawler Issues
- Crawler Warnings
- Redirect Issues
- Metadata Issues
- Content Issues
Categorizing by type allows you to be more tactical. The issues in our new “Redirect” category, for example, are going to have much more in common, which means they potentially have common fixes. Ultimately, helping you find problems is just step one. We want to do a better job at helping you fix them.
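To make the tactical idea concrete, here is a minimal sketch of grouping raw issue counts under the five descriptive types. The issue names and their category mapping are illustrative assumptions, not Moz's actual data model:

```python
# Illustrative sketch: collapse per-issue counts into the five
# descriptive categories so related problems can be tackled together.
# The issue-to-category mapping below is an assumption for the example.
from collections import defaultdict

ISSUE_CATEGORY = {
    "5xx_error": "Critical Crawler Issues",
    "4xx_error": "Critical Crawler Issues",
    "redirect_to_4xx": "Critical Crawler Issues",
    "meta_noindex": "Crawler Warnings",
    "redirect_chain": "Redirect Issues",
    "missing_description": "Metadata Issues",
    "duplicate_content": "Content Issues",
}

def group_by_category(issue_counts):
    """Sum per-issue counts into per-category totals."""
    totals = defaultdict(int)
    for issue, count in issue_counts.items():
        totals[ISSUE_CATEGORY[issue]] += count
    return dict(totals)

counts = {"5xx_error": 12, "4xx_error": 310, "missing_description": 392}
print(group_by_category(counts))
# {'Critical Crawler Issues': 322, 'Metadata Issues': 392}
```

Grouping this way means a single fix (say, cleaning up a redirect rule) can resolve an entire category's worth of issues at once.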
1. Start with Critical Crawler Issues
That’s not to say everything is subjective. Some problems block crawlers (not just ours, but search engines’ as well) from getting to your pages at all. We’ve grouped these “Critical Crawler Issues” into our first category, and they currently include 5XX errors, 4XX errors, and redirects to 4XX. If you have a sudden uptick in 5XX errors, you need to know, and almost no one intentionally redirects to a 404.
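The category definition above boils down to a simple classification rule. Here is a minimal sketch (not Moz's implementation; the function and parameter names are hypothetical) of flagging a crawled URL as critical based on its status code and, for redirects, the status of the redirect target:

```python
# Illustrative sketch of the "Critical Crawler Issues" rule:
# 5XX errors, 4XX errors, and redirects whose target returns a 4XX.

def classify(status, redirect_target_status=None):
    """Return 'critical' or 'ok' for one crawled URL.

    status: HTTP status code returned by the URL itself.
    redirect_target_status: status of the redirect target when the
    URL returned a 3XX response (None otherwise).
    """
    if 500 <= status < 600:
        return "critical"      # 5XX: server error, page unreachable
    if 400 <= status < 500:
        return "critical"      # 4XX: client error, e.g. a 404
    if 300 <= status < 400 and redirect_target_status is not None:
        if 400 <= redirect_target_status < 500:
            return "critical"  # redirect pointing at a 4XX target
    return "ok"

print(classify(200))       # ok
print(classify(503))       # critical
print(classify(301, 404))  # critical: redirect to a 404
```

The redirect case is why a crawler has to follow the hop before judging it: a 301 looks healthy on its own, and only the target's 4XX reveals the problem.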
You’ll see Critical Crawler Issues highlighted throughout the Site Crawl interface: