An 8-Point Checklist for Debugging Strange Technical SEO Problems

An 8-Point Checklist for Debugging Strange Technical SEO Problems

Here are some common problems I’ve come across that were mostly horses. Can Google crawl the page once? JavaScript-rendered vs hard-coded directives You might be setting one thing in the page source and then rendering another with JavaScript, i.e. you would see something different in the HTML source from the rendered DOM. If Google isn't getting 200s consistently in our log files, but we can access the page fine when we try, then there is clearly still some differences between Googlebot and ourselves. If you’ve got a JavaScript-heavy website you’ve probably banged your head against this problem before, but even if you don’t this can still sometimes be an issue. The following tools will let us do that: Fetch & Render Shows: Rendered DOM in an image, but only returns the page source HTML for you to read. Is there a difference between Fetch & Render, the mobile-friendly testing tool, and Googlebot? What is Google actually seeing? JavaScript is rendered separately from pages being crawled and Googlebot may spend less time rendering JavaScript than a testing tool. Again, once we’ve found the problem, it’s time to go and talk to a developer.

The Simple 6 Step Checklist for Analyzing Your Competition on the Web
A Simple Three-Point Checklist for Documenting Your B2B Content Strategy Right Now
How to Write Headlines That Get Your Brand What It Wants [Checklist]

Occasionally, a problem will land on your desk that’s a little out of the ordinary. Something where you don’t have an easy answer. You go to your brain and your brain returns nothing.

These problems can’t be solved with a little bit of keyword research and basic technical configuration. These are the types of technical SEO problems where the rabbit hole goes deep.

The very nature of these situations defies a checklist, but it’s useful to have one for the same reason we have them on planes: even the best of us can and will forget things, and a checklist will provvide you with places to dig.

Fancy some examples of strange SEO problems? Here are four examples to mull over while you read. We’ll answer them at the end.

1. Why wasn’t Google showing 5-star markup on product pages?

  • The pages had server-rendered product markup and they also had Feefo product markup, including ratings being attached client-side.
  • The Feefo ratings snippet was successfully rendered in Fetch & Render, plus the mobile-friendly tool.
  • When you put the rendered DOM into the structured data testing tool, both pieces of structured data appeared without errors.

2. Why wouldn’t Bing display 5-star markup on review pages, when Google would?

  • The review pages of client & competitors all had rating rich snippets on Google.
  • All the competitors had rating rich snippets on Bing; however, the client did not.
  • The review pages had correctly validating ratings schema on Google’s structured data testing tool, but did not on Bing.

3. Why were pages getting indexed with a no-index tag?

  • Pages with a server-side-rendered no-index tag in the head were being indexed by Google across a large template for a client.

4. Why did any page on a website return a 302 about 20–50% of the time, but only for crawlers?

  • A website was randomly throwing 302 errors.
  • This never happened in the browser and only in crawlers.
  • User agent made no difference; location or cookies also made no difference.

Finally, a quick note. It’s entirely possible that some of this checklist won’t apply to every scenario. That’s totally fine. It’s meant to be a process for everything you could check, not everything you should check.

The pre-checklist check

Does it actually matter?

Does this problem only affect a tiny amount of traffic? Is it only on a handful of pages and you already have a big list of other actions that will help the website? You probably need to just drop it.

I know, I hate it too. I also want to be right and dig these things out. But in six months’ time, when you’ve solved twenty complex SEO rabbit holes and your website has stayed flat because you didn’t re-write the title tags, you’re still going to get fired.

But hopefully that’s not the case, in which case, onwards!

Where are you seeing the problem?

We don’t want to waste a lot of time. Have you heard this wonderful saying?: “If you hear hooves, it’s probably not a zebra.”

The process we’re about to go through is fairly involved and it’s entirely up to your discretion if you want to go ahead. Just make sure you’re not overlooking something obvious that would solve your problem. Here are some common problems I’ve come across that were mostly horses.

  1. You’re underperforming from where you should be.
    1. When a site is under-performing, people love looking for excuses. Weird Google nonsense can be quite a handy thing to blame. In reality, it’s typically some combination of a poor site, higher competition, and a failing brand. Horse.
  2. You’ve suffered a sudden traffic drop.
    1. Something has certainly happened, but this is probably not the checklist for you. There are plenty of common-sense checklists for this. I’ve written about diagnosing traffic drops recently — check that out first.
  3. The wrong page is ranking for the wrong query.
    1. In my experience (which should probably preface this entire post), this is usually a basic problem where a site has poor targeting or a lot of cannibalization. Probably a horse.

Factors which make it more likely that you’ve got a more complex problem which require you to don your debugging shoes:

  • A website that has a lot of client-side JavaScript.
  • Bigger, older websites with more legacy.
  • Your problem is related to a new Google property or feature where there is less community knowledge.

1. Start by picking some example pages.

Pick a couple of example pages to work with — ones that exhibit whatever problem you’re seeing. No, this won’t be representative, but we’ll come back to that in a bit.

Of course, if it only affects a tiny number of pages then it might actually be representative, in which case we’re good. It definitely matters, right? You didn’t just skip the step above? OK, cool, let’s move on.

2. Can Google crawl the page once?

First we’re checking whether Googlebot has access to the page, which we’ll define as a 200 status code.

We’ll check in four different ways to expose any common issues:

  1. Robots.txt: Open up Search Console and check in the robots.txt validator.
  2. User agent: Open Dev Tools and verify that you can open the URL with both Googlebot and Googlebot Mobile.
    1. To get the user agent switcher, open Dev Tools.
    2. Check the console drawer is open (the toggle is the Escape key)
    3. Hit the … and open “Network conditions”
    4. Here, select your user agent!
  1. IP Address: Verify that you can access the page with the mobile testing tool. (This will come from one of the IPs used by Google; any checks you do from your computer won’t.)
  2. Country: The mobile testing tool will visit from US IPs, from what I’ve seen, so we get two birds with one stone. But Googlebot will occasionally crawl from non-American IPs, so it’s also worth using a VPN to double-check whether you can access the site from any other relevant countries.
    1. I’ve used HideMyAss for this before, but whatever VPN you have will work fine.

We should now have an idea whether or not Googlebot is struggling to fetch the page once.

Have we found any problems yet?

If we can re-create a failed crawl with a simple check above, then it’s likely Googlebot is probably failing consistently to fetch our page and it’s typically one of those basic reasons.

But it might not be. Many problems are inconsistent because of the nature of technology. 😉

3. Are we telling Google two different things?

Next up: Google can find the page, but are we confusing it by telling it two different things?

This is most commonly seen, in my experience, because someone has messed up the indexing directives.

By “indexing directives,” I’m referring to any tag that defines the correct index status or page in the index which should rank. Here’s a non-exhaustive list:

  • No-index
  • Canonical
  • Mobile alternate tags
  • AMP alternate tags

An example of providing mixed messages would be:

  • No-indexing page A
  • Page B canonicals to page A

Or:

  • Page A has a canonical in a header to A with a parameter
  • Page A has a canonical in the body to A without a parameter

If we’re providing mixed messages, then it’s not clear how Google will respond. It’s a great way to start seeing strange results.

Good places to check for the indexing directives listed above are:

  • Sitemap
    • Example: Mobile alternate tags can sit in a sitemap
  • HTTP headers
    • Example: Canonical and meta robots can be set in headers.
  • HTML head
    • This is where you’re probably looking, you’ll need this one for a comparison.
  • JavaScript-rendered vs hard-coded directives
    • You might be setting one thing in the page source and then rendering another with JavaScript, i.e. you would see something different in the HTML source from the rendered DOM.
  • Google Search Console settings
    • There are Search Console settings for ignoring parameters and country localization that can clash with indexing tags on the page.

A quick aside on rendered DOM

This page has a lot of mentions of the rendered DOM on it (18, if you’re curious). Since we’ve just had our first, here’s a quick recap about what that is.

When you load a webpage, the first request is the HTML. This is what you see in the HTML source (right-click on a webpage and click View Source).

This is before JavaScript has done anything to the page. This didn’t use to be such a big deal, but now so many websites rely heavily on JavaScript that the most people quite reasonably won’t trust the the initial HTML.

Rendered DOM is the technical term for a page, when all the JavaScript has been rendered and all the page alterations made. You can see this in Dev Tools.

In Chrome you can get that by right clicking and hitting inspect element (or Ctrl + Shift + I). The Elements tab will show the DOM as it’s being rendered. When it stops flickering and changing, then you’ve got the rendered DOM!

4. Can Google crawl the page consistently?

To see what Google is seeing, we’re going to need to get log files. At this point, we can check to see how it is accessing the page.

Aside: Working with logs is an entire post in and of itself. I’ve written a guide to log analysis with BigQuery, I’d also really recommend trying out Screaming Frog Log Analyzer, which has done a great job of handling a lot of the complexity around logs.

When we’re looking at crawling there are three useful checks we can do:

  1. Status codes: Plot the status codes over time. Is Google seeing different status codes than you when you check URLs?
  2. Resources: Is Google downloading all the resources of the page?
    1. Is it downloading all your site-specific JavaScript and CSS files that it would need to generate the page?
  3. Page size follow-up: Take the max and min of all your pages and resources and diff them. If you see a difference, then Google might be failing to fully download all the resources or pages. (Hat tip to @ohgm, where I first heard this neat tip).

Have we found any problems yet?

If Google isn’t getting 200s consistently in our log files, but we can access the page fine when we try, then there is clearly still some differences between Googlebot and ourselves. What might those differences be?

  1. It will crawl more than us
  2. It is obviously a bot, rather than a human pretending to be a bot
  3. It will crawl at different times of day

This means that:

  • If our website is doing clever bot blocking, it might be able to differentiate between us and Googlebot.
  • Because Googlebot will put more stress on our web servers, it might behave differently. When websites…

COMMENTS

WORDPRESS: 0
DISQUS: 0