Log file analysis can provide some of the most detailed insights about what Googlebot is doing on your site, but it can be an intimidating subject. In this week’s Whiteboard Friday, Britney Muller breaks down log file analysis to make it a little more accessible to SEOs everywhere.
Click on the whiteboard image above to open a high-resolution version in a new tab!
Hey, Moz fans. Welcome to another edition of Whiteboard Friday. Today we’re going over all things log file analysis, which is so incredibly important because it really tells you the ins and outs of what Googlebot is doing on your sites.
So I’m going to walk you through the three primary areas, the first being the types of logs that you might see from a particular site, what that looks like, what that information means. The second being how to analyze that data and how to get insights, and then the third being how to use that to optimize your pages and your site.
For a primer on what log file analysis is and its application in SEO, check out our article: How to Use Server Log Analysis for Technical SEO
So let’s get right into it. There are three primary types of logs, the most common being Apache. But you’ll also see W3C and Elastic Load Balancing logs, which you might see a lot with things like Kibana. You’ll also likely come across some custom log files. For those larger sites, that’s not uncommon. I know Moz has a custom log file system. Fastly is a custom type setup. So just be aware that those are out there.
So what are you going to see in these logs? The data that comes in is primarily in these colored ones here.
So you will hopefully for sure see:
- the request server IP;
- the timestamp, meaning the date and time that this request was made;
- the URL requested, so what page are they visiting;
- the HTTP status code, was it a 200, did it resolve, was it a 301 redirect;
- the user agent, and for us SEOs we’re just looking at the Googlebot user agents.
So log files traditionally house all data, all visits from individuals and traffic, but we want to analyze the Googlebot traffic. The method (GET/POST), the time taken, the client IP, and the referrer are sometimes included as well. So what this looks like, it’s kind of like glibbery gloop.
It’s a word I just made up, and it just looks like that. It’s just like bleh. What is that? It looks crazy. It’s a new language. But essentially you’ll likely see that IP, so that red IP address, that timestamp, which will commonly look like that, that method (get/post), which I don’t completely understand or necessarily need to use in some of the analysis, but it’s good to be aware of all these things, the URL requested, that status code, all of these things here.
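To make that glibbery gloop a little more concrete, here’s a minimal Python sketch that pulls those same fields (IP, timestamp, method, URL, status code, user agent) out of a log line. It assumes Apache’s common "combined" format; the sample line and its values are invented for illustration.

```python
import re

# Regex for one line of Apache "combined" log format (an assumption —
# your server's format may differ, so adjust the pattern to match).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# A made-up example line for illustration only.
line = ('66.249.66.1 - - [10/Sep/2018:06:25:14 +0000] '
        '"GET /blog/log-file-analysis HTTP/1.1" 200 5123 '
        '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
        '+http://www.google.com/bot.html)"')

match = LOG_PATTERN.match(line)
fields = match.groupdict()

print(fields["ip"])          # the request server IP
print(fields["status"])      # the HTTP status code
print("Googlebot" in fields["user_agent"])  # is this a Googlebot hit?
```

Once each line is broken into named fields like this, filtering down to just the Googlebot visits is a one-line check on the user agent.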
So what are you going to do with that data? How do we use it? So there’s a number of tools that are really great for doing some of the heavy lifting for you. Screaming Frog Log File Analyzer is great. I’ve used it a lot. I really, really like it. But you have to have your log files in a specific type of format for them to use it.
Splunk is also a great resource, as is Sumo Logic, and I know there are a bunch of others. If you’re working with really large sites, like I have in the past, you’re going to run into problems here because the logs won’t be in a common log file format. So what you can do is manually do some of this yourself, which I know sounds a little bit crazy.
Manual Excel analysis
But hang in there. Trust me, it’s fun and super interesting. So what I’ve done in the past is I will import a CSV log file into Excel, and I will use the Text Import Wizard and you can basically delineate what the separators are for this craziness. So whether it be a space or a comma or a quote, you can sort of break those up so that each of those live within their own columns. I wouldn’t worry about having extra blank columns, but you can separate those. From there, what you would do is just create pivot tables. So I can link to a resource on how you can easily do that.
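If Excel pivot tables aren’t your thing, the same counting step can be sketched in a few lines of Python. This is a hedged sketch, not the exact workflow from the video: `parsed_lines` stands in for the rows you’d have after splitting the log on its separators, and the sample data is invented for illustration.

```python
from collections import Counter

# Stand-in for rows produced by splitting each log line into columns.
# The URLs and user agents here are made up for the example.
parsed_lines = [
    {"url": "/",         "user_agent": "Googlebot/2.1"},
    {"url": "/blog/",    "user_agent": "Googlebot/2.1"},
    {"url": "/",         "user_agent": "Googlebot/2.1"},
    {"url": "/pricing/", "user_agent": "Mozilla/5.0 (Windows NT 10.0)"},
]

# Count only the rows where the user agent is Googlebot —
# this is the "top pages by frequency" pivot in code form.
googlebot_hits = Counter(
    row["url"] for row in parsed_lines if "Googlebot" in row["user_agent"]
)

for url, count in googlebot_hits.most_common():
    print(url, count)
```

The `most_common()` call gives you the same ranked view a pivot table would: pages sorted by how often Googlebot requested them.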
But essentially what you can look at in Excel is: Okay, what are the top pages that Googlebot hits by frequency?