7 Search Ranking Factors Analyzed: A Follow-Up Study

: *Note: Only blogs with complete ranking data were used in the study. Correlation 1: Time and target keyword position First we will map the target keyword ranking positions against the number of days its corresponding blog has been indexed. Correlation 2: Time and total ranking keywords on URL You’ll find that when you write an article it will (hopefully) rank for the keyword you target. But often times it will also rank for other keywords. It stands to reason that, over time, a blog will gain links (and ranking potential) over time. As we are demonstrating in this article, there may be many other factors at play that need to be isolated and tested for correlations in order to get the full picture, such as: time indexed, on-page SEO (to be discussed later), Domain Authority, link profile, and depth/quality of content (also to be discussed later with MarketMuse as a measure). We pulled every article’s content score, along with MarketMuse’s recommended scores and the average competitor scores, to answer these questions. Correlation 1: Overall MarketMuse content score Does a higher overall content score result in better rankings? Let’s take a look at the percentage of articles with their target keywords ranking 1–10 that also have a 90% on-page score or better. On-page optimization score by rankings Percentage of KWs ranking 1–10 with ≥ 90% score 73.5% Percentage of keywords ranking >10 with ≥ 90% score 53.2% This is enough of a hint for me.

Author: Jeff Baker / Source: Moz

Grab yourself a cup of coffee (or two) and buckle up, because we’re doing maths today.

Again.

Back it on up…

A quick refresher from last time: I pulled data from 50 keyword-targeted articles written on Brafton’s blog between January and June of 2018.

We used a technique of writing these articles published earlier on Moz that generates some seriously awesome results (we’re talking more than doubling our organic traffic in the last six months, but we will get to that in another publication).

We pulled this data again… Only I updated and reran all the data manually, doubling the dataset. No APIs. My brain is Swiss cheese.

We wanted to see how newly written, original content performs over time, and which factors may have impacted that performance.

Why do this the hard way, dude?

“Why not just pull hundreds (or thousands!) of data points from search results to broaden your dataset?”, you might be thinking. It’s been done successfully quite a few times!

Trust me, I was thinking the same thing while weeping tears into my keyboard.

The answer was simple: I wanted to do something different from the massive aggregate studies. I wanted a level of control over as many potentially influential variables as possible.

By using our own data, the study benefited from:

The same root Domain Authority across all content.
Similar individual URL link profiles (some laughs on that later).
Known original publish dates and without reoptimization efforts or tinkering.
Known original keyword targets for each blog (rather than guessing).
Known and consistent content depth/quality scores (MarketMuse).
Similar content writing techniques for targeting specific keywords for each blog.

You will never eliminate the possibility of misinterpreting correlation as causation. But controlling some of the variables can help.

As Rand once said in a Whiteboard Friday, “Correlation does not imply causation (but it sure is a hint).”

Caveat:

What we gained in control, we lost in sample size. A sample size of 96 is much less useful than ten thousand, or a hundred thousand. So look at the data carefully and use discretion when considering the ranking factors you find most likely to be true.

This resource can help gauge the confidence you should put into each Pearson Correlation value. Generally, the stronger the relationship, the smaller sample size needed to be be confident in the results.

So what exactly have you done here?

We have generated hints at what may influence the organic performance of newly created content. No more, and no less. But they are indeed interesting hints and maybe worth further discussion or research.

What have you not done?

We have not published sweeping generalizations about Google’s algorithm. This post should not be read as a definitive guide to Google’s algorithm, nor should you assume that your site will demonstrate the same correlations.

So what should I do with this data?

The best way to read this article, is to observe the potential correlations we observed with our data and consider the possibility of how those correlations may or may not apply to your content and strategy.

I’m hoping that this study takes a new approach to studying individual URLs and stimulates constructive debate and conversation.

Your constructive criticism is welcome, and hopefully pushes these conversations forward!

The stat sheet

So quit jabbering and show me the goods, you say? Alright, let’s start with our stats sheet, formatted like a baseball card, because why not?:

*Note: Only blogs with complete ranking data were used in the study. We threw out blogs with missing data rather than adding arbitrary numbers.

And as always, here is the original data set if you care to reproduce my results.

So now the part you have been waiting for…

The analysis

To start, please use a refresher on the Pearson Correlation Coefficient from my last blog post, or Rand’s.

1. Time and performance

I started with a question: “Do blogs age like a Macallan 18 served up neat on a warm summer Friday afternoon, or like tepid milk on a hot summer Tuesday?”

Does the time indexed play a role in how a piece of content performs?

Correlation 1: Time and target keyword position

First we will map the target keyword ranking positions against the number of days its corresponding blog has been indexed. Visually, if there is any correlation we will see some sort of negative or positive linear relationship.

There is a clear negative relationship between the two variables, which means the two variables may be related. But we need to go beyond visuals and use the PCC.

Days live vs. target keyword position
PCC	-.343
Relationship	Moderate

The data shows a moderate relationship between how long a blog has been indexed and the positional ranking of the target keyword.

But before getting carried away, we shouldn’t solely trust one statistical method and call it a day. Let’s take a look at things another way: Let’s compare the average age of articles whose target keywords rank in the top ten against the average age of articles whose target keywords rank outside the top ten.

Average age of articles based on position
Target KW position ≤ 10	144.8 days
Target KW position > 10	84.1 days

Now a story is starting to become clear: Our newly written content takes a significant amount of time to fully mature.

But for the sake of exhausting this hint, let’s look at the data one final way. We will group the data into buckets of target keyword positions, and days indexed, then apply them to a heatmap.

This should show us a clear visual clustering of how articles perform over time.

This chart, quite literally, paints a picture. According to the data, we shouldn’t expect a new article to realize its full potential until at least 100 days, and likely longer. As a blog post ages, it appears to gain more favorable target keyword positioning.

Correlation 2: Time and total ranking keywords on URL

You’ll find that when you write an article it will (hopefully) rank for the keyword you target. But often times it will also rank for other keywords. Some of these are variants of the target keyword, some are tangentially related, and some are purely random noise.

Instinct will tell you that you want your articles to rank for as many keywords as possible (ideally variants and tangentially related keywords).

Predictably, we have found that the relationship between the number of keywords an article ranks for and its estimated monthly organic traffic (per SEMrush) is strong (.447).

We want all of our articles to do things like this:

We want lots of variants each with significant search volume. But, does an article increase the total number of keywords it ranks for over time? Let’s take a look.

Visually this graph looks a little murky due to the existence of two clear outliers on the far right. We will first run the analysis with the outliers, and again without. With the outliers, we observe the following:

Days live vs. total keywords ranking on URL (w/outliers)
PCC	.281
Relationship	Weak/borderline moderate

There appears to be a relationship between the two variables, but it isn’t as strong. Let’s see what happens when we remove those two outliers:

Visually, the relationship looks stronger. Let’s look at the PCC:

Days live vs. total keywords ranking on URL (without outliers)
PCC	.390
Relationship	Moderate/borderline strong

The relationship appears to be much stronger with the two outliers removed.

But again, let’s look at things another way.

Let’s look at the average age of the top 25% of articles and compare them to the average age of the bottom 25% of articles:

Average age of top 25% of articles versus bottom 25%
Top 25%	148.9 days
Bottom 25%	73.8 days

This is exactly why we look at data multiple ways! The top 25% of blog posts with the most ranking keywords have been indexed an average of 149 days, while the bottom 25% have been indexed 74 days — roughly half.

To be fully sure, let’s again cluster the data into a heatmap to observe where performance falls on the time continuum:

We see a very similar pattern as in our previous analysis: a clustering of top-performing blogs starting at around 100 days.

Time and performance assumptions

You still with me? Good, because we are saying something BIG here. In our observation, it takes between 3 and 5 months for new content to perform in organic search. Or at the very least, mature.

To look at this one final way, I’ve created a scatterplot of only the top 25% of highest performing blogs and compared them to their time indexed:

There are 48 data plots on this chart, the blue plots represent the top 25% of articles in terms of strongest target keyword ranking position. The orange plots represent the top 25% of articles with the highest number of keyword rankings on their URL. (These can be, and some are, the same URL.)

Looking at the data a little more closely, we see the following: