Moz’s Domain Authority is requested over 1,000,000,000 times per year, it’s referenced millions of times on the web, and it has become a veritable household name among search engine optimizers for a variety of use cases, from determining the success of a link building campaign to qualifying domains for purchase. With the launch of Moz’s entirely new, improved, and much larger link index, we recognized the opportunity to revisit Domain Authority with the same rigor as we did keyword volume years ago (which ushered in the era of clickstream-modeled keyword data).
What follows is a rigorous treatment of the new Domain Authority metric. What I will not do in this piece is rehash the debate over whether Domain Authority matters or what its proper use cases are. I have addressed those questions at length before and will do so again in a later post. Rather, I intend to spend the following paragraphs addressing the new Domain Authority metric from multiple directions.
Correlations between DA and SERP rankings
The most important component of Domain Authority is how well it correlates with search results. But first, let’s get the correlation-versus-causation objection out of the way: Domain Authority does not cause search rankings. It is not a ranking factor. Domain Authority predicts the likelihood that one domain will outrank another. That being said, its usefulness as a metric is tied in large part to this value. The stronger the correlation, the more valuable Domain Authority is for predicting rankings.
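One way to make the "predicts the likelihood that one domain will outrank another" framing concrete is pairwise accuracy: across all pairs of domains on a SERP, how often does the higher-scoring domain actually sit at the better position? The sketch below is a toy illustration, not Moz's internal evaluation; the function name and data shapes are my own assumptions:

```python
def pairwise_outranking_accuracy(scores, serp):
    """Fraction of domain pairs on a SERP where the higher-scoring domain
    (e.g. higher DA) occupies the better (earlier) position.

    scores: dict mapping domain -> metric score
    serp:   list of domains in ranked order (position 1 first)
    """
    correct, total = 0, 0
    for i in range(len(serp)):
        for j in range(i + 1, len(serp)):
            better, worse = serp[i], serp[j]
            if scores[better] == scores[worse]:
                continue  # tied scores carry no prediction either way
            total += 1
            if scores[better] > scores[worse]:
                correct += 1
    return correct / total if total else 0.0
```

A perfectly predictive metric would score 1.0 on this measure, while coin-flip performance sits at 0.5.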
Determining the “correlation” between a metric and SERP rankings has been accomplished in many different ways over the years. Should we compare against the “true first page,” top 10, top 20, top 50 or top 100? How many SERPs do we need to collect in order for our results to be statistically significant? It’s important that I outline the methodology for reproducibility and for any comments or concerns on the techniques used. For the purposes of this study, I chose to use the “true first page.” This means that the SERPs were collected using only the keyword with no additional parameters. I chose to use this particular data set for a number of reasons:
- The true first page is what most users experience, thus the predictive power of Domain Authority will be focused on what users see.
- By not using any special parameters, we’re likely to get Google’s typical results.
- By not extending beyond the true first page, we’re likely to avoid manually penalized sites (which can impact the correlations with links).
- We did NOT use the same training set or training set size as we did for this correlation study. That is to say, we trained on the top 10 but are reporting correlations on the true first page. This protects us from reporting results that are overly biased toward our own model.
I randomly selected 16,000 keywords from the United States keyword corpus for Keyword Explorer. I then collected the true first page for all of these keywords (completely different from those used in the training set). I extracted the URLs, and I also chose to remove duplicate domains (i.e., if the same domain occurred one after another). For a time, Google used to cluster domains together in the SERPs under certain circumstances. It was easy to spot these clusters, as the second and later listings were indented. No such indentations are present any longer, but we can’t be certain that Google never groups domains. If they do group domains, it would throw off the correlation, because it’s the grouping, and not the traditional link-based algorithm, doing the work.
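The adjacent-duplicate removal described above amounts to collapsing consecutive runs of the same domain into one entry. A minimal sketch (the function name is mine, not from Moz's pipeline):

```python
def dedupe_adjacent_domains(serp_domains):
    """Collapse consecutive occurrences of the same domain into a single
    entry, mirroring how clustered (formerly indented) listings are treated.

    serp_domains: list of domains in ranked order
    """
    deduped = []
    for domain in serp_domains:
        if not deduped or deduped[-1] != domain:
            deduped.append(domain)
    return deduped

# Example: example.com appears twice in a row, then again later
serp = ["a.com", "example.com", "example.com", "b.com", "example.com"]
print(dedupe_adjacent_domains(serp))
# ['a.com', 'example.com', 'b.com', 'example.com']
```

Note that only back-to-back repeats are collapsed; a domain reappearing later in the SERP is kept, since that is a genuine second ranking rather than a cluster.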
I collected the Domain Authority (Moz), Citation Flow and Trust Flow (Majestic), and Domain Rank (Ahrefs) for each domain and calculated the mean Spearman correlation coefficient for each SERP. I then averaged the coefficients for each metric.
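The per-SERP correlation procedure can be sketched as follows. This is a simplified stand-in for the actual study code: it uses the classic Spearman formula, which assumes no tied scores within a SERP, and takes a plain dict of domain-to-metric scores as an illustrative data shape:

```python
def spearman_no_ties(xs, ys):
    """Spearman's rank correlation via the classic formula,
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
    valid when neither list contains tied values."""
    n = len(xs)
    rank_x = {v: i + 1 for i, v in enumerate(sorted(xs))}
    rank_y = {v: i + 1 for i, v in enumerate(sorted(ys))}
    d_squared = sum((rank_x[x] - rank_y[y]) ** 2 for x, y in zip(xs, ys))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

def mean_serp_correlation(serps, metric_scores):
    """Average Spearman coefficient across SERPs for one metric.

    serps:         list of SERPs, each a list of domains in ranked order
    metric_scores: dict mapping domain -> metric score (e.g. DA)
    """
    coefficients = []
    for serp in serps:
        positions = list(range(1, len(serp) + 1))
        scores = [metric_scores[domain] for domain in serp]
        coefficients.append(spearman_no_ties(positions, scores))
    return sum(coefficients) / len(coefficients)
```

Because position 1 is the best rank, a metric that predicts rankings well produces coefficients near -1, which is why the coefficients reported below are negative.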
Moz’s new Domain Authority has the strongest correlations with SERPs of the competing strength-of-domain link-based metrics in the industry. The sign (-/+) has been inverted in the graph for readability, although the actual coefficients are negative (and should be).
Moz’s Domain Authority scored ~0.12, or roughly 6% stronger than the next best competitor (Domain Rank by Ahrefs). Domain Authority performed 35% better than Citation Flow and 18% better than Trust Flow. This isn’t surprising, in that Domain Authority is trained to predict rankings while our competitors’ strength-of-domain metrics are not. It shouldn’t be taken as a negative that our competitors’ strength-of-domain metrics don’t correlate as strongly as Moz’s Domain Authority; rather, it’s simply exemplary of the intrinsic differences between the metrics. That being said, if you want a metric that best predicts rankings at the domain level, Domain Authority is that metric.
Note: At first blush, Domain Authority’s improvements over the competition are, frankly, underwhelming. The truth is that we could quite easily increase the correlation further, but doing so would risk over-fitting and compromising a secondary goal of Domain Authority…
Handling link manipulation
Historically, Domain Authority has focused on a single objective: maximizing the predictive capacity of the metric. All we wanted were the highest correlations. However, Domain Authority has become, for better or worse, synonymous with “domain value” in many sectors, such as among link buyers and domainers. Subsequently, as bizarre as it may sound, Domain Authority has itself been targeted with spam in order to bolster the score and sell domains at a higher price. While these crude link manipulation techniques didn’t work so well in Google, they were sufficient to inflate Domain Authority. We decided to rein that in.
The first thing we did was compile a series of data sets that corresponded with industries we wished to impact, knowing that Domain Authority was regularly manipulated in these circles.
- Random domains
- Moz customers
- Blog comment spam
- Low-quality auction domains
- Mid-quality auction domains
- High-quality auction domains
- Known link sellers
- Known link buyers
- Domainer network
- Link network
While it would be my preference to release all the data sets, I’ve chosen not to, so as not to “out” any website in particular.