It’s sad to see an industry that started out so full of questions, doubt and mistrust of the status quo readily accepting derived conclusions that are inapplicable, or just plain wrong. I love stats just as much as the next man (if not more, maybe even a bit too much), but if the numbers are irrelevant or conclusions have been wrongly derived, I will readily discard data (but not before having a good rant and moan).
One recent study that got me a bit wound up (in part because I retweeted it before reading it fully) was the iCrossing “importance of page-one visibility” paper, found here (PDF download link) or here (if you want to hand over all your data). Don’t get me wrong, it’s a great thing to aim to study and solid data was used, but this data has little use when presented in this way and the derived conclusions are inaccurate.
I work for an SEO agency, but I’m not bashing iCrossing because I work for a competing company. In fact I’m not meaning to bash them in particular; I like their blog and they seem to know what they are doing. Also, they gave me some nice branded juggling beanie bag things at a conference. It’s just about the data and the fact that we, as an industry, accept these wrong conclusions.
What’s wrong with the data?
Reported in a non-aggregate format, these data are fine. Fine at least for telling clients the breakdown of which pages their traffic comes from. Claiming this breakdown means anything in aggregate is stretching it, and concluding that “roughly 95.3 percent of all non-branded natural search traffic comes from page-one of the SERPs” is just wrong.
These claims would be fine if iCrossing clients held all positions on all pages included in the study; I’m going to go with the assumption that they don’t. The problem is that iCrossing are looking at the data from the wrong end (or in the wrong way, depending on how you look at it). I will illustrate my point with Client A and Client B:
Format = Keyword -> Ranking -> Number of visitors received
Client A
Dog -> 1st -> 5,000
Dog Collars -> 2nd -> 300
Dog Pics -> 12th -> 20
Dog in a plane -> 16th -> 1
Traffic from 1st page = 99.61%
Traffic from 2nd page = 0.39%
Client B
Dog -> 13th -> 300
Dog Collars -> 15th -> 24
Dog Pics -> 1st -> 100
Dog in a plane -> 2nd -> 4
Traffic from 1st page = 24.30%
Traffic from 2nd page = 75.70%
In conclusion, the issue is that the result we get is heavily dependent on the SERP performance of the websites studied: the same four keywords give Client A a 99.61% page-one share and Client B a 24.30% one.
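If you want to check the arithmetic yourself, here is a rough Python sketch that recomputes the page splits from the Client A and Client B figures above. The function names are mine, and it assumes ten results per SERP page, so treat it as an illustration of the calculation rather than anything iCrossing actually ran.

```python
# Keyword -> (ranking, visitors), copied from the Client A / Client B example.
CLIENTS = {
    "Client A": {
        "Dog": (1, 5000),
        "Dog Collars": (2, 300),
        "Dog Pics": (12, 20),
        "Dog in a plane": (16, 1),
    },
    "Client B": {
        "Dog": (13, 300),
        "Dog Collars": (15, 24),
        "Dog Pics": (1, 100),
        "Dog in a plane": (2, 4),
    },
}

RESULTS_PER_PAGE = 10  # assumption: ten organic results per SERP page


def page_of(ranking):
    """Map a ranking position to its SERP page (1-10 -> 1, 11-20 -> 2, ...)."""
    return (ranking - 1) // RESULTS_PER_PAGE + 1


def traffic_share_by_page(keywords):
    """Return {page: percentage of this site's visitors arriving from that page}."""
    totals = {}
    for ranking, visitors in keywords.values():
        page = page_of(ranking)
        totals[page] = totals.get(page, 0) + visitors
    grand_total = sum(totals.values())
    return {page: 100 * count / grand_total for page, count in totals.items()}


for name, keywords in CLIENTS.items():
    shares = traffic_share_by_page(keywords)
    for page in sorted(shares):
        print(f"{name}: page {page} = {shares[page]:.2f}%")
```

Both clients are measured on the same keywords, yet the split flips depending on where each site happens to rank. That is a statement about the sites, not about how searchers behave.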
What are the alternatives?
I’ve highlighted the problem, so I probably should provide some solutions.
The AOL data is still by far the biggest source of data, and whilst it’s old, isn’t from Google and has no data on the various universal results we have these days, it is still the best data I’m aware of.
Getting more up-to-date data is possible, but will be time-consuming or likely rather expensive. You can do this yourself, but you need to make sure you have enough data and a reliable and accurate way of reverse engineering Google. Fairly recent developments have eased this process somewhat, but I won’t be sharing my exact methodology in the public domain for the foreseeable future.
The AOL data was mined by a number of people, but the depth of that mining was fairly shallow. One of the simplest ways to improve your data is to strip out any branded searches; something that I have to commend iCrossing for doing in their study.
However, the point of this post wasn’t to provide alternative data sources. The point was to pose a question (or a series of them).
Why aren’t we questioning these things? Are we ignorant of what good data is? Or is it that we just don’t give a shit about data accuracy?
Do we not have a responsibility to ensure data accuracy? After all, the data we use shapes the decisions of our clients (if you work for clients).