Businesses rely on analytics data about their web traffic for all sorts of important decisions, like working out how much to charge advertisers or how to market their products.
Google Analytics is essentially the eyes and ears of most small to medium-sized businesses online.
But what if someone were able to subvert Google Analytics? What if I were able to get your analytics to start reporting the wrong thing? How would this affect your business?
This article shows how easy it is to exploit a weakness in Google's analytics code in order to render its data essentially meaningless.
How I discovered the exploit
In late August, Google performed an unknown update that wiped out about 95% of this site's organic search traffic.
SME Pals's Twitter profile now outranks the entire site as a result of the penalty. Clearly Google believes that our hundreds of articles (like this one), plus our free forum advice for small business owners, are not as important as the site's own Twitter feed.
In any event, the lack of organic traffic meant that I was able to notice an unusual problem with the data my analytics reported.
It was showing hits on pages that did not exist on my site.
Usually visits to non-existent pages display as a 404 error, but these visits did not. Analytics was convinced that people were visiting my site on pages I didn't have.
Could this have something to do with the penalty? I had to investigate.
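The symptom described above (hits on pages the site doesn't have) can be checked mechanically. Here is a minimal sketch, assuming you can export the page paths your analytics reports and that you have a list of the paths your site really serves. The function name and the sample paths are hypothetical, made up for illustration.

```python
def find_phantom_paths(reported_paths, known_paths):
    """Return reported paths that the site does not actually serve, sorted."""
    # Treat "/about" and "/about/" as the same page; keep "/" as the root.
    def normalize(path):
        return path.rstrip("/") or "/"

    known = {normalize(p) for p in known_paths}
    return sorted({p for p in reported_paths if normalize(p) not in known})


# Hypothetical data: paths the site serves vs. paths analytics reports.
known = ["/", "/about", "/blog/analytics-exploit"]
reported = ["/", "/blog/analytics-exploit/", "/cheap-pills", "/download/setup.exe"]

print(find_phantom_paths(reported, known))
```

Anything this prints is a page that analytics believes exists but the site never published, which is exactly what tipped me off.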
How analytics was being fooled
After doing a bit of detective work, I found out that a malicious website was scraping my pages. Not just the content, but the actual page source code - right down to the copyright notice and the analytics urchin (the tracking snippet that reports page views to my account).
I wrote about my findings on Google+, and shared it with Matt Cutts (Head of Webspam), and a few other players in the SEO industry. Here's what I said:
Page source scraping working wonders in search!
I am currently getting more "hits" from a scraping site that has copied my website's source code (right down to the copyright notice and analytics urchin).
Visits to this site show up in my analytics because they have copied the source code which is confusing Google.
They have also copied the canonical URLs, so Google probably thinks this is me.
My site is currently outranked by my own Twitter profile, and has lost approximately 97% of its organic search traffic - since August.
Yet this scraper site is appearing in the top 3 results, for a range of terms (I know because their traffic is higher than mine, so it's showing up more prominently in my analytics).
Anyone care to shed light on why my site ends up penalized (I assume for the crime of having my source code scraped) when this nonsense ranks just fine?
I got no response, so I was going to leave it at that.
Except it's frustrating to watch a malicious website, one that scrapes my source code to distribute malware, rank higher in search results and get more traffic than my own site. I just couldn't leave it alone.
I went to the forums for advice and was told to issue a DMCA takedown notice against that site. I did that, and got no response from the ISP.
And yet, analytics continued to report the organic traffic arriving at the scraper site. Then it occurred to me:
What if someone wanted to purposefully distort a website's traffic data using this technique?
The Analytics exploit
Let's assume we have two competitors, A and B.
Competitor A does a good job of creating content, running marketing campaigns, etc. Competitor B is getting beaten, but doesn't want to go down without a fight.
Competitor B can copy the analytics urchin from Competitor A's site and paste it onto another website. They can then drive traffic to that site, or even automate the visits to fool the analytics code.
Google Analytics will then report to Competitor A that they are getting all sorts of great traffic.
Unlike in my situation, however, Competitor B could be really sneaky and make the relative URLs on the copy match pages on Competitor A's site, so that Competitor A wouldn't notice the problem in the first place.
As far as Competitor A is concerned, they are generating huge amounts of traffic to their own pages.
Competitor B could then distort the information coming in to Competitor A at will, and get them to make bad decisions based on incorrect data.
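To see why the copied urchin pollutes Competitor A's reports, here is a toy model of the problem. It is not Google's actual collection pipeline, just a sketch under the assumption that each hit carries a tracking ID, the hostname of the page that sent it, and a path, and that the report is grouped by tracking ID alone. The ID and the `.example` hostnames are hypothetical.

```python
from collections import Counter


def naive_report(hits, tracking_id):
    """Count pageviews per path for one tracking ID, ignoring which site sent the hit."""
    return Counter(h["path"] for h in hits if h["tracking_id"] == tracking_id)


# Hypothetical hits: one genuine, two from a copy of the page hosted elsewhere.
hits = [
    {"tracking_id": "UA-HYPOTHETICAL-1", "hostname": "competitor-a.example", "path": "/pricing"},
    {"tracking_id": "UA-HYPOTHETICAL-1", "hostname": "copycat.example", "path": "/pricing"},
    {"tracking_id": "UA-HYPOTHETICAL-1", "hostname": "copycat.example", "path": "/pricing"},
]

report = naive_report(hits, "UA-HYPOTHETICAL-1")
print(report["/pricing"])  # 3, although only one hit came from competitor-a.example
```

Because the paths match, nothing in the per-page report gives away that two thirds of the "visitors" never touched Competitor A's server.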
How would you feel if you discovered this happening to your business?
Sure, you could check the server logs, but honestly, when was the last time you had real reason to suspect that your analytics traffic was being exploited to harm your business?
Are you worried yet about how much you rely on analytics to inform your strategy, advertising, marketing, and decision making?
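The server-log check mentioned above can be sketched roughly as follows. It assumes an access log in the common Apache/Nginx style, where the request line is quoted and followed by the status code, and a per-path pageview count exported from analytics. The function names, the sample log lines, and the tolerance value are all assumptions for illustration. Server logs and analytics never agree exactly (bots, caching, and script blockers all skew the numbers), so the comparison only flags large gaps.

```python
import re
from collections import Counter

# Matches the quoted request and status, e.g. "GET /pricing HTTP/1.1" 200
LOG_LINE = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')


def server_pageviews(log_lines):
    """Count successful GET requests per path, ignoring query strings."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "200":
            counts[match.group("path").split("?", 1)[0]] += 1
    return counts


def inflated_paths(analytics_counts, server_counts, tolerance=1.5):
    """Return paths where analytics reports more than tolerance x the server's count."""
    return {
        path: (reported, server_counts.get(path, 0))
        for path, reported in analytics_counts.items()
        if reported > tolerance * server_counts.get(path, 0)
    }


# Hypothetical log excerpt and analytics export.
log_lines = [
    '203.0.113.5 - - [01/Jan/2000:13:55:36 +0000] "GET /pricing HTTP/1.1" 200 5120',
    '203.0.113.9 - - [01/Jan/2000:13:56:01 +0000] "GET /pricing?ref=x HTTP/1.1" 200 5120',
    '203.0.113.9 - - [01/Jan/2000:13:57:12 +0000] "GET /missing HTTP/1.1" 404 312',
]
analytics_counts = {"/pricing": 40}

print(inflated_paths(analytics_counts, server_pageviews(log_lines)))
```

A path that analytics credits with far more views than your own server ever delivered is worth a closer look.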
At the time of writing, a malicious website using scraped source code is generating more "hits" in my reports than my own organic traffic, according to Google Analytics.
My own problems don't extend much further than Google's ability to penalize my site without comment, support, or good reason (I've looked for any possible cause, believe me). But if Google's own tools can be subverted to mislead your business, that is something far more serious in the grand scheme of things.
What are your thoughts on this potential exploit and weakness in analytics? It looks like something Google could rectify easily: all they would have to do is ignore data coming from any domain that is not the one registered to that particular urchin.
But then again, for a company that earns the big bucks, shouldn't this already be the case?
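The fix suggested above, ignoring data from domains that aren't registered to the urchin, might look something like this. It is only a sketch: it assumes each hit records the hostname of the page that sent it, and the function name, tracking data, and `.example` hostnames are hypothetical.

```python
def filter_by_hostname(hits, allowed_hostnames):
    """Split hits into those sent from an allowed hostname and everything else."""
    allowed = {name.lower() for name in allowed_hostnames}
    kept, rejected = [], []
    for hit in hits:
        # A hit with no hostname at all is treated as not allowed.
        if hit.get("hostname", "").lower() in allowed:
            kept.append(hit)
        else:
            rejected.append(hit)
    return kept, rejected


# Hypothetical hits: one from the real site, one from a scraped copy.
hits = [
    {"hostname": "www.my-site.example", "path": "/pricing"},
    {"hostname": "scraper.example", "path": "/pricing"},
]

kept, rejected = filter_by_hostname(hits, ["www.my-site.example", "my-site.example"])
print(len(kept), len(rejected))  # 1 1
```

This only helps when the copied page reports its own hostname, as the scraper in my case did. Keeping the rejected hits rather than discarding them also tells you which sites are running your code.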
Share your thoughts in the comments.