Google Analytics is a powerful reporting tool, recording the people and entities that visit your site and what they do while they’re there. There’s just one problem: robots. Robots ruin everything. Or, rather, robots skew your data tracking and data reporting. If a search engine crawler hits your site and indexes all 2,500 pages it can see, that’s going to count as thousands of views in your statistics. It can be difficult to see the true percentages of conversions and actions when a robot is in the mix, inflating the disengaged user stats.
Up until very recently, there was no good way to deal with this. You had to follow some arcane steps to block bots from your reporting, and even then their hits still counted as views when reports were sampled, which caused issues.
Identifying the Bot Problem
Some websites have larger problems with bots than others. You should check your analytics to see if you have the problem in the first place. If you don’t, you probably don’t need to take action against bots, though it can be helpful to know how regardless.
1. Log into your Google Analytics dashboard.
2. Visit the Audience > Technology > Browser and OS report.
3. Look for browser agents identified as bots.
The bot agents sometimes have tricky names. One of the most common is the Mozilla Compatible Agent, which is used by a number of bots and some mobile browser apps. A high percentage of views from one of these browser agents is a sign that you have bot traffic skewing your results.
1. Visit the Audience > Technology > Network report.
2. Look for bot service providers.
This report will show you your traffic broken down by Internet service provider. Bots come from branded ISPs, such as Microsoft Corp, Google Inc and Inktomi Corporation. These ISPs will have some very telling statistics: 100% new visits, a 100% bounce rate, a visit duration of zero and only a single page viewed per visit. These are sure signs of a bot.
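As a rough illustration, the four warning signs above can be checked mechanically against rows copied out of the Network report. This is only a sketch: the NetworkRow class, its field names and the sample figures are made up for the example and are not part of any Google Analytics API.

```python
from dataclasses import dataclass


@dataclass
class NetworkRow:
    """One row of the Network report (hypothetical structure, for illustration)."""
    isp: str
    pct_new_visits: float    # percentage, 0-100
    bounce_rate: float       # percentage, 0-100
    avg_duration_sec: float  # average visit duration in seconds
    pages_per_visit: float


def looks_like_bot(row: NetworkRow) -> bool:
    """True when a row shows all four tell-tale bot statistics at once."""
    return (
        row.pct_new_visits >= 100.0
        and row.bounce_rate >= 100.0
        and row.avg_duration_sec == 0
        and row.pages_per_visit <= 1.0
    )


def suspected_bot_isps(rows):
    """Return the ISP names whose statistics match the bot profile."""
    return [row.isp for row in rows if looks_like_bot(row)]


# Sample rows with invented numbers.
rows = [
    NetworkRow("inktomi corporation", 100.0, 100.0, 0.0, 1.0),
    NetworkRow("example broadband ltd", 62.5, 48.0, 95.0, 3.2),
]
print(suspected_bot_isps(rows))
```

Requiring all four signs at once is deliberately strict. A real visitor can bounce, but is unlikely to produce a perfect score on every metric across many visits.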
Fixing the Problem – The Old Way
The old method of fixing this problem in Google Analytics is roundabout. You essentially need to create a reporting filter so that any future traffic that comes in will be stripped of bot traffic before the report is generated. It’s not an elegant solution, and it only applies per view and from the date applied. This means your historical data will remain unfiltered and skewed.
To set up a filter, go to your Google Analytics view and click to set up a new filter. Name the filter something recognizable, such as “bot excluder”, so you know what it does. Set the filter type to Custom and choose Exclude. Under “filter field” select ISP Organization.
For the filter pattern, you are going to create a regular expression with the names of the ISPs of the offending bots. For example, inktomi corporation, yahoo! inc. and Microsoft corp would be blocked with this expression:
^(inktomi corporation|yahoo\! inc\.|Microsoft corp)$
Each ISP is added with its exact spelling, separated by a |. The whole expression is wrapped in ^()$, which anchors the pattern to the start and end of the field so that only a complete ISP name matches. Include the ISPs of any bots that are bothering you and skewing your metrics. A bot that visits once per quarter and hits three pages isn’t going to be a problem, so don’t worry about blocking absolutely everything.
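Before saving a filter like this, it can help to try the expression against a few ISP names. The sketch below uses Python’s re module as a stand-in for the filter’s own regular expression engine, and it assumes the match is case-insensitive. The is_excluded helper is a hypothetical name used only for this example.

```python
import re

# The filter pattern from above, copied verbatim.
PATTERN = r"^(inktomi corporation|yahoo\! inc\.|Microsoft corp)$"


def is_excluded(isp_name: str) -> bool:
    """True if the ISP name would be caught by the exclude pattern.

    Assumption: matching ignores case, hence re.IGNORECASE.
    """
    return re.search(PATTERN, isp_name, re.IGNORECASE) is not None


# Exact spellings match, regardless of capitalization.
print(is_excluded("Inktomi Corporation"))
print(is_excluded("yahoo! inc."))

# Near misses do not, because ^ and $ require the whole name to match.
print(is_excluded("microsoft corporation"))
print(is_excluded("yahoo! inc"))
```

The near misses are the point of the exercise. One missing period or one abbreviated word and that ISP’s traffic passes straight through the filter.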
One thing to note: when you’re searching for information about blocking bots, you’re going to encounter a lot of information about the robots.txt file. You can use this file to block bots from accessing your site entirely, but you probably don’t want to do this.
The reason is that you’re trying to block bots from your reporting, not from your site entirely. Blocking Google’s crawler, for example, would cause huge search-related problems. Block Google from your pages and it can’t index your site, which means your site won’t show up in search results, which means you can’t pull in traffic for your keywords.
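To see why a robots.txt block is the wrong tool here, the sketch below feeds a two-line robots.txt to Python’s standard urllib.robotparser module and asks what a well-behaved crawler would be allowed to fetch. The rules and the example.com URL are made up for illustration, and “SomeOtherBot” is a hypothetical crawler name.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that shuts Google's crawler out of the whole site.
robots_txt = """\
User-agent: Googlebot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Googlebot may not fetch any page, so nothing on the site can be indexed.
googlebot_allowed = parser.can_fetch("Googlebot", "https://example.com/any-page")

# A crawler that no rule mentions is still allowed in.
other_allowed = parser.can_fetch("SomeOtherBot", "https://example.com/any-page")

print(googlebot_allowed, other_allowed)
```

A rule like this keeps the crawler off the site itself, which is exactly the search visibility problem described above. An Analytics filter only changes what appears in your reports.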
Fixing the Problem – The New Way
All of that, as you can see, has issues. It only applies to reports on data after the filter was applied. It requires a custom filter to remove bots from data collection. It requires a perfectly formulated filter, or series of filters for each bot ISP. It can, potentially, filter out useful traffic that happens to piggyback on those ISPs.
Thankfully, Google has heard your feedback and has implemented a way within Google Analytics to block traffic reporting from common bots.
This new option is found at the view level within Google Analytics. Simply visit the view settings and check it out. It’s down at the bottom under “Bot Filtering”, a simple checkbox labeled “Exclude all hits from known bots and spiders.”
This gets around a few problems with the filter method. First of all, it’s easy to implement. All you need to do is check the box and save your settings. Secondly, it relies on a list of known bots created and maintained by Google itself. You no longer risk missing a bot because of an improperly typed filter, and you will block known bots you might not have even noticed. Plus, it’s at the view level, so you can have another view of the same traffic without the filter.
Knowing the Filters
Why bring up the old method if the new method works better, is easier to implement and has fewer flaws? Well, for one thing, you can still use those filters for a variety of tasks. Filtering by ISP allows you to generate traffic reports on specific subsets of your users, though you would have to have the filter in place. There are more elegant ways to do this in Google Analytics, however.
Another reason is that you may be using the old-style analytics. This feature is only available in Universal Analytics. If you’re stuck using the older analytics suite, you’re going to have to jump through a few hoops.
Essentially, it’s just good to have a comparison of how easy Google makes things for webmasters. Google Analytics is packed full of awesome features and it’s easy to forget how hard things are without them.