Sometimes, you want to post content to the Internet, but you don’t want Google to find it. There are any number of reasons to want to hide content, many of them perfectly legitimate. Unfortunately, many of the methods for hiding content have been abused by blackhat SEOs in the past, and now they incur search penalties. Fortunately, there are still a few good ways to hide content without souring your reputation.
Be aware that none of these methods is truly infallible. Google is very good at what it does, and it’s always finding ways to parse content it couldn’t previously. If you really, truly, absolutely need the content to never appear, you’re best off not publishing it at all. If you must publish, use several of these methods to hide it and hope they stick.
Reasons to Hide Content
Blackhats have been using content hiding for years for various nefarious purposes. Skipping over those purposes, there are legitimate reasons to hide content from Google.
• Privacy concerns. If you have something you want to share but don’t want made public, hide it from the search engines. Ideally you’d post it privately rather than on the public Internet, but sometimes that can’t be helped.
• Duplicate content. Some standard web practices, such as creating printer-friendly pages or using certain ecommerce solutions, generate duplicate content. A reasonable amount of useful duplicate content won’t hurt you, but it’s still better to hide the copies and focus SEO on the one page that needs it.
• Gated content. If you want some content to be limited to paying subscribers or to users who have performed some important action, the last thing you need is for them to find it free through Google.
• Submission forms. A contact page, a mailing list signup, a registration form: these are all forms targeted by spammers. If you hide them from crawlers, you become invisible to most of the spammers who find their targets through search. Some will still find you, for sure, but you won’t be swamped in spam submissions.
Ways to Hide Content Legitimately
Hide it using your robots.txt file. Robots.txt supports a number of directives you can set to control the behavior of search engine crawlers. Disallowing the content you want hidden will hide it – from a crawler browsing your site. It will not, however, keep the URL out of the index if a crawler discovers it through an external link. If you’re not already using your robots.txt, you should start. It’s a very powerful tool for controlling how your site looks from the outside.
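As a sketch, a robots.txt file that asks all crawlers to stay out of a couple of directories might look like this (the paths here are illustrative, not a recommendation for your site):

```
# robots.txt lives at the root of your domain.
# "User-agent: *" applies the rules to every crawler.
User-agent: *
Disallow: /private/
Disallow: /print/
```

Remember that these are polite requests, not access control; well-behaved crawlers honor them, but the files themselves remain publicly reachable.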
Setting meta robots directives. Unlike robots.txt, which governs your site as a whole, meta robots tags are set at the page level. You can do several things with them that you can’t with the main robots.txt file. The NOINDEX directive is the most important; it tells the crawler not to index the content on the page. You can also set the FOLLOW or NOFOLLOW directives to govern link behavior. You can exclude the content of the page while still allowing the crawler to follow its links, for new page discovery. Alternatively, you can disable the following of links from that content as well, to help keep several linked pages hidden.
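The tag goes in the head of the individual page. For example:

```html
<!-- "noindex, follow": keep this page out of the index,
     but still follow its links for page discovery. -->
<meta name="robots" content="noindex, follow">

<!-- Or hide both the content and its outgoing links: -->
<meta name="robots" content="noindex, nofollow">
```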
Gated content. Hiding your content behind a text submission form is a simple and effective way of hiding it. Requiring a login or a search to find the content keeps it hidden. This is because Google will not submit a form in order to view content; it doesn’t want to spam contact forms or mailing lists, after all. Remember, however, that if your content is hidden behind a login but can be viewed by direct link without a login, a crawler can follow that link straight to the page and index it.
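A minimal sketch of the server-side check, assuming a session store of some kind (the function and cookie names here are illustrative, not from any particular framework):

```python
# Hypothetical gate: a request without a recognized session cookie
# never sees the content, so a crawler that won't fill in the login
# form is turned away. Stand-in for a real session store:
VALID_SESSIONS = {"abc123"}

def can_view_content(cookies: dict) -> bool:
    """Return True only if the request carries a recognized session cookie."""
    return cookies.get("session_id") in VALID_SESSIONS

# A crawler arrives with no cookies and is refused:
print(can_view_content({}))                        # False
# A logged-in subscriber gets through:
print(can_view_content({"session_id": "abc123"}))  # True
```

The key point of the sketch is that the check happens on the server, before any content is sent; a direct link to the content URL without a valid session must hit the same check.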
IFrames. An iFrame is an inline frame, a type of window from one page into a different page. One common example of iFrames in use is the Facebook comments plugin. From a search engine’s perspective, there’s no comments section on your blog. There’s an empty box where one might be. For users, however, the iFrame is a portal to a string of Facebook comments. You can use iFrames to embed content you want hidden into a page you want indexed. Note that Google is entirely capable of reading and indexing iFrames; it just doesn’t always do so. This method is hit or miss.
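The embed itself is a single tag on the visible page (the URL below is illustrative):

```html
<!-- The indexed page embeds the hidden content in an inline frame.
     Pair this with a noindex tag or robots.txt rule on the framed
     page itself, since the framed URL can still be crawled directly. -->
<iframe src="https://example.com/hidden-content.html"
        width="600" height="400" title="Embedded content"></iframe>
```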
Text as images. The same theory behind CAPTCHAs is what keeps Google from machine-reading text in images. It’s easier for the search engine to ignore images and read their ALT text than it is to run competent character recognition on every image on the web. It’s not a convenient solution for mobile users or for adaptable content, but it works to hide individual snippets of information.
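The trick is that the sensitive text lives only in the image pixels, while the ALT text describes the image without repeating its content (filename and ALT text here are illustrative):

```html
<!-- The address itself appears only inside the image; the ALT text
     deliberately describes rather than duplicates it. -->
<img src="contact-email.png" alt="Our contact email address"
     width="300" height="40">
```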
IP Blocking. Search engines use web crawlers that operate from specific servers. This limits the number of IP addresses they can use. Most search crawlers publish their IP addresses, or have had their addresses published by a third party. Find those IP ranges and block them in your server configuration, such as an .htaccess file, or through your hosting control panel; robots.txt can’t block by IP, since it only issues requests that crawlers choose to honor. This method has some drawbacks, of course. If a blocked range is too large, you might block legitimate users. You also are unlikely to catch every web crawler in the world. You may get Google, Bing and Yahoo, but what about smaller crawlers you don’t know about? It can be far more hassle than it’s worth.
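On an Apache server, a block list can be an .htaccess fragment like the following sketch. The ranges shown are documentation placeholders, not real crawler addresses; substitute each crawler’s published ranges.

```
# Apache 2.2-style access control: allow everyone,
# then deny the specific crawler ranges.
Order Allow,Deny
Allow from all
Deny from 192.0.2.0/24
Deny from 198.51.100.0/24
```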
Flash Content. Using Adobe Flash to create content for your page will effectively hide it from the search engines. It also hides it from certain mobile devices and from users without an up-to-date Flash plugin, but that’s a risk you take using a multimedia format to hide content. It’s far from a convenient solution, but if nothing above has worked, you’re probably willing to try anything.
In the end, there’s no perfect solution for hiding your content from the search engines. Like everything online, if you don’t want it to be seen, don’t post it. If you must, well, hopefully one of these solutions will work for you.