Google has been known to be ruthless with some of their violations and the associated penalties. It’s one thing to be put in the sandbox or to be pushed down to page two, or three, or deeper in search. It’s quite another to go to Google, type in “site:www.myurl.com” and get nothing in return.
When that happens, it means you have been removed from the search index entirely. Oh, Google still has your site data indexed and saved; they’ve just pulled it from the live index. They’re saying “you did something wrong and we’re removing you from search results until you’ve fixed the problem.”
There are a handful of reasons why this might have happened, and a bunch of steps you can take to remedy the situation. Here’s a process you can use to recover, ideally quickly. If you’re good, you can get through this and have your rankings restored in as little as 12 hours.
When your rankings drop, you have some explaining to do, and the panic that results can throw logic out the window in the scramble to fix the problem. The first thing you want to do is run a rank checker across your full site, and/or use the Google site search (by typing site:www.yoururl.com into the search engine), to see what does and does not appear.
There are three possible outcomes here.
The noindex directive can appear in two different places in such a way as to cause a page to be removed from Google’s search results. The first is a meta tag in the meta data section for a given page. If you find out that only certain pages have been removed from the index, this is what you should check first. It’s particularly true if you were editing or changing those pages just prior to the removal taking place. You will find the tag in the head section of your page. It will look something like this: <meta name="robots" content="noindex">
There may be more data; it doesn’t matter. If the name is robots and the content is noindex, you’re telling search robots – like Google’s web spiders – not to index the page. The next time Google sees that page and reads that data, it will say “oh the site owner doesn’t want this page indexed, I had better remove it from the search results.”
If you find this meta directive in the header of the page that’s no longer indexed, congratulations; you’ve found the problem. All you need to do is remove the noindex tag. You can, if you want, submit a ping to the page through Google webmaster tools, or submit a fresh sitemap that lists the change date of that page as very recent. Either way, Google will soon discover that the directive is no longer in place and will happily index the page again. Ideally, there will be no loss of ranking once the site is restored to the index.
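If you have more than a couple of pages to check, reading page source by hand gets old fast. Here is a minimal Python sketch of the check described above, assuming you already have the page's HTML as a string. The names RobotsMetaParser and has_noindex are made up for this example; only the standard library's html.parser is used.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # content may hold several comma-separated values: "noindex, nofollow"
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.append(part)


def has_noindex(html_text):
    """Return True if any robots meta tag in the HTML contains noindex."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noindex" in parser.directives
```

Feed it the source of each page that dropped out of the index. Any page where has_noindex returns True is a page that is asking to be removed.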
The other location to check is your robots.txt file, where the directive to look for is Disallow rather than noindex. You will find this file in your root directory, ideally. There can be subdirectory robots.txt files, but if there are, you should remove them and merge them into your main site robots.txt.
A robots.txt file is a simple text file that has some basic information and directives for search engine robots. If you see a line that looks like “Disallow: /” then you’re banning search engine bots from crawling your website. The / is the offending character. If you remove it, everything will be restored. If you change Disallow to Allow, the same thing happens. If you remove the line entirely – or the entire file, if nothing else of import is in it – everything will be allowed.
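You can test a robots.txt file yourself before touching the live one. The sketch below uses Python's standard urllib.robotparser to ask whether a given crawler may fetch a given path. The function name googlebot_can_crawl is hypothetical, and the standard library parser is only an approximation of how Google reads the file, so treat the answer as a sanity check rather than the final word.

```python
from urllib.robotparser import RobotFileParser


def googlebot_can_crawl(robots_txt, path="/", user_agent="Googlebot"):
    """Return True if the robots.txt text lets user_agent fetch path."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network call is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# The three variations discussed above:
blocked = "User-agent: *\nDisallow: /"
slash_removed = "User-agent: *\nDisallow:"
switched_to_allow = "User-agent: *\nAllow: /"

for text in (blocked, slash_removed, switched_to_allow):
    print(repr(text), "->", googlebot_can_crawl(text))
```

Only the first variation should come back as blocked. Removing the slash or switching to Allow opens the site up again.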
Typically, this error comes up when you’re trying to do a site revamp and you’re testing it in a live environment. You might disallow indexing of the files in case anyone discovers them, so you don’t have to worry about people trying to use your testing site. This isn’t the ideal way of doing this, but I’ve seen it happen.
Thankfully, you can analyze your robots.txt file easily using a tool like this one provided by SEOBook. Google also offers one, which you can find here assuming you have your site listed in webmaster tools.
Speaking of webmaster tools…
There are two primary items you want to check in Google’s Webmaster Tools. The first is the manual actions section. This is where you will see Google penalties taken against you that aren’t algorithmic. Manual actions tend to be all-or-nothing removals, while algorithmic penalties are the penalties that knock your search ranking down but don’t remove you entirely. If you have any manual actions in place, you will be able to see what they are, and you will be able to work to remove them.
The second thing you should check is the remove URLs feature. You can find this in the “Google index” > “remove URLs” menu. If you had a page that was being indexed and you didn’t want it to be – like a system page or something of the sort – you would be able to request its removal from the index here. This helps you hide crucial files or minimize the accessibility of the backdoors to your site. Ideally, all you’ll see here is “no URL removal requests.” If you see anything else, you may have somehow requested the deindexing of certain pages on your site. Rescind those requests and you’ll be good to go.
When a server is not responsive, Google can’t crawl the page. It tries, and all it receives is a time-out. If this happens, Google will often remove the page from the index and crawl the next one. Since the next one is likely another page on your site, it will determine that your entire site is missing. There’s no difference to Google between a missing page on a 404, an entire missing site, or a server that’s not responding. They all result in the site no longer being available to access, and that means they all can lead to your page being removed from the index.
The reason for this is simple, and the removal is temporary; Google wants to serve the best results, so if a result isn’t loading, it’s not the best. It will be removed until it is detected as loading again. Thankfully, Google knows that downtime can happen, and they’re not going to take weeks to get back to you. Often, when a page doesn’t respond, they will come back to check and index it again within a day. It’s only when there’s a second strike, or a third, that they get more serious about ignoring you.
Unfortunately, unless your web host has uptime records published, there’s no good way to monitor your server responses in retrospect. You have to have your site enrolled in some kind of monitoring service, like Pingdom. Pingdom is great, and it’s what I recommend to keep an eye on your server uptime.
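If you want a rough idea of what a monitoring service is doing on your behalf, here is a small Python sketch. It fetches a URL with a timeout and sorts the result into the cases discussed above: loading fine, missing, server error, or not responding at all. The function names, the labels and the ten second timeout are all my own assumptions, and a script you run by hand is no substitute for a service that checks around the clock.

```python
import urllib.error
import urllib.request


def classify(status=None, error=None):
    """Turn a fetch result into a rough label (labels are illustrative)."""
    if error is not None:
        return "unreachable"      # time-out, DNS failure, refused connection
    if status is not None and 200 <= status < 300:
        return "ok"
    if status in (404, 410):
        return "missing"
    if status is not None and status >= 500:
        return "server_error"
    return "other"


def check_url(url, timeout=10):
    """Fetch url once and classify what came back."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify(status=response.status)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 503.
        return classify(status=exc.code)
    except OSError as exc:
        # URLError and time-outs are both subclasses of OSError.
        return classify(error=exc)
```

Calling check_url on your homepage every few minutes from a scheduled job and logging the result would at least give you a record to look back on the next time pages vanish.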
This is one error that might have cropped up if somehow you were implementing canonicalization and got some wires crossed.
Canonicalization is an important tool for keeping URL parity across your site, as well as minimizing possible duplicate content penalties when you have dynamic content generation through something like a product search.
The idea is simple; in any page that may be duplicated, you add a canonical tag to the head section pointing at the real version of the page. So, for example, your site might be www.example.com. You can add canonicalization so that whenever Google crawls the page at a variant address, such as the same page without the www, it treats www.example.com as the proper version. This minimizes cases where the two are counting as different URLs and splitting your page ranking power.
The same action is used when you have dynamic URL generation. Each unique URL counts as a different page, so Google might see 1,000 different pages that all share identical content. You canonicalize it so Google understands that they’re all the same page, just with strange dynamic URLs.
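To make the dynamic URL case concrete, here is a hedged Python sketch that collapses a pile of generated URLs down to one canonical address by dropping the query parameters that don't change the content. Which parameters matter is entirely site-specific; the keep_params argument and the example parameter names (q, sessionid, sort) are assumptions for illustration only.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url, keep_params=()):
    """Drop every query parameter not listed in keep_params, plus any fragment."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in keep_params
    )
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


# Two dynamically generated addresses for the same search results page.
first = "https://www.example.com/search?q=shoes&sessionid=abc123&sort=price"
second = "https://www.example.com/search?sort=name&q=shoes&sessionid=zzz999"
print(canonical_url(first, keep_params=("q",)))
print(canonical_url(second, keep_params=("q",)))
```

Both addresses reduce to the same URL, and that is the one you would put in the canonical tag on every variation of the page.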
The problems come when you specify the wrong URL when you add canonicalization. If all of your canonical tags are pointing at a URL that doesn’t exist or that isn’t your site, it will essentially remove your site from the ranking and give all of your link juice to the site that says it’s the original source. This is very rarely going to be a real issue, but it’s something worth checking.
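Checking for this is easy to script. In practice the canonical tag is a link element in the head, in the form link rel="canonical" href="...". The Python sketch below pulls every canonical link out of a page and flags any that point at a host other than the one you expect. The names CanonicalParser and canonical_problems are hypothetical, and the check only compares hostnames; it does not confirm that the target URL actually exists.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class CanonicalParser(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        rel_values = attr_map.get("rel", "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonicals.append(attr_map["href"].strip())


def canonical_problems(html_text, expected_host):
    """Return a list of human-readable problems; empty means nothing odd found."""
    parser = CanonicalParser()
    parser.feed(html_text)
    problems = []
    if len(parser.canonicals) > 1:
        problems.append("multiple canonical tags")
    for href in parser.canonicals:
        host = urlparse(href).netloc.lower()
        # Relative hrefs have no host and stay on the current site.
        if host and host != expected_host.lower():
            problems.append(f"canonical points at another host: {host}")
    return problems
```

An empty list is what you want to see. Anything naming a host you don't own deserves a closer look right away.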
Now, by this point you will probably have seen signs of being hacked if there were any present. You might have strange activity in your server access logs. You might have odd obfuscated code in your pages. You might have entirely new pages you don’t remember creating.
In any case, there are a bunch of different ways a site might be hacked. Someone may have added pages and left your site alone; the main site is fine, but the additional pages are used in spam, which gets you blacklisted by Gmail and thus from Google entirely. You might have your homepage replaced with a spam page. You might have subpages replaced. You might have had white-on-white, color-matched links added to your homepage or to other important pages, stealing pagerank from your page and giving it to the spammer. You might be serving up malicious downloads or redirecting a user into a chain of spam.
All of these are signs of your site being compromised by an external force, which is something Google really doesn’t like. Your site being full of malicious code is a sure-fire reason for Google to remove you from the index. You’ll be able to restore your ranking, but it might take some time, because Google wants to make sure that it’s not going to happen again.
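One way to spot added or altered pages is to compare the files on your server against a copy you trust. This Python sketch hashes every file under a directory and reports what was added, removed or changed relative to a known-good copy. It assumes you have such a copy on disk, for example an unpacked backup; the function names are invented for this example, and it only sees files, not changes made inside a database.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def diff_manifests(known_good, current):
    """Compare two manifests and list added, removed and changed files."""
    added = sorted(set(current) - set(known_good))
    removed = sorted(set(known_good) - set(current))
    changed = sorted(
        name
        for name in set(known_good) & set(current)
        if known_good[name] != current[name]
    )
    return {"added": added, "removed": removed, "changed": changed}
```

Build one manifest from the trusted copy and one from the live site, then pass both to diff_manifests. Files in the added list that you don't remember creating are the first place to look.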
The road to recovery from hacking is not an easy one. You need to change your usernames and passwords for any account associated with your web host, including social media, emails, other web logins, and anything else that uses similar information. At the same time, you will need to check to make sure you aren’t sending password recovery emails to a different recovery address. Some hackers slip in their dummy info so that if you try to reset it, they will be able to reset it as well.
Only once your site is secure can you start to repair it. Ideally you will have a recent backup and won’t lose much or anything from your recent updates. Restore that backup and your data will be restored. Then you’ll have to ping Google to let them know that your recovery has ended.
Unfortunately, this will take some time and will mean that Google won’t be restoring your ranking right away. Fortunately, it’s a recoverable error, and you can get it fixed before too long.
You can read more about the reinclusion/reconsideration requests here, at Matt Cutts’ blog. He also has a link to the official documentation. It’s an old post, but it’s still relevant, because the process has hardly changed.
If you’ve looked over all of this and still can’t determine why your site was removed, you may want to contact Google support. They can look over your site and check to see if it is under the influence of a soft penalty or a hold of some kind. If that still doesn’t help, your site has fallen into limbo and the only way out is to travel to the great sage of the mountain, who can give you his ageless wisdom.