Everything written about SEO hinges on one thing: your page being indexed. If your page isn’t in the Google index, it’s not going to earn a quality score, PageRank or anything else. You can put every white hat technique into play. You can have a fast-loading, well-designed site with hundreds of articles in the archives. You can take every step in the book, but if your pages aren’t indexed, none of it matters.
What might be causing your pages to not be indexed? There are a few reasons, ranging from simple mistakes to search penalties.
The simplest and least detrimental reason your pages aren’t being indexed is that they aren’t being found. Imagine you upload a new website and then just sit on it. You don’t link to it from any social profiles, you don’t link to it from another website, you don’t share the URL with friends; nothing. Who is going to find that site? No one is going to accidentally type in your domain name and stumble upon your page.
The same thing happens if you have a website indexed, but you post a single update with no links pointing towards it. Google can’t scan your domain for every possible variation and brute-force index everything you submit. It would be impossibly resource-heavy to try, let alone succeed. If you post a page but nothing links to it, Google will have a hard time finding it.
There are a few ways Google can find a page. The first is to follow a link to it. This is a simple fix; when you post a new page, link to it from an existing indexed page. Maybe all it takes is a Facebook post. You might also submit a sitemap to Google and add the new page’s URL to it. When Google checks the sitemap, it will see and index the new page. You can even ping Google directly, telling it that you’ve posted a new page.
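A minimal sitemap entry is just a URL wrapped in a bit of XML. The domain and date below are placeholders; each <url> block you add is one page you want Google to discover:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page you want crawled -->
  <url>
    <loc>https://www.example.com/new-page/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
</urlset>
```

Save this as sitemap.xml in your site’s root directory and submit its URL to Google.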
There are a few reasons Google might not see a page, even when it’s linked. Primarily these reasons center around the noindex directive. This directive tells Google that, even though it can see the page, it should not index it. It’s most often used for system pages and files the public doesn’t need to see.
The noindex directive can appear in a few places. First, in the header of an individual page. Check the <head> tags of the page you’re having trouble with and look for the entry <meta name="robots" content="noindex">. If you see this entry, remove it; it’s telling Google to ignore the page.
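In context, the tag sits alongside the rest of the page’s metadata. A sketch (the title is a placeholder):

```html
<head>
  <title>Example Page</title>
  <!-- This line tells Google not to index the page;
       delete it to allow the page back into the index -->
  <meta name="robots" content="noindex">
</head>
```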
The second place to look is the robots.txt file in the root directory of your server. Strictly speaking, robots.txt controls crawling rather than indexing, but a page Google is blocked from crawling will usually stay out of the index. For many sites, this file will be nearly empty. If there’s a complex list of disallowed directories, check whether any of them are blocking important pages. You may need to have a chat with your web designer.
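For example, a robots.txt like the following keeps crawlers out of entire directories; the paths here are hypothetical:

```text
User-agent: *
# Blocks crawling of everything under /private/ -- fine for system files
Disallow: /private/
# A rule like this would be a problem if your articles live under /blog/
Disallow: /blog/
```

Deleting the offending Disallow line lets Google crawl those pages again.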
Once the noindex directive (or blocking rule) is removed, Google will be able to index the page without issues.
What happens when you tell Google that a page it sees is actually just another version of a different page? This happens fairly often with ecommerce sites; multiple versions of a search results page will have URLs reflecting the various filters used in the search. Rather than risk duplicate content penalties, you would apply a rel="canonical" tag to each filtered page, pointing to the unfiltered search page.
The problem arises when you accidentally apply rel="canonical" to a page that doesn’t need it. Once again, look in the <head> section of your page. If you see a <link> tag with rel="canonical" pointing to a different page, check that other page. If it’s not supposed to be the canonical version of the page you’re on, remove the canonical entry.
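The tag in question looks like this (the URL is a placeholder):

```html
<head>
  <!-- Tells Google the "real" version of this page lives at the href URL.
       If this page should stand on its own, remove this line. -->
  <link rel="canonical" href="https://www.example.com/some-other-page/">
</head>
```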
Essentially, this tag tells Google that the page you’re on is not the real version. Google goes instead to the canonical version and indexes that one, which can leave your original page, the one you want indexed, out of the index entirely.
Sometimes, there are reasons you may want to remove a page from the index. For example, if a page was hacked and compromised, or if it was full of duplicate content and you plan to fix it, you may remove it from the index temporarily while you apply these fixes. The problem, then, is that Google has been told to ignore the page. Now, when you want the page to be indexed once again, you need to tell Google it’s okay to index the page again.
To do this, go to Google’s Webmaster Tools (now called Search Console). In the tools, find the Remove URLs section. Google discourages using it now that noindex exists, but some people still do. If you find entries in this section, you may need to remove them or let them expire so your page can be indexed once more.
If you didn’t manually tell Google to remove the page from the index, but it has still been removed, you may be subject to one of many possible Google penalties. Some penalties only cause your page to sink in the rankings, while others will cause your site to disappear entirely. Hidden links and other black hat techniques can trigger such a penalty. You will have to audit your site for any black hat techniques that may have been applied.
You can also check Google Webmaster Tools to see if any penalties have been applied to your site. If there are, Google will tell you what those penalties are and give you the first steps toward fixing them. Common triggers include excessive duplicate content, thin affiliate content, cloaking and other such violations.
One particularly dangerous reason your site may have been removed from the index is a hack or other compromise. If Google detects malware, viruses, phishing code or other malicious content on your site, it will block your pages from view. This will show up in Webmaster Tools, but the recovery process can be long and difficult.
If you haven’t yet checked Webmaster Tools, you can discover whether your site has been compromised by running a site: search for your domain. If Google flags a URL, it will display a “this site may harm your computer” or “this site may be hacked” warning before letting visitors proceed. Fix the issues and request a review, and your site will be restored.