Let me present to you a number of scenarios:
1: On a Google search for your brand name, one of your blog posts that happens to mention your brand more than normal is the number one search result, followed by your FAQ. Your homepage only comes in at number three.
2: On a Google search for your brand name, the number one result is an old and out of date version of your homepage. The real homepage is not present, or is ranked lower on the page.
3: On a Google search for one of your primary keywords, an older guide from 2014 is ranking higher than a newer guide, published in 2017, that has more detail and better information.
4: On a Google search for a keyword where your content ranks highly, you’re seeing entries in the search results for pages that are system pages created for images, in addition to the pages where those images are hosted.
All four of these are examples of times where Google is indexing your site in some way improperly. Your better content should be rising to the top. Your homepage should rank higher than sub-pages for your brand name. Your images shouldn’t have dedicated pages of their own.
There are, frankly, a ton of different reasons why Google might be ranking the wrong pages in preference to pages you would prefer to rank in their place. Sometimes those reasons have to do with quirks of the algorithm, sometimes with your site structure, and sometimes with technical issues on the pages in question. Let’s examine possible causes and look for solutions, shall we?
I’m covering this one first, primarily because it’s a new issue that has come up recently due to a change in Yoast’s SEO plugin, which means a lot of people will probably be experiencing it. We wrote about the specific issue in more detail here, but I’ll cover it more generally now.
Essentially, what happens is that you have system pages on your site that shouldn’t be indexed, but are. In the case of the Yoast issue, it’s attachment pages for images that are generated automatically by WordPress. These pages rank on Google, at least temporarily, because your site has its own SEO value and the pages are new posts on your site. However, after a few days or a few weeks, Google notices that all of these pages are extremely thin content. They have an image, a caption, and nothing else outside of your site theme.
That right there is a recipe for a massive Panda hit against your site SEO. The Yoast issue is just one potential cause, but any time you see a system, category, or otherwise non-standard page indexed on Google, you can bet there are a lot more indexed that you probably don’t want visible.
Usually the problem here is that these system pages became visible when they shouldn’t have been. Either you had a setting in an SEO plugin like Yoast hiding them, or you had directives in your robots.txt or .htaccess files preventing search crawlers from finding them. Obviously, they’re not good pages to have publicly visible, so when they become visible, it can throw off a lot of your SEO.
The solution is to re-hide them, or redirect them to the host page, in the case of image attachment pages. You essentially want to tell Google “hey, you shouldn’t be here, we don’t want anyone to be here, you should instead go here.” Eventually, Google will figure it out and will properly organize your site. Unfortunately, it can take a while, especially if Panda hit your site along the way.
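As a quick sanity check on the "re-hide them" step, here is a minimal sketch that tests whether a set of robots.txt rules still keeps a crawler away from a system page. It uses Python's standard `urllib.robotparser`. The rules, the `/attachment/` path, the example.com URLs, and the `is_crawlable` helper are all assumptions made up for illustration, not taken from any real site or plugin.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; your real file will differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /attachment/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given rules allow `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs: one attachment page and one normal post.
attachment_ok = is_crawlable(ROBOTS_TXT, "https://example.com/attachment/photo-1/")
post_ok = is_crawlable(ROBOTS_TXT, "https://example.com/blog/my-post/")
print(attachment_ok, post_ok)
```

If a page you expected to be hidden comes back as crawlable, that is a hint the directive was lost or changed somewhere along the way.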
Example number three up above is one where a canonicalization problem could potentially be the cause. Canonicalization is a method used in metadata to make sure Google and other search engines can identify a specific version of a piece of content as the “real version” of that content.
For example, imagine that you have an internal database of 1,000 products, and you have a site search to browse through those products. Plus, one result page looks just like any other. You might have 1,000,000 different URLs all with virtually identical content. If Google indexed every possible search result, your site would be utterly destroyed by duplicate content penalties.
Google is generally smart enough to recognize this specific scenario, but there are a lot of issues where duplicate content can hurt you, resulting from dynamic URL generation, URL parameters, and other sorts of technical causes. In these cases, you want a canonical URL. Any time Google loads a page with a weird URL, they will see “hey, actually X is the canonical URL for this page” and they will know not to index the weird URL, and to index X instead.
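To see which canonical URL a page is actually declaring, you can pull the `rel="canonical"` link out of its HTML. The following is a minimal sketch using Python's standard `html.parser`. The `CanonicalFinder` class, the `find_canonical` helper, and the sample markup are hypothetical names and data chosen for illustration.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel can hold several space-separated values, so split it first.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in `html`, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


# Hypothetical search-result page that points at the "real" product page.
SAMPLE = (
    '<html><head><link rel="canonical" '
    'href="https://example.com/products/widget"></head><body></body></html>'
)
print(find_canonical(SAMPLE))
```

Running this against both the page that ranks and the page you want to rank shows quickly whether one of them is pointing at the other.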
If you have canonicalization in place when you, for example, change and buff up an old piece of content, you might find that the canonical tag points to the other version of the content instead. Google might find and really like the newly enhanced version of the content, but if the canonical URL points to a different place, they won’t be able to properly index it.
If you can’t find any canonicalization issues, you can look for hard redirects in your .htaccess and other system files. A similar issue that would cause the intended page to not rank at all is if the intended page redirects some or all traffic to the ranking page.
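Once you have collected your redirect rules, a few lines of code can tell you where a given page really ends up. This is a rough sketch that treats the rules as a simple source-to-target mapping. The `redirect_chain` helper and the `RULES` entries are hypothetical, and real .htaccess rules (patterns, conditions, status codes) are richer than this model.

```python
from typing import Dict, List


def redirect_chain(start: str, redirects: Dict[str, str], max_hops: int = 10) -> List[str]:
    """Follow redirect rules from `start` and return every URL visited, in order."""
    chain = [start]
    while chain[-1] in redirects:
        target = redirects[chain[-1]]
        if target in chain:
            raise ValueError("redirect loop: " + " -> ".join(chain + [target]))
        if len(chain) > max_hops:
            raise ValueError("too many redirects starting at " + start)
        chain.append(target)
    return chain


# Hypothetical rules, as you might collect them from .htaccess or a plugin.
RULES = {
    "/guide-2017": "/guide-2014",
    "/old-home": "/",
}

# The page you want to rank turns out to redirect to the page that ranks.
print(redirect_chain("/guide-2017", RULES))
```

A chain longer than one entry for the page you want to rank means visitors and crawlers are being sent somewhere else.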
In some cases, you will find that one page is ranking highly for a specific keyword when that keyword is not something you would consider to be a primary keyword for that piece. In fact, you might have another piece of content that focuses on that keyword as the primary keyword, but the other content ranks better.
What might be happening here is “keyword cannibalization.” The content that ranks high for the keyword might have a bunch of value on it, giving it an already high ranking in general. Then you have internal links using the keyword as anchor text, pointing at the content. Those links lend that keyword to the target page, and make Google think the page is relevant to that keyword. Thus, it ranks highly, when perhaps another piece of content should. You can read more about this issue here.
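One way to look for this is to tally which pages your internal links point at for each anchor text. Below is a minimal sketch using only the Python standard library. The `AnchorCollector` class, the `anchor_targets` helper, and the sample pages are assumptions for illustration, and a real audit would also need to handle nested markup, image links, and relative URLs.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs for every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace and lowercase so variants count together.
            text = " ".join("".join(self._text).split()).lower()
            self.links.append((self._href, text))
            self._href = None


def anchor_targets(pages: List[str]) -> Dict[str, Counter]:
    """Map each anchor text to a count of the URLs it links to."""
    targets: Dict[str, Counter] = defaultdict(Counter)
    for html in pages:
        collector = AnchorCollector()
        collector.feed(html)
        collector.close()
        for href, text in collector.links:
            targets[text][href] += 1
    return targets


# Hypothetical pages: the keyword mostly links to the older guide.
PAGES = [
    '<p>Read our <a href="/old-guide">SEO guide</a>.</p>',
    '<p><a href="/old-guide">seo  guide</a> or <a href="/new-guide">SEO guide</a></p>',
]
print(dict(anchor_targets(PAGES)["seo guide"]))
```

If most links using a keyword as anchor text point at the page you don't want ranking for it, updating those links is a reasonable first thing to try.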
In the case where homepages are not ranking as highly as subpages on your domain, you might have a setup that makes it unclear which page specifically is your homepage.
Now, for most sites, the homepage is just the domain name. Google’s homepage is Google.com. Moz’s homepage is moz.com. However, I’ve seen plenty of sites that have strange setups for their homepage. Something like “www.blog.site.com/home.html”. Can you tell me what’s wrong with that?
For one thing, it’s on a subdomain. Subdomains can screw with things, and if you intend for that to be your homepage, it probably won’t work. Google will prefer site.com as the homepage, rather than blog.site.com. If EVERY page on your site is on some subdomain or another, it can confuse the search engines.
The second is that “home.html” is not necessary to modern web design. It used to be common that a home.html or index.html represented the homepage, and Google can still identify index.html as the homepage, but it’s not necessary. Most of the time, it’s just the basic domain name.
You essentially need to move your homepage to the plain domain, or redirect the plain domain to your intended homepage, if you want Google to reach it properly.
In some cases, a site that is clearly relevant to a topic can rank highly for a query that the ranking page is not actually relevant to. Moz goes over one such example in this post. Essentially, a glasses-focused query had a Lenscrafters result that didn’t match the query. Lenscrafters has a lot of SEO power and thus ranks anyway on the potential for it to be relevant, even though the page that ranks is not actually relevant.
This is one of those problems that makes it hard for small sites to out-rank large sites, but also that makes it possible in the first place. Google looks for relevance first, and site value second. Thus a small site with highly relevant content can out-rank a large site with barely-relevant content, but that large site is still going to have a lot of inertia to overcome.
If that larger site were to create a more relevant piece of content, they would immediately snap to position number one in the search results, with the weight of their SEO. This is why it’s harder to compete against a vigilant site; if they want to dominate a niche, they will, just as soon as they notice it.
Another issue identified in that Moz post is accidental blocks or crawling issues. If a page you want to rank isn’t ranking, and the “next best thing” on your site is ranking in its place, it’s possible that the primary piece of content is having issues being indexed properly.
Canonicalization and redirects can cause this, but there are other potential issues as well. A lot of site designers like to post their designs live, hidden behind a “noindex” attribute, so that they aren’t indexed before they’re ready to go. It’s good to test content live, but if you forget to remove the noindex, Google will never index it.
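A forgotten noindex is easy to check for. Here is a minimal sketch that scans a page's HTML for a robots meta tag carrying a `noindex` directive, using Python's standard `html.parser`. The `RobotsMetaFinder` class and `has_noindex` helper are hypothetical names, and this only looks at meta tags; a noindex sent through an HTTP response header would not be caught by it.

```python
from html.parser import HTMLParser
from typing import Set


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = (attr.get("content") or "").lower()
            # Directives are comma-separated, e.g. "noindex, nofollow".
            self.directives.update(part.strip() for part in content.split(","))


def has_noindex(html: str) -> bool:
    """Return True if the page's robots meta tags include a noindex directive."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "noindex" in finder.directives


# Hypothetical staging page that went live with its noindex still in place.
STAGING_PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(has_noindex(STAGING_PAGE))
```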
There are other reasons why a piece of content might not rank well, too. For example, you could have an extremely good guide for a tricky task that ranks very poorly for a query related to that task, even though all the existing content is terrible. How? Well, if the content is purely video, or is on an infographic, it has no weight. Though they’re working on it, Google currently has a hard time accurately indexing the content of a video or image, and will likely keep having a hard time for quite a while. It’s very difficult to parse audio in comparison to text, after all. Images can have OCR applied to them, but the near-infinite variety of fonts, coupled with the need to recognize various symbols and languages, can lead to improper indexation at best.
This kind of plays into the previous point somewhat, but it’s entirely possible that a targeted page on your site isn’t actually very good. A page with little or no indexable content, or a page with usability issues, or a page where the content is dynamically generated; these are all hard to index.
Sometimes you just don’t have the right kind of content on your page to rank properly for the keyword you’re targeting. You might be ranking just fine, but ranking on the second or third page, simply in comparison to the other sites ranking for the same content.
Normally I would mention competition driving your content down, but that’s not really Google indexing the wrong page, it’s just other pages being better than yours. The issues we’re concerned with are just those where Google is ranking page A over page B on your site, when page B is the better page on the same topic.
In these cases, it’s up to you to decide what to do. If the two pages are similar enough, I generally like to combine them and redirect one to the other. The combined value of the content from both pages reduces traffic cannibalization, combines link value, and generally bolsters the power of that single page.
Have you run into any similar issues? If so, how have you diagnosed and solved them?