Since the beginning of the Panda updates in 2011, Google has waged a war against duplicate content. For many webmasters, this has had little effect. For some, it was a warning to change their business practices. A small minority, who based their businesses on peddling copied content, faced the music. Yet many sites today may be flagged for duplicate content without their owners knowing why. How can you solve this issue?
Why Duplicate Content is Bad
The obvious answer to the question is that Google says duplicate content is bad, so it’s bad. If the search giant chooses to penalize duplicate content, webmasters must comply. The alternative is to lose traffic, ranking and revenue. Why, though, does Google penalize duplicate content?
The answer lies in how search engines read and index content. Web crawlers have a number of different feeds to search for new content, including links on existing indexed pages and social media posts, among others. There is no single feed showing posted content in upload order. In other words, when you post content, it might not be seen right away. In some cases — rare though they are — a content scraper may steal your content and post it on another site, and that site may be discovered first.
Google is often faced with tough decisions. When the same piece of content is posted on two, three or more different sites, which site was the original? Timestamps are not always accurate and can be faked. PageRank and other ranking indicators aren't necessarily accurate either. A high-profile site may be guilty of stealing content from smaller competitors, hoping that its reputation will protect it.
This is why Google penalizes duplicate content. When using stolen content for SEO, a business is consciously saying, "we know what the rules are and we are flouting them to our benefit."
Is duplicate content really all that bad? The answer varies from case to case. Matt Cutts, head of Google's webspam team and public face of webmaster relations, says that most duplicate content is not an issue. Most sites with minor, accidental duplicate content issues — like those outlined below — are not penalized. It's only when a site starts intentionally using duplicate content in a spammy way that problems arise and penalties are handed out.
That’s not to say you shouldn’t do something about duplicate content. As a webmaster, you want to optimize your site for the best rank it can achieve. Duplicate content may not actively penalize you, but it does hamper your natural SEO growth. If you address the issue, the link juice from each duplicate page is rolled into one, giving it that much more power.
Where Unintentional Duplication Comes From
As mentioned, many sites have issues that cause duplicate content, often without the webmaster realizing what’s happening. How does it happen?
- Code issues in how a site handles individual page URLs can create duplicate content. A URL at www.example.com/subsite and the same page at www.example.com/subsite?category=option look different to a search engine, even if the query string merely activates a script or jumps to a particular part of the page. To Google, they are two separate pages; to the user, they're the same. The search engine sees duplicate content that doesn't strictly exist.
- Including a printer-friendly version of your content is good, right? Unfortunately, a search engine sees the "printer friendly" link, crawls it and finds the exact same content as on the original page. It reads this as duplicate content, even though the content is intentionally the same.
- Session ID tracking. Some forms of session tracking append a session ID to the URL, so each visitor to www.example.com gets a URL that looks like www.example.com/?SESSID=12481632. That includes web crawlers, which means the search engine sees a different URL for the same piece of content each time it's indexed.
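One defensive measure against the URL-variant problems above is to normalize URLs before they are emitted in links or sitemaps, stripping parameters that change the address without changing the content. This is a minimal sketch, not tied to any particular CMS; the parameter names `SESSID` and `category` are just the examples from the list above.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Query parameters that create duplicate URLs without changing the content.
# These names come from the examples above; adjust them for your own site.
TRACKING_PARAMS = {"SESSID", "category"}

def normalize_url(url: str) -> str:
    """Strip tracking parameters so URL variants collapse to one address."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(query) if k not in TRACKING_PARAMS]
    # Also drop trailing slashes and fragments, two more common variant sources.
    return urlunsplit((scheme, netloc, path.rstrip("/") or "/", urlencode(kept), ""))

print(normalize_url("http://www.example.com/subsite?category=option"))
# → http://www.example.com/subsite
```

With this in place, www.example.com/subsite and www.example.com/subsite?category=option resolve to the same string, so only one address ever gets linked or indexed.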
See the trend? Content is most often identified by its URL, so any trick of coding that changes the URL shows up as duplicate content.
How to Manage Duplicate Content
There are a few ways to deal with duplicate content. First, make sure your duplicate content issues are not part of a black hat SEO scheme; any duplication on your site should look like the accidental examples above. If you are intentionally posting the same piece of content on several sites, you are spreading it in a way that puts SEO value ahead of content uniqueness, and that is exactly the kind of duplication Google targets. Invest in spun articles or, better, original content. Be aware, however, that article spinning is also monitored by Google, and insufficiently spun articles can be flagged as spam as well.
Once you have ensured that your duplicate content issues are legitimate, you have three options.
- Ignore it.
- Use 301 redirects.
- Use rel="canonical".
The first option, ignoring the issue, is obviously the easiest. Matt Cutts himself has said that as much as 25 percent of the content on the Internet is duplicated, and that Google does not concern itself with normal content duplication. With no risk of penalty, a huge number of sites ignore the issue and do perfectly fine. That said, ignoring it also forgoes the potential SEO power of consolidating those "duplicate" pages into one entity.
The 301 redirect is the most involved solution. Setting one up requires server configuration and doesn't fit every duplicate content scenario, but it is a valuable tool for pointing old URLs at updated resources. A 301 redirect permanently forwards visitors and crawlers from the duplicate URL to the original, consolidating the link power and PageRank of both pages into a sort of super-page. It's simply not always the most practical option.
The Canonical Tag
Rel="canonical" is a reference tag that Google promotes as the best solution to duplicate content. You place it in the HTML head of each page that isn't the "correct" version. For example, say you have two URLs: www.example.com/realpage and www.example.com/realpageduplicate. On realpageduplicate, you would add <link rel="canonical" href="http://www.example.com/realpage"> inside the head element. This tells the search engine that any link juice, PageRank or SEO power attached to realpageduplicate should be credited to realpage instead. It's quick to implement and an elegant solution to a complex problem.
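If your pages are generated from templates, the tag can be emitted by a small helper so duplicates always point at the right place. `canonical_link` is a hypothetical helper name; the URL is the example from this section.

```python
from html import escape

def canonical_link(canonical_url: str) -> str:
    """Build the <link> element to place in a duplicate page's <head>."""
    # escape() guards against URLs containing quotes or ampersands.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'

print(canonical_link("http://www.example.com/realpage"))
# → <link rel="canonical" href="http://www.example.com/realpage">
```

Every duplicate page then carries one line of markup naming realpage as the version that should receive the ranking credit.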
Dealing with Black Hat Content Duplication
On a final note, occasionally a black hat site will copy your content for its own nefarious purposes. In rare instances, the black hat site will have a larger presence and higher SEO power — and thus higher search ranking — than your site. This is frustrating, because the other site is gaining more credit for stolen content than you get for the original. The solution to this problem is Google's Scraper Report tool, which helps you identify and report scraped content. This has no immediate effect on your site or theirs, but it helps Google identify scraper sites for future analysis and removal.
Duplicate content can be an issue for some sites, but the canonical tag is a quick and easy fix for nearly every instance. If you fear that duplicate content is holding you back, take the time to implement the tag and see how your ranking improves.