If you work as a content writer for an SEO service, you will certainly have to deal with duplicate content issues. There are many definitions of duplicate content out there, but many people still prefer Google's. According to Google, duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar.
Broadly, Google categorizes duplicate content into two types: duplicate content on the same domain, and duplicate content across multiple domains. With this distinction in mind, you can review whether your own site has either kind.
Problems caused by Duplicate Content
After reviewing the existence of duplicate content on your site, you should understand the problems it can cause.
- Link Popularity Dilution
As you start link-building, you will end up creating and distributing different versions of your site's links. If you don't set a consistent URL structure for your site, search engines cannot recognize that all of those URLs point to the same target location, so the link equity is split among the variants instead of accumulating on one page.
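To make the dilution concrete, here is a small sketch in Python. The URLs and the `normalize` helper are hypothetical; they just illustrate how several inconsistent link variants can all refer to one page.

```python
# Hypothetical illustration: the same page reachable through several
# inconsistent URL variants. All names/URLs are made up for this sketch.
from urllib.parse import urlsplit, urlunsplit

VARIANTS = [
    "http://example.com/page",
    "http://www.example.com/page",
    "http://example.com/page/",
    "http://www.example.com/page/index.html",
]

def normalize(url):
    """Collapse common variants (www prefix, trailing slash, index file)
    into one canonical form."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    if netloc.startswith("www."):
        netloc = netloc[4:]
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    path = path.rstrip("/") or "/"
    return urlunsplit((scheme, netloc, path, "", ""))

# All four variants collapse to a single URL, so inbound links pointing
# at any of them would, ideally, count toward the same page.
print({normalize(u) for u in VARIANTS})
```

Without a consistent structure (or the canonical signals discussed below), each variant can accumulate its own share of links.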
- Showing unfriendly URLs
When Google discovers two identical or appreciably similar resources on the web, it chooses only one of them to show to the searcher. Usually Google selects the best one to display in the search results, but sometimes it might show a poor-looking URL version of your site.
If you avoid duplicate content in the first place, there is no such ambiguity, and users only ever see the best, most branded version of your URL.
- Zapping Search Engine Crawler Resources
Since Google will crawl your site regularly, it is important to understand how search bots spend their crawl budget. Every cycle a crawler spends fetching duplicate versions of a page is a cycle that could otherwise have been used to crawl and index freshly published content on your site. Duplicate content therefore not only wastes crawler resources but also hurts your SEO.
Solutions for Treating the Duplicate Content Problem
As the earlier section shows, a majority of duplicate content instances occur when the URL structure is inconsistent. It is therefore better to standardize your preferred link structure, along with proper use of canonical tags. To let Google know your preferred URL version, you can set your preference in your Google Webmasters account. Setting your preferred domain:
- Sorts out duplicate content issues between the www and non-www versions
- Retains link juice
The next step, after setting your preferred domain in Google Webmaster Tools, is to set up 301 redirects from all of the non-preferred domain links on your site to your preferred ones. This ensures both visitors and search engines always land on your preferred version.
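As a sketch, assuming the non-www version is your preferred domain and your site runs on Apache with mod_rewrite enabled (`example.com` is a placeholder), the 301 redirect could look like this:

```apache
# .htaccess sketch: permanently redirect www requests to the
# preferred non-www domain, preserving the requested path.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
```

Other servers (e.g. nginx) have equivalent redirect directives; the key point is the `301` status code, which tells search engines the move is permanent.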
Your canonical preference can be signalled in several ways, such as the following:
- Set the preferred version: www and the non-www
- Manually point to the canonical link for all the pages
- Set up 301 redirects
- Use the hreflang tag to handle localized sites
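For the manual option above, the canonical hint is a single tag in each page's `<head>`; the URL below is a placeholder for whichever version you prefer:

```html
<!-- On every duplicate or variant page, point at the preferred URL -->
<link rel="canonical" href="http://example.com/page/" />
```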
The hreflang tag helps search engines choose the right version of your content, which is especially important once you publish translated content. For example, if you translate your English site into Spanish for a local audience, you should add the tag `<link rel="alternate" href="http://example.com" hreflang="en" />` to the Spanish version of your site, pointing back to the English original.
Following this method protects you from the risk of search engines treating the translations as duplicate content, and on the other side it also improves the experience of users who want to be served in their native language.
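Note that hreflang annotations should be reciprocal: each language version lists itself and every alternate. A sketch for a two-language site (the URLs are placeholders):

```html
<!-- Placed in the <head> of BOTH the English and Spanish versions -->
<link rel="alternate" href="http://example.com/" hreflang="en" />
<link rel="alternate" href="http://example.com/es/" hreflang="es" />
```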
- Use the hash sign instead of the question mark operator when using UTM parameters
To measure the effectiveness of different channels, it's common to use tracking URL parameters such as source, campaign, and medium. But when you create a link like http://yoursite.com/?utm_source=newsletter4&utm_medium=email&utm_campaign=holidays, search engines crawl it and report it as duplicate content.
Hence, an easy fix is to use the # operator rather than the question mark. When search engine bots come across the # sign in a URL, they ignore everything that follows it, so the tracking parameters no longer create duplicate URLs.
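The difference is visible with a standard URL parser: query parameters are part of the URL proper, while everything after `#` is a fragment that is ignored when the resource is fetched. A quick sketch with Python's urllib (the campaign values are just examples):

```python
from urllib.parse import urlsplit

with_query = urlsplit(
    "http://yoursite.com/?utm_source=newsletter4&utm_medium=email")
with_fragment = urlsplit(
    "http://yoursite.com/#utm_source=newsletter4&utm_medium=email")

# The '?' form puts the parameters in the query string, making a
# distinct URL that crawlers treat as a separate page...
print(with_query.query)      # 'utm_source=newsletter4&utm_medium=email'

# ...while the '#' form leaves the query empty; the parameters live in
# the fragment, which never reaches the server or the crawler's index.
print(with_fragment.query)   # ''
print(with_fragment.fragment)
```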