As hard as it is for me to believe, most people don’t know what a browser is. (It’s the software you use to interact with the web…Internet Explorer, Firefox, Chrome, Safari, etc.) So even fewer are going to know what “duplicate content” or the “canonical link tag” is. However, this is a very important topic for anyone with a website who wants it to rank in the search results.
Many content management systems and e-commerce carts (WordPress for blogs, Drupal, Joomla, Interspire, OSCommerce, etc.) create multiple versions of the same page, each with a different URL. That’s duplicate content…the same content available at two or more different URLs. Search engines don’t know which version to rank, and the “link juice” you get from people who link to your site gets watered down…split between the duplicate pages. If you’re using a content management system, it’s likely that you need to do something about this for sites you want to rank well.
To give you an example, some e-commerce sites have sort options for products. You can click on a link to sort by price, in alphabetical order, etc. Each one of these sort options has a different URL. So when Googlebot comes along and visits the links, it sees multiple versions of a nearly identical page. This is bad, because those versions end up competing with each other for the same ranking. WordPress has similar issues, splitting link juice between categories, archives, tags, etc. (Yoast has created a WordPress plugin to deal with this. You can also use his Robots Meta plugin to stop Google from indexing certain pages.)
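To make the sort-option problem concrete, here is a minimal Python sketch that collapses several sort URLs into a single address. The parameter names (`sort`, `order`, `dir`) and the example.com URLs are assumptions for illustration; your cart will have its own.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical names of query parameters that only change presentation.
# These are assumptions; check which parameters your own cart uses.
PRESENTATION_PARAMS = {"sort", "order", "dir"}


def canonical_url(url: str) -> str:
    """Drop presentation-only query parameters so duplicate URLs collapse to one."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in PRESENTATION_PARAMS
    ]
    # Rebuild the URL without the dropped parameters and without a fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# Three addresses a crawler might find for the same product listing.
urls = [
    "http://example.com/widgets?sort=price",
    "http://example.com/widgets?sort=name&order=desc",
    "http://example.com/widgets",
]
unique = {canonical_url(u) for u in urls}
print(unique)
```

All three addresses reduce to the same URL, which is the one version you would want the search engines to see.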
Getting back to the canonical tag… As a solution to some of these duplicate content issues, Google came out with a canonical link tag, which lets you specify which version of a URL you want them to index. Matt Cutts, the head of Google’s Web Spam Team, also posted about this canonical link element. The problem is, IT DOESN’T SEEM TO WORK! Read Here.
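For reference, the tag itself is a single `<link rel="canonical">` line that goes in the head of each duplicate page. Here is a rough Python sketch of how a template might generate it; the function name and the example.com URL are made up for illustration.

```python
from html import escape


def canonical_link_tag(preferred_url: str) -> str:
    """Return the canonical link element pointing at the preferred URL.

    The helper name is hypothetical; the tag belongs inside the page's <head>.
    """
    # Escape the URL so characters like & and " are safe inside the attribute.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}" />'


print(canonical_link_tag("http://example.com/widgets"))
```

Every sorted or filtered version of the page would carry the same tag, all pointing at the one URL you want indexed.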
So what can you do?
- Disallow the duplicate pages/areas via robots.txt so crawlers don’t fetch them (this leaks a bit of “page rank”, but is probably better than duplicate content)
- Add a robots meta tag with “noindex, follow” to the head section of pages you don’t want indexed, but still want “page rank” to flow through.
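Both options above can be sketched in a few lines. The robots.txt rules below are an example only (the `/tag/` and `/archives/` paths are assumptions about which areas duplicate your content), and Python's standard `urllib.robotparser` is used just to check how a crawler that obeys the rules would read them.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt; the disallowed paths are assumptions for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
Disallow: /archives/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(path: str, agent: str = "Googlebot") -> bool:
    """Report whether a crawler that obeys robots.txt may fetch this path."""
    return parser.can_fetch(agent, path)


# The second option: a robots meta tag placed in the <head> of a duplicate page.
NOINDEX_FOLLOW_TAG = '<meta name="robots" content="noindex, follow" />'

print(is_crawlable("/tag/seo/"))   # duplicate tag listing
print(is_crawlable("/widgets/"))   # the page you want ranked
print(NOINDEX_FOLLOW_TAG)
```

Note that the two options don't combine well on the same page: if robots.txt stops a crawler from fetching a page, the crawler never gets to see the meta tag on it.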
It’s not a simple topic, but it’s a problem that many webmasters have, and one that needs to be dealt with if you want your site to rank. If you have any questions, post them in the comments below.