Canonicalisation Issues

What is canonicalisation?

“Canonicalisation is the process of picking the best URL when there are several choices…”

Quite often a single web page can be reached through several URLs, for example:
http://www.cpbhaiseo.com
http://cpbhaiseo.com (notice it is missing the www)

Both of these URLs load the same page: the homepage! Other versions of the URL can load the same page with an extra file name on the end, such as /index.php or even /home.php. In addition, the owner of a website might have bought several domains (TLDs). For example, I also own the .co.uk TLD: http://www.cpbhaiseo.co.uk. If this additional domain is simply pointed at the same website, it will again load the same page. So potentially I could have 8 different URLs loading the homepage for Verve Search.
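
To make the multiplication concrete, here is a rough Python sketch of one way to arrive at a figure like 8. It assumes two domains, each with and without www, and each with and without /index.php. The exact combination is an assumption for illustration, and the function name is hypothetical.

```python
from itertools import product

# Assumptions for illustration: two domains, two host forms, two paths.
DOMAINS = ["cpbhaiseo.com", "cpbhaiseo.co.uk"]
HOST_PREFIXES = ["www.", ""]
PATHS = ["/", "/index.php"]


def homepage_variants():
    """Return every URL combination that could load the same homepage."""
    return [
        f"http://{prefix}{domain}{path}"
        for domain, prefix, path in product(DOMAINS, HOST_PREFIXES, PATHS)
    ]


if __name__ == "__main__":
    variants = homepage_variants()
    for url in variants:
        print(url)
    print(len(variants), "URLs for one page")
```

Adding /home.php to PATHS would push the total even higher, which is why these variants pile up so quickly.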

What is the canonical issue?
A canonical issue arises when 301 redirects are not properly in place, so your website can be reached by search engines through several different URLs. Search engines can then index your site under each of those URLs, which makes it look like a site full of duplicated content.

What can be done to resolve the canonical issue?

The best and most effective way to resolve the canonical issue is with a permanent 301 redirect. This can be implemented in a number of ways, as detailed below. The method you use to implement the redirect depends on the server your website is hosted on.
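
Whatever the server, the decision behind a 301 redirect is the same. Here is a minimal Python sketch of that decision, not a real server configuration. The choice of www.cpbhaiseo.com as the canonical host, the list of alias hosts and the function name redirect_for are all assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration only.
CANONICAL_HOST = "www.cpbhaiseo.com"
ALIAS_HOSTS = {"cpbhaiseo.com", "cpbhaiseo.co.uk", "www.cpbhaiseo.co.uk"}
INDEX_PATHS = {"/index.php", "/home.php"}  # files that serve the same page as "/"


def redirect_for(url):
    """Return (301, target) if the URL should be redirected, else None."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    path = parts.path or "/"
    if path in INDEX_PATHS:
        path = "/"
    if host in ALIAS_HOSTS:
        host = CANONICAL_HOST
    target = urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))
    if target == url:
        return None  # already the canonical URL, serve the page normally
    return (301, target)


if __name__ == "__main__":
    print(redirect_for("http://cpbhaiseo.com"))
    print(redirect_for("http://www.cpbhaiseo.co.uk/index.php"))
    print(redirect_for("http://www.cpbhaiseo.com/"))
```

In practice the same rule would usually be written in the server's own redirect configuration rather than in application code.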

The canonical issue is a problem for several reasons, fundamentally because a search engine spider visiting your website is likely to find the same page under many different URLs, with nothing to tell it which one is the main URL.


It would be even more complicated for the search engine spiders if, in addition to all these URLs, your website also used URL-based session IDs (a session ID URL is a separate URL generated dynamically for each user in each session, including the spiders), for example http://www.cpbhaiseo.com/?PHPSESSID=123. Each page would then be likely to have hundreds, maybe even thousands, of separate URLs. Yes, a session ID URL will look rubbish in the search results, BUT the real problem is that it is unlikely to have any link authority, as it is a unique URL created just for the session in which the spider crawled the site. When loads of these URLs find their way into the search engine index, this could be holding your site back significantly if you are trying to rank in a competitive market. In the worst case the spiders index a session ID URL instead of the main URL of a page.
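
To show how a session ID URL maps back to the main URL, here is a small Python sketch that strips the session parameter from the query string. It assumes the parameter is called PHPSESSID, as in the example above. The function name strip_session_id is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: the only session parameter in use is PHPSESSID.
SESSION_PARAMS = {"PHPSESSID"}


def strip_session_id(url):
    """Return the URL with any session ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


if __name__ == "__main__":
    print(strip_session_id("http://www.cpbhaiseo.com/?PHPSESSID=123"))
    print(strip_session_id("http://www.cpbhaiseo.com/?page=2&PHPSESSID=abc"))
```

Every one of the hundreds of session URLs for a page collapses to the same address once the session parameter is gone.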

Note: the reason some sites use session IDs is usually to be able to do in-depth tracking of each session. For those of you who do this, I would recommend using cookie-based sessions instead of URL-based session IDs. Yes, cookie-based tracking might not be as accurate if users disable cookies, but I believe it’s better in the long run, as session-based URLs could potentially harm your SEO efforts and overcomplicate things.
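
As a rough idea of what the cookie-based alternative looks like, here is a Python sketch that builds the value of a Set-Cookie header carrying the session ID, so the ID never appears in the URL. The cookie name session_id and the function name are assumptions. A real site would normally rely on its framework's session handling instead.

```python
import secrets
from http.cookies import SimpleCookie


def new_session_cookie_header():
    """Build a Set-Cookie header value holding a fresh random session ID."""
    cookie = SimpleCookie()
    cookie["session_id"] = secrets.token_hex(16)  # hypothetical cookie name
    cookie["session_id"]["path"] = "/"
    cookie["session_id"]["httponly"] = True
    return cookie["session_id"].OutputString()


if __name__ == "__main__":
    # The page URL stays the same for every visitor; only the cookie differs.
    print("Set-Cookie:", new_session_cookie_header())
```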

How canonicalisation issues affect link authority

In your mind http://www.yourdomain.com/ is usually your main URL, but don’t assume this is obvious to users and search engines. If you haven’t chosen a canonical URL (and implemented the appropriate redirects or rel=canonical tags; don’t worry, an explanation will come), it is likely that some links will go to one of the other URLs. For example, a user types my website address directly into the browser but uses the .co.uk TLD, finds the page they want to link to, and links to it using the .co.uk URL. Another example: a user follows an internal link that goes to /page/index.php and links to that URL, while your link builders are getting links to the main URL. Now you have links going to both URLs and the link authority is being diluted. Still following me? Now imagine you also have session IDs on your site: a user visits your site, gets a session ID, bookmarks the page (with the session ID) and then links to it from their blog. Now you have 3 different URLs for the same page with links pointing at them. Imagine how much more powerful the page would be if all of the links went to one URL!
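
The dilution is easy to see with some made-up numbers. The Python sketch below uses invented link counts and a hand-written mapping from each variant to its canonical URL. It treats link authority as a simple count of links, which is a big simplification. All names and figures are hypothetical.

```python
from collections import Counter

# Hypothetical inbound-link counts per URL variant (invented numbers).
INBOUND_LINKS = {
    "http://www.yourdomain.com/page/": 10,
    "http://www.yourdomain.com/page/index.php": 4,
    "http://www.yourdomain.com/page/?PHPSESSID=123": 1,
}

# Hypothetical mapping of each variant to the canonical URL it duplicates.
CANONICAL_OF = {
    "http://www.yourdomain.com/page/index.php": "http://www.yourdomain.com/page/",
    "http://www.yourdomain.com/page/?PHPSESSID=123": "http://www.yourdomain.com/page/",
}


def consolidate(links, canonical_of):
    """Add up link counts as if every variant pointed at its canonical URL."""
    totals = Counter()
    for url, count in links.items():
        totals[canonical_of.get(url, url)] += count
    return dict(totals)


if __name__ == "__main__":
    print("Diluted:", INBOUND_LINKS)
    print("Consolidated:", consolidate(INBOUND_LINKS, CANONICAL_OF))
```

With these invented figures, three URLs with 10, 4 and 1 links become one URL with 15.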

How to fix canonicalisation problems

There are now 2 different ways of fixing canonicalisation issues on your site: the 301 redirect and a newer option. Quite recently Google announced support for a new “canonical tag” that lets you specify in the HTML head that the URL in question should be treated as a “copy”, and that names the canonical URL that all link authority and content metrics should flow back to.

Example:

Within the HTML head of the page loading on this URL http://www.vervesearch.com/index.php there would be an element like this:
<link rel="canonical" href="http://www.vervesearch.com/" />

This would “tell” the search engines that they should index the canonical URL specified in this tag and also credit any link authority from the /index.php URL to the canonical URL. The rel=canonical tag should be implemented on every URL you have that loads the same page (except for the main canonical URL you want to use, of course).
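
If you want to check that a page really carries the tag, a small script can read it back out of the HTML. This is a minimal Python sketch using the standard library's HTML parser. The class and function names are hypothetical, and it only looks at the first rel=canonical link it finds.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html_text):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


if __name__ == "__main__":
    page = (
        "<html><head>"
        '<link rel="canonical" href="http://www.vervesearch.com/" />'
        "</head><body></body></html>"
    )
    print(find_canonical(page))
```

Running this against each duplicate URL is a quick way to spot pages where the tag is missing or points at the wrong address.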

This tag is really easy to implement and can solve a lot of canonicalisation issues, BUT it has its limitations. For example, you can’t use it for your country-specific TLDs (which are essentially separate domains) or other additional domains you might have bought. Another issue is that this tag only “redirects” the engines’ attention to the correct URL: users will still be able to use all the different URLs, and within your analytics these are likely to come up as different pages.







