Check Your Domain URL For Canonicalization Problems

If you have your own domain, this is a simple test to see if you might have a canonicalization problem…

Go to Edward Lewis’s great header checking tool at his SEO website. Enter your main website address as you think of it; for example, the address I give out is http://www.1918.com. Change the User Agent to “Googlebot”, then click Check Headers.

What you want to get back is something like:

1. REQUESTING: http://www.1918.com
GET / HTTP/1.1
Connection: Keep-Alive
Keep-Alive: 300
Accept:*/*
Host: www.1918.com
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

SERVER RESPONSE: 200 OK

That 200 OK tells me that my web server answers the main address directly, exactly how I want it to.
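If you’d rather script the same check, here is a minimal sketch using only Python’s standard `http.client`. It is not how Edward Lewis’s tool works internally; `check_headers`, `GOOGLEBOT_UA` and `REDIRECT_CODES` are hypothetical names, the user-agent string is simply copied from the request above, and only the User-Agent header is set. Like the tool, it makes one request per hop and follows a redirect only by issuing a new request, so every 301 along the way shows up.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# User-Agent copied from the Googlebot request shown in this post.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
REDIRECT_CODES = {301, 302, 303, 307, 308}


def check_headers(url, user_agent=GOOGLEBOT_UA, max_hops=5):
    """Request `url` one hop at a time, following redirects manually.

    Returns one (url, status, reason, location) tuple per hop, like the
    numbered REQUESTING blocks a header checker prints.
    """
    hops = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                    else http.client.HTTPConnection)
        conn = conn_cls(parts.netloc, timeout=10)
        target = parts.path or "/"  # an empty path is sent as "GET /"
        if parts.query:
            target += "?" + parts.query
        try:
            conn.request("GET", target, headers={"User-Agent": user_agent})
            resp = conn.getresponse()
            location = resp.getheader("Location")
            resp.read()  # drain the body before closing the connection
        finally:
            conn.close()
        hops.append((url, resp.status, resp.reason, location))
        if resp.status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # Location may be relative
        else:
            break
    return hops
```

At the time of the checks in this post, `check_headers("http://1918.com")` would have returned two hops: a 301 with a Location of http://www.1918.com/, then a 200.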

But what about the other ways people may try to reach my site: without the www, with or without the trailing slash, or with or without the filename index.php?
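Listing those variations by hand gets tedious if you have more than one site. Here’s a small sketch that builds them for a bare domain; `url_variants` is a hypothetical helper, and it assumes, as on my site, plain http and index.php as the index file.

```python
def url_variants(domain):
    """Return the common address variations for a bare domain like "1918.com".

    Assumes plain http and index.php as the index file; adjust both for your site.
    """
    variants = []
    for host in (f"www.{domain}", domain):    # with and without the www
        for path in ("", "/", "/index.php"):  # no slash, trailing slash, filename
            variants.append(f"http://{host}{path}")
    return variants


for url in url_variants("1918.com"):
    print(url)
```

Each of the six addresses it prints is one run of the header checker.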

Let’s check in order of common problems:

Without the www in front of the domain:

1. REQUESTING: http://1918.com
GET / HTTP/1.1
Connection: Keep-Alive
Keep-Alive: 300
Accept:*/*
Host: 1918.com
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

SERVER RESPONSE: 301 MOVED PERMANENTLY
Date: Tue, 07 Sep 2010 02:04:30 GMT
Server: Apache/2.2
Location: http://www.1918.com/
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 190
Keep-Alive: timeout=2, max=10
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1
Redirecting to http://www.1918.com/ ...

2. REQUESTING: http://www.1918.com/
GET / HTTP/1.1
Connection: Keep-Alive
Keep-Alive: 300
Accept:*/*
Host: www.1918.com
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

SERVER RESPONSE: 200 OK

Perfect. Any request that comes in without the www should be permanently redirected (301) to my one canonical URL. The reason is that I don’t want links split between two different URLs that actually serve the identical page.
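The rule itself is simple enough to express as a function. This is a hedged sketch of the decision a server makes, not my actual Apache configuration: `www_redirect` and `CANONICAL_HOST` are hypothetical names, and it assumes the requested path should be kept so that deep links land on the matching page.

```python
CANONICAL_HOST = "www.1918.com"  # assumption: the one host every link should use


def www_redirect(host, path):
    """Decide how to answer a request for `path` on `host`.

    Returns the Location for a 301 if the request came in on a non-canonical
    host (for example without the www), or None to serve the page with a 200.
    """
    # Host names are case-insensitive, and the Host header may include a port.
    bare_host = host.split(":")[0].lower()
    if bare_host != CANONICAL_HOST:
        return f"http://{CANONICAL_HOST}{path or '/'}"
    return None


print(www_redirect("1918.com", "/"))      # http://www.1918.com/
print(www_redirect("www.1918.com", "/"))  # None
```

The first call reproduces the `Location: http://www.1918.com/` header in the output above.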

Next, let’s check whether the trailing slash causes any problem. It didn’t: http://www.1918.com/ returned 200 OK, the same as the first test.
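The reason the trailing slash makes no difference on the homepage is visible in the request lines above: both http://www.1918.com and http://www.1918.com/ are sent as `GET / HTTP/1.1`, because HTTP/1.1 requires a client to send "/" when the URL’s path is empty. This short sketch (the `request_target` name is just for illustration) shows that mapping with Python’s `urllib.parse`:

```python
from urllib.parse import urlsplit


def request_target(url):
    """Return the path (plus any query) a client puts on the request line.

    An empty path is sent as "/", as HTTP/1.1 requires.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


print(request_target("http://www.1918.com"))            # /
print(request_target("http://www.1918.com/"))           # /
print(request_target("http://www.1918.com/index.php"))  # /index.php
```

Deeper in a site that no longer holds: /about and /about/ are different request targets, so the two can behave differently.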

Finally, with index.php appended to the main URL:

1. REQUESTING: http://www.1918.com/index.php
GET /index.php HTTP/1.1
Connection: Keep-Alive
Keep-Alive: 300
Accept:*/*
Host: www.1918.com
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

SERVER RESPONSE: 301 MOVED PERMANENTLY
Date: Tue, 07 Sep 2010 02:19:18 GMT
Server: Apache/2.2
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
X-Pingback: http://www.1918.com/xmlrpc.php
X-Powered-By: W3 Total Cache/0.9.1.2
Set-Cookie: PHPSESSID=4c36183d09e3acaecef7c7e1ca7a13a7; path=/
Vary: Accept-Encoding,User-Agent
Location: http://www.1918.com/
Content-Encoding: gzip
Content-Length: 20
Keep-Alive: timeout=2, max=10
Connection: Keep-Alive
Content-Type: text/html; charset=UTF-8
Redirecting to http://www.1918.com/ ...

2. REQUESTING: http://www.1918.com/
GET / HTTP/1.1
Connection: Keep-Alive
Keep-Alive: 300
Accept:*/*
Host: www.1918.com
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

SERVER RESPONSE: 200 OK

Again, exactly what I want to happen. For the same reason as before, I want every link to my homepage to point at one page, not one of its cousins, so I make sure all doors lead back to the main entry.
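If you control the redirects yourself, both rules can be applied in a single hop, so a request for http://1918.com/index.php goes straight to the homepage rather than through two 301s. This is a hypothetical sketch, not a description of how my site is set up; `canonical_location` and `CANONICAL_HOST` are illustrative names, and it assumes index.php is the only index filename you need to strip.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.1918.com"  # assumption: your preferred host


def canonical_location(url):
    """Return the one URL a 301 should point at, or None if `url` is canonical.

    Applies both rules in a single hop: force the www host and drop a
    trailing index.php (so "/index.php" becomes "/").
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    requested_path = parts.path or "/"  # an empty path is requested as "/"
    path = requested_path
    if path.endswith("/index.php"):
        path = path[: -len("index.php")]
    if host == CANONICAL_HOST and path == requested_path:
        return None  # already the main entry: serve it with a 200
    return urlunsplit((parts.scheme, CANONICAL_HOST, path, parts.query, ""))


print(canonical_location("http://1918.com/index.php"))      # http://www.1918.com/
print(canonical_location("http://www.1918.com/index.php"))  # http://www.1918.com/
print(canonical_location("http://www.1918.com/"))           # None
```

Redirecting once, straight to the final address, keeps every door one step from the main entry instead of sending visitors and crawlers through a chain.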

If you ran this test and your server answered some of these variations with 200 OK instead of redirecting them to your main address, you may have a dreaded canonicalization problem! I’ve talked about one quick way to fix canonicalization, but if you need more help, let me know.
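Once you have the results for every variation, deciding which ones are problems is mechanical. Here’s a sketch of that check; `find_problems` is a hypothetical helper, each result is assumed to be the list of (requested URL, status code) hops a header checker reports, and, as a design choice of this sketch, anything other than a 301 on the first hop counts as a problem, since a permanent redirect is what I’m after.

```python
def find_problems(checks, canonical="http://www.1918.com/"):
    """Return the tested addresses that look like canonicalization problems.

    `checks` maps each tested address to its hops: a list of
    (requested_url, status) pairs in the order they happened.
    """
    def same(a, b):
        # "http://host" and "http://host/" are the same request (GET /).
        # Only safe for homepage checks like the ones in this post.
        return a.rstrip("/") == b.rstrip("/")

    problems = []
    for address, hops in checks.items():
        if not hops:
            problems.append(address)
            continue
        final_url, final_status = hops[-1]
        if same(address, canonical):
            # The main address itself should answer 200 directly.
            ok = len(hops) == 1 and final_status == 200
        else:
            ok = (hops[0][1] == 301                  # permanent redirect first
                  and final_status == 200            # ends at a live page
                  and same(final_url, canonical))    # ...the canonical one
        if not ok:
            problems.append(address)
    return problems


results = {
    "http://www.1918.com": [("http://www.1918.com", 200)],
    "http://1918.com": [("http://1918.com", 301), ("http://www.1918.com/", 200)],
    "http://www.1918.com/index.php": [
        ("http://www.1918.com/index.php", 301),
        ("http://www.1918.com/", 200),
    ],
}
print(find_problems(results))  # []
```

With the three results from this post it finds nothing; a variation that answered 200 directly would show up in the returned list.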
