
Giving search engine spiders direction with a 301 redirect

If you are porting an existing website to Drupal, one consideration is how to redirect the old page URLs to the appropriate pages on the Drupal version of the site. If you don't want to create custom rewrite paths within Drupal for those nodes -- or perhaps cannot due to clean URLs or filename suffixes -- then 301 redirects are considered the best way to handle moved pages, because they tell search engines to update their databases with the new paths. This way, you should not put your search engine PageRank at risk or lose site visitors to 404 "not found" errors.

Establishing 301 redirects is quite easy, provided mod_rewrite is available on your Apache server and the rewrite engine is turned on in your .htaccess file.

How to create 301 redirects in Drupal with Apache mod_rewrite

Edit your .htaccess file in a text editor. [Note: Be sure to save the file in "UTF-8" format.]

In the file, you will find the commands:

# Various rewrite rules.
<IfModule mod_rewrite.c>
  RewriteEngine on

  # Modify the RewriteBase if you are using Drupal in a subdirectory and
  # the rewrite rules are not working properly.
  #RewriteBase /drupal

RewriteBase /

Immediately after that code -- and before the Drupal-provided "Rewrite old-style URLs" commands -- add your rewrite rules using the following format:

#custom redirects

RewriteRule ^old/URL/path$ http://yourdomain.com/new/path [R=301,L]

#end custom redirects

Note the convention: The old path is a regular expression matched against the path relative to the site root, with no leading slash. The new path is the full URL, including the domain. The [R=301,L] code is the 301 redirect instruction. (axbom notes: "The 301 tells browsers and spiders it is a permanent redirect, and the L ensures that no other rewrites are processed on the URL before it reaches Drupal; hence place this code above Drupal's own URL rewrite, but below the command RewriteEngine on.")

If you have more paths to add, put each rewrite rule on its own line, in the same format as above.
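
As a rough sketch of what several rules might look like together, the block below adds three redirects. The paths (about.html, products/widgets.php, articles/...) are hypothetical placeholders, not taken from any real site. The last rule assumes the old site used .html suffixes under an articles folder and that the new Drupal paths are the same names without the suffix.

```apache
#custom redirects

# One old page to one new page (hypothetical paths).
RewriteRule ^about\.html$ http://yourdomain.com/about [R=301,L]
RewriteRule ^products/widgets\.php$ http://yourdomain.com/catalog/widgets [R=301,L]

# Assumed pattern: strip a .html suffix from everything under articles/.
# (.+) captures the old file name and $1 reuses it in the new path.
RewriteRule ^articles/(.+)\.html$ http://yourdomain.com/articles/$1 [R=301,L]

#end custom redirects
```

Because the old path is treated as a regular expression, the dots are escaped with a backslash so that they match only a literal dot.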

--------------------------------------------------------------------------------------------------------

Scenario

You are upgrading your web site, and the upgrade involves moving and renaming particular files.

Danger

Search engines have indexed your entire site and many pages rank well. By moving and renaming these files, you run the risk of losing a lot of traffic and of greeting visitors who follow a search engine link with the dreaded "Error 404 - File not found".

Strategy 1 - Custom Error Page

You could create a custom error page. The problem with this solution is that:

a) You will lose rankings on the next search engine update, as the file will appear to be non-existent. It could be some time before the page reappears in its new location or under its new name.

b) Your web site visitors may be frustrated by the fact that they then have to dig through your site to find the desired information.

Strategy 2 - Meta Refresh

A meta refresh can be implemented in the <head> section of a blank page saved under the old file name, which then automatically redirects visitors to the new page. Example:

<HEAD> 
<META HTTP-EQUIV="refresh" content="0;URL=http://www.new.com/new.htm"> 
<TITLE>Page has moved</TITLE> 
</HEAD> 

Warning: This is a technique often used by spammers to trick search engines and it should be avoided, unless the page is in a section of your site that isn't spidered. 

What the search engine spammers do is to create a page that is optimized for certain keywords and phrases - it usually has no real content. The page is then picked up by some search engines, but when a visitor clicks on the search engine entry, they are redirected to another site, often unrelated. 

It's a despicable trick, but thankfully most search engines have filters to detect this. Using this form of search engine deception will see a site eventually banned or penalized by major players such as Google.

Strategy 3 - 301 Redirect

A 301 redirect is the most efficient and spider/visitor friendly strategy around for web sites that are hosted on servers running Apache (check with your hosting service if you aren't sure). It's not that hard to implement and it should preserve your search engine rankings for that particular page. If you *have* to change file names or move pages around, it's the safest option.

A 301 redirect is implemented in your .htaccess file.

What is a .htaccess file?

When a visitor/spider requests a web page via any means, your web server checks for a .htaccess file. The .htaccess file contains specific instructions for certain requests, including security, redirection issues and how to handle certain errors.

What is a 301 redirect?

The code "301" is interpreted as "moved permanently". In a redirect instruction, the code is followed by the path of the missing or renamed page, then a space, then the new location or file name.

How do I implement a 301 redirect?

First of all, you'll need to download the .htaccess file from the root directory where your web pages are stored. If there is no .htaccess file there, you can create one with Notepad or a similar application. Make sure when you name the file that you remember to put the "." at the beginning of the file name. This file has no file extension.

If there is a .htaccess file already in existence with lines of code present, be very careful not to change any existing line unless you are familiar with the functions of the file. 

Scroll down past all the existing code, leave a line space, then create a new line that follows this example:

redirect 301 /old/old.htm http://www.you.com/new.htm 

It's as easy as that. Save the file, upload it back to your web server and test it out by typing in the old address of the page you've changed. You should be instantly and seamlessly transported to the new location.

Notes: Be sure not to add "http://www" to the first part of the statement - just put the path from the top level of your site to the page. Also ensure that you leave a single space between these elements:

redirect 301 (the instruction that the page has moved)

/old/old.htm (the original folder path and file name)

http://www.you.com/new.htm  (new path and file name)
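
To illustrate the format with more than one entry, here is a minimal sketch using made-up paths (/old/contact.htm, /old-folder and /new-folder are placeholders). The last line relies on the fact that Apache's Redirect matches by path prefix, so a folder path also covers the files below it, with the remainder of the path appended to the new location.

```apache
# Single pages: old path, a space, then the full new URL.
Redirect 301 /old/old.htm http://www.you.com/new.htm
Redirect 301 /old/contact.htm http://www.you.com/contact.htm

# A whole folder (hypothetical names): /old-folder/page.htm is sent to
# http://www.you.com/new-folder/page.htm
Redirect 301 /old-folder http://www.you.com/new-folder
```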

A more powerful set of directives for manipulating URLs is contained in the Apache mod_rewrite module, which is especially useful when changing domain names and/or folder names containing large numbers of files. Read our basic tutorial on the Apache mod_rewrite module.

Redirecting entire sites with 301

The 301 redirect directive is quite powerful. You can redirect not just single files but entire sites, for example when changing domain names:

redirect 301 / http://www.you.com/

The first "/" indicates that everything from the top level of the site down should be redirected. As long as you are using the same paths and filenames, then this option is a very simple way to perform site redirection in the situation where you have only changed your domain name. 

If the site redirection doesn't work for you, check to ensure you have the trailing "/" on the destination URL. You may also like to try some of the other suggestions in our basic tutorial on the Apache mod_rewrite module.
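
One case the single-line form does not cover is when the old and new domain names are served from the same folder on the same server: the redirect line would then also apply to requests for the new domain and loop. A possible mod_rewrite sketch for that situation follows. Here old-domain.com is a placeholder for the name being retired, and the rules assume mod_rewrite is available on your server.

```apache
RewriteEngine on

# Only act on requests that arrived under the old name (with or without www).
RewriteCond %{HTTP_HOST} ^(www\.)?old-domain\.com [NC]

# Send the same path to the new domain with a permanent redirect.
RewriteRule ^(.*)$ http://www.you.com/$1 [R=301,L]
```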

Canonical issues: www vs. non-www

There's been much talk lately of canonical issues and search engines. This is where both the www and non-www versions of your pages are listed in a search engine. This is said to possibly trigger a duplicate content penalty and/or split page rank. If this is of concern to you, you may wish to use the following, but be aware that you may suffer a further loss of traffic while the engines sort out what's what. This example is where you wish to direct all non-www traffic to www. Add the following to your .htaccess file.

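Options +FollowSymLinks
RewriteEngine on
# Match the bare domain; the dot is escaped so it matches only a literal "."
RewriteCond %{HTTP_HOST} ^yoursite\.com [NC]
# Redirect permanently to the www version, keeping the requested path
RewriteRule ^(.*)$ http://www.yoursite.com/$1 [L,R=301]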

Ensure that all your links to folders always end in a trailing / if there is no filename after that link. 
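
If you would rather settle on the non-www version, the same idea can be turned around. This is a sketch under the same assumptions as the example above (yoursite.com is a placeholder and mod_rewrite is available), not a tested drop-in:

```apache
Options +FollowSymLinks
RewriteEngine on

# Match requests that arrived with the www prefix.
RewriteCond %{HTTP_HOST} ^www\.yoursite\.com [NC]

# Redirect them permanently to the bare domain, keeping the path.
RewriteRule ^(.*)$ http://yoursite.com/$1 [L,R=301]
```

Use one direction or the other, never both, or the two rules will redirect to each other endlessly.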

FrontPage users: in addition to the above, you'll also need to change the .htaccess files in:

_vti_bin
_vti_bin/_vti_adm
_vti_bin/_vti_aut

Replace "Options None" with "Options +FollowSymLinks"

Those folders are part of your FrontPage extensions on the server, so you'll need to gain access via FTP.

Note: test, test and test again after making changes. Test *immediately* after implementing 301 redirects. If you find anything wrong, remove the redirect immediately. Use a server header checker to ensure that you're getting a correct 301 response when requesting the old URL.



Search engine spiders & 301 redirects

The 301 redirect is the safest way to preserve your rankings. On the next spidering, the search engine robot will obey the rule indicated in your .htaccess file. The search engine spider doesn't actually read the .htaccess file, but recognizes the response from the server as valid. 

In the next update, the old file name and path *should* be dropped and replaced with the new one. Sometimes you may see alternating old/new file names during the transition period, along with some possible fluctuations in rankings as things settle. Don't panic - this is normal and may take a number of weeks before everything is back to normal; but the bottom line is, any change you make has risks - whether it's altering page text, moving/renaming pages or changing domain names. Search engines run by their own rules and can change those rules at any time. 

If you're changing domain names and using a 301 redirect, you'll need to leave the old domain name and files in place for a few weeks to give the major search engines time to catch on to the changes. Don't forget to notify your link partners of the domain name change as soon as possible. Once you deactivate the old domain, any search engine kudos you've built up through those links will be gone.