When to use robots.txt and 301 redirects for Panda SEO recovery

With Google's Panda update #25 lurking just around the corner, it's important that sites under penalty from previous Panda updates take the right steps to resolve their SEO problems.

The Panda algorithm deals predominantly with content quality, including issues like duplicate content, thin content, and generally poor quality content. Learn more about how Panda may be affecting your site's Web traffic by reading Top 5 reasons your site is losing Web traffic to Google's Panda.

This article highlights and contrasts the use of robots.txt and 301 redirects to "shape" your content into a Panda-friendly, high-traffic machine.

What is robots.txt?

The robots.txt file can be used to control which directories and pages search engines have access to. While the robots file cannot enforce the rules it lays out, most of the important search engines (like Google) will obey robots.txt.

Here's an example of how to prevent search engines, Google included, from crawling a page named affiliate_links.html:

User-agent: *
Disallow: /affiliate_links.html

You don't have to list every file you want disallowed one by one. A Disallow rule matches by path prefix, so a rule that names a directory covers everything beneath it.
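To see how a directory-level rule behaves, here is a minimal sketch using Python's standard urllib.robotparser module. The /private/ directory and the sample paths are assumptions made up for illustration; only the affiliate_links.html rule comes from the example above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the single-file rule from the example above plus a
# directory-level rule (the /private/ path is an assumption).
ROBOTS_TXT = """\
User-agent: *
Disallow: /affiliate_links.html
Disallow: /private/
"""


def is_allowed(path, user_agent="Googlebot"):
    """Return True if a rule-abiding crawler may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/affiliate_links.html", "/private/report.html", "/index.html"):
        print(path, is_allowed(path))
```

The single Disallow: /private/ line is enough to cover any file under that directory, which is usually easier to maintain than one rule per file.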

In fact, there is quite a bit to learn about how to use robots.txt, and every webmaster should have a basic knowledge of how to use this file. I recommend you visit the Web Robots Pages for more information.

What is a 301 redirect?

301 redirects are extremely important for SEO purposes. A 301 response code tells Google that a page has been permanently moved to a different address. This is obviously important when a page name or path changes, or a domain name change occurs.

Without redirection, Google will treat the same content at a new address as completely new content. And what's worse, it will most likely treat this as duplicate content since the old address is likely still indexed.

301 redirects can be implemented programmatically, or from an .htaccess file on Apache servers. People using hosted website solutions may not be able to edit .htaccess directly, although leading website builders should provide other ways to implement 301 redirects.

A typical redirect from one page to another looks like this:

Redirect 301 /oldpage.html /newpage.html

URL redirection is actually a lot more powerful and flexible than this quick example, and it is possible to use regular expressions to redirect entire domains, directories, or pretty much anything you like.
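As a rough illustration of pattern-based redirection, the sketch below maps old paths to new ones with regular expressions. It is written in Python rather than as server configuration, and the /old-blog/ and /blog/ paths are hypothetical. On Apache, the same idea is normally expressed with RedirectMatch or mod_rewrite rules.

```python
import re

# Hypothetical redirect rules as (pattern, replacement) pairs.
# The first rule moves a whole directory; the second moves a single page.
REDIRECT_RULES = [
    (re.compile(r"^/old-blog/(.*)$"), r"/blog/\1"),
    (re.compile(r"^/oldpage\.html$"), "/newpage.html"),
]


def resolve_redirect(path):
    """Return (301, new_path) for the first matching rule, or None."""
    for pattern, replacement in REDIRECT_RULES:
        if pattern.match(path):
            return 301, pattern.sub(replacement, path)
    return None


if __name__ == "__main__":
    print(resolve_redirect("/old-blog/2013/panda.html"))
    print(resolve_redirect("/about.html"))
```

One pattern covers every page under the old directory, so there is no need to write a separate redirect line for each URL.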

Wikipedia has a pretty good URL redirection write-up, if you want to learn more.

When to use robots.txt and 301 redirects for SEO

So robots.txt can block search engines from crawling pages, and 301s can redirect old pages to new pages. Sounds pretty simple, right?

The problem is knowing when to use one and when to use the other, and how to avoid conflict between the two.

Getting things wrong can make your site an easy target for a variety of penalties, or at the very least, undesirable consequences.

More importantly, if your site is currently under a Panda penalty, then it is very likely that understanding and implementing robots and 301s will help your site recover quickly - once you've identified the underlying causes.

Limit usage of robots.txt

Google doesn't like being blocked from crawling pages, so it advises using robots.txt only when you are certain that Google should not crawl a page or directory. Good examples are login pages, system report pages, image folders, code folders, and website administration pages.

When it comes to actual content, only pages that may damage your page rankings (such as low quality or thin affiliate pages) should be blocked - although there probably isn't a good reason to have low quality or thin affiliate pages in the first place.

When in doubt, don't block content using robots.txt. Make sure you use 301 redirects or canonical tags to indicate the correct versions of webpages. Restrict robot exclusions to parts of the website that should never be part of Google's index.

Don't use robots to block duplicate content!

Webpages are often accessible through a number of different URLs (this is common in content management systems like Drupal). The temptation is to block the unwanted URLs so that they are not crawled by Google.

In fact, the best way to handle multiple URLs is to use a 301 redirect and/or the canonical link tag (rel="canonical"). You can learn more about canonical URLs at Matt Cutts' blog SEO advice: URL canonicalization.
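For illustration, here is a small Python sketch that builds the rel="canonical" element for a page reachable under several URLs. The alias table, the example.com domain, and the function name are all hypothetical. A real CMS would look the preferred URL up in its own routing data.

```python
from html import escape

# Hypothetical alias table: every duplicate URL maps to one preferred URL.
CANONICAL = {
    "/node/42": "/panda-recovery-guide",
    "/panda-recovery-guide?page=0": "/panda-recovery-guide",
}


def canonical_link(path, base="https://www.example.com"):
    """Return the canonical <link> element to place in the page's <head>."""
    target = CANONICAL.get(path, path)  # unknown paths are their own canonical
    return '<link rel="canonical" href="%s">' % escape(base + target, quote=True)


if __name__ == "__main__":
    print(canonical_link("/node/42"))
```

Every duplicate URL then declares the same preferred address, and Google can still crawl all of them, which it could not do if they were blocked in robots.txt.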

Don't combine robots and 301s

Most commonly, people realise that Google is crawling webpages it shouldn't, so they block those pages using robots.txt and set up a 301 redirect to the correct pages, hoping to kill two birds with one stone (i.e. remove unwanted URLs from the index and pass PageRank juice to the correct pages). However, Google will not follow a 301 redirect on a page blocked with robots.txt.

This leads to a situation where the blocked pages hang around in the index indefinitely: because Google never requests a blocked URL, it never sees the 301 response.
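The conflict can be sketched with a toy crawler in Python. This is a simplified model built on the standard urllib.robotparser module, not a description of how Googlebot is implemented, and the page names and redirect table are assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical redirect table standing in for the server's 301 rules.
REDIRECTS = {"/old-page.html": "/new-page.html"}

# Two hypothetical robots.txt files: one blocks the old page, one blocks nothing.
BLOCKING = "User-agent: *\nDisallow: /old-page.html\n"
OPEN = "User-agent: *\nDisallow:\n"


def crawl(path, robots_txt, user_agent="Googlebot"):
    """Return what a rule-abiding crawler learns about the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, path):
        # The URL is never requested, so the 301 response is never seen.
        return "blocked"
    if path in REDIRECTS:
        return "301 -> " + REDIRECTS[path]
    return "200"


if __name__ == "__main__":
    print(crawl("/old-page.html", BLOCKING))
    print(crawl("/old-page.html", OPEN))
```

In this model the redirect only becomes visible once the Disallow rule is removed, which is why the two techniques should not be combined on the same URL.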

Never 301 a robots.txt file

When moving from one domain to another, many webmasters simply set up a 301 redirect at the domain level so that all files and folders on the old domain redirect to the new one.

This works fine for everything except the robots file. Once the old robots.txt redirects to the new one, any rules you set in the old file are missed, and any rules you set on the new site are applied to everything.

To avoid potentially confusing and disastrous SEO situations, it's best not to 301 a robots.txt file.
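One way to picture a domain-level redirect that leaves robots.txt alone is the Python sketch below. The new domain name and the function are hypothetical, and the logic would normally live in the web server's redirect rules rather than in application code.

```python
# Hypothetical new domain, used only for illustration.
NEW_DOMAIN = "https://www.new-example.com"


def domain_redirect(path):
    """Return the 301 target for a path on the old domain.

    Returns None for /robots.txt, meaning the old domain keeps
    serving its own robots file instead of redirecting it.
    """
    if path == "/robots.txt":
        return None
    return NEW_DOMAIN + path


if __name__ == "__main__":
    print(domain_redirect("/products/widgets.html"))
    print(domain_redirect("/robots.txt"))
```

Every other path is forwarded to the same location on the new domain, so the robots file is the only exception to maintain.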

Those are my SEO tips regarding robots and 301s. What experiences (good or bad) have you had with Google, Panda, SEO and using robots and 301s?

Share your advice and tips in the comments.

