To Block or Not to Block with a Robots File

A question that I hear a lot is, “What should we be blocking in our robots.txt file?” It is a good question. There are a few things that you definitely want to block in the file, starting with any login or admin pages that you do not want crawlers to reach. There is no reason for these pages to be indexed or for others to find them, so blocking them is best practice. Beyond the admin pages, you may also have pages within your site that contain customer-specific information. Blocking these pages from being crawled helps protect your customers’ privacy; it is not a good look if a customer’s order information shows up on a Google search results page.
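As a sketch, assuming the admin, login, and customer account pages live under paths like /admin/, /login/, and /account/ (these paths are just examples; swap in your site’s actual URL structure), the directives would look something like this:

    User-agent: *
    Disallow: /admin/
    Disallow: /login/
    Disallow: /account/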

On some e-commerce sites, session IDs are set within the URL, which creates a new URL on every visit. You do not want these URLs to be indexed, so blocking the session ID parameter in your robots.txt file is one way to keep them out.
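For example, if the platform appends a parameter such as sessionid= to URLs (the parameter name is just an assumption here; check what your own platform generates), the major search engines support wildcard matching in robots.txt, so rules along these lines would cover those URLs:

    User-agent: *
    Disallow: /*?sessionid=
    Disallow: /*&sessionid=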

There are some precautions to take into consideration when creating this file. If you are in doubt about whether something belongs in it, you may want to leave it out. It is better to allow extra parts of the site to be crawled than to block important pages that should never have been blocked.
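To see how small the margin for error is, compare the following two rules; the difference between blocking one directory and blocking the entire site is a single path:

    # Blocks only URLs under /admin/
    Disallow: /admin/

    # Blocks every page on the site
    Disallow: /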

Remember, the robots.txt file is an easy way to block access to private pages or sections of the site that you do not want search engines to crawl or index.

One thought on “To Block or Not to Block with a Robots File”

  • February 21, 2012 at 9:59 pm

    Keep in mind that the robots file is accessible to the public, so any potential snooper can figure out the secure areas of your site if you blocked them in the robots file. Better in that case to use the robots element to stop spiders from indexing/crawling the page.
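
    The robots meta element referred to here goes in the <head> of each page you want kept out of the index; a minimal example would be:

        <meta name="robots" content="noindex, nofollow">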
