Are Searchbots Killing Your Bandwidth?

Every webmaster wants search engines to index their site, but what do you do if searchbots start killing your bandwidth? This tutorial takes you through the steps that will help keep searchbots under control.

Some people agonise over how long it takes before their web site is indexed and returning results in search engines. But what happens when the bots come to call and start making huge demands on your server resources? Not only can this eat your bandwidth, but server resource usage can get so high that your web host temporarily disables your account. Worse still, some hosts do not correctly identify search bots as the problem and then try to convince you to upgrade your hosting plan or move to a dedicated server.

So, what to do?
The first thing to do is follow the recommendations in the first post of this thread. They will improve performance.

The #1 tip is the most important - don't use the inbuilt Mambo statistics!

#2 - make sure search bots stay on the same domain.
Some people have sites whose links mix www and non-www addresses. When the bots come calling they follow those links and end up indexing both the www and the non-www version of your site, which effectively doubles the number of crawls. Choose www or non-www, but don't use both!
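As a rough illustration of what "choose one" means in practice, here is a minimal Python sketch that rewrites a link to a single preferred host. The host name `www.example.com` and the function names are assumptions made up for this example; on a live site you would more likely enforce the same rule with a permanent redirect at the web server.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical preferred host: replace with the one version you have chosen.
PREFERRED_HOST = "www.example.com"


def _strip_www(host: str) -> str:
    """Drop a leading 'www.' so both variants of a host compare equal."""
    return host[4:] if host.startswith("www.") else host


def canonical_url(url: str, preferred_host: str = PREFERRED_HOST) -> str:
    """Rewrite www/non-www variants of our own site to the preferred host.

    URLs that point at a different site are returned unchanged.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if _strip_www(host) != _strip_www(preferred_host):
        return url  # not our site, leave it alone

    netloc = preferred_host
    if parts.port is not None:
        netloc += f":{parts.port}"  # keep an explicit port if one was given
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    for link in (
        "http://example.com/index.php?option=com_content",
        "http://www.example.com/",
        "http://other.org/page",
    ):
        print(link, "->", canonical_url(link))
```

Running every internal link in your templates or menus through a check like this is one way to spot the mixed links that send bots to both versions of the site.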

#3 - use a robots.txt file to tell search bots which directories you don't want them to crawl. Well-behaved bots read this file first and leave the disallowed directories alone, which saves the bandwidth those requests would have used. Bear in mind that robots.txt is only a request: badly behaved bots can ignore it.
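Before uploading a robots.txt file it can help to check which URLs it really blocks. The sketch below uses the `urllib.robotparser` module from the Python standard library to test a few URLs against a set of rules. The directory names and the `example.com` URLs are assumptions for illustration only; substitute the directories you want bots to skip.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the directory names are examples, not a recommendation.
ROBOTS_TXT = """\
User-agent: *
Disallow: /administrator/
Disallow: /cache/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if a bot that obeys robots.txt may fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/index.php",
        "http://www.example.com/administrator/index.php",
        "http://www.example.com/cache/page.html",
    ):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked"
        print(f"{verdict}: {url}")
```

This only tells you how a compliant parser reads the rules; it says nothing about bots that ignore robots.txt altogether.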

#4 - set a crawl rate in robots.txt.
Add the following at the top of your robots.txt file:

    User-agent: *
    Crawl-delay: 60

60 means 60 seconds between requests. I normally use 5, but in times of heavy search bot activity I raise the value to tell search engine robots to slow down. Crawl-delay is a non-standard extension to robots.txt, so not every bot honours it.
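If you switch between a normal and a "busy" delay often, it can be handy to generate the file rather than edit it by hand. The following is a minimal sketch, assuming Python 3.6 or later (for `RobotFileParser.crawl_delay`); the function names and the two constants are made up for this example, with the values 5 and 60 taken from the text above.

```python
from urllib.robotparser import RobotFileParser

NORMAL_DELAY = 5   # seconds, everyday setting
BUSY_DELAY = 60    # seconds, for periods of heavy bot activity


def render_robots_txt(crawl_delay: int) -> str:
    """Build a robots.txt body asking all bots to wait crawl_delay seconds."""
    if crawl_delay < 0:
        raise ValueError("crawl_delay must not be negative")
    return f"User-agent: *\nCrawl-delay: {crawl_delay}\n"


def parsed_delay(robots_txt: str, user_agent: str = "*"):
    """Read the delay back the way a compliant parser would see it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    busy = render_robots_txt(BUSY_DELAY)
    print(busy)
    print("Delay seen by a parser:", parsed_delay(busy))
```

Reading the value back with a parser is a cheap sanity check that the file says what you intended before you upload it.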

More information about robots.txt can be found here: http://www.robotstxt.org/wc/robots.html

#5 - open a free account with Google Webmaster Tools.
For Googlebot this is more effective than robots.txt. You can tell Google directly how fast or slow you want it to crawl. The tools also give good information about optimising your web site for Google indexing and report any problems Google has with crawling your site.
https://www.google.com/webmasters/to…/en/about.html

Q. What if I follow all these tips and my site is still being hammered?

A. Check your server logs to see what activity is occurring on your site at the times when server load gets high. This will help you to identify whether search engine robots are causing the problem or whether something else is happening, such as spam bots, hacker activity or, ideally, a sudden genuine increase in visitors to your site.
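One way to do that check is to total the bytes served per user agent. The sketch below assumes your server writes the common "combined" access log format; the sample log lines, the IP addresses and the list of bot hints are invented for illustration, so adjust the pattern if your host uses a different format.

```python
import re
from collections import Counter

# Matches the "combined" access log format (an assumption about your server).
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

# Rough substrings that often appear in crawler user agents (not exhaustive).
BOT_HINTS = ("bot", "crawl", "spider", "slurp")

GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER = "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/3.0"

# Made-up sample lines; in practice, read these from your access log file.
SAMPLE_LOG = [
    f'192.0.2.1 - - [10/Oct/2008:13:55:36 +0000] "GET /index.php HTTP/1.1" 200 2326 "-" "{GOOGLEBOT}"',
    f'192.0.2.1 - - [10/Oct/2008:13:55:40 +0000] "GET /about.html HTTP/1.1" 200 1000 "-" "{GOOGLEBOT}"',
    f'192.0.2.7 - - [10/Oct/2008:13:56:02 +0000] "GET /index.php HTTP/1.1" 200 512 "-" "{BROWSER}"',
    f'192.0.2.7 - - [10/Oct/2008:13:56:05 +0000] "GET /logo.png HTTP/1.1" 304 - "-" "{BROWSER}"',
]


def bandwidth_by_agent(lines):
    """Return a Counter mapping each user agent to total bytes served."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        size = match.group("bytes")
        totals[match.group("agent")] += 0 if size == "-" else int(size)
    return totals


def looks_like_bot(agent: str) -> bool:
    """Heuristic guess: does the user agent string look like a crawler?"""
    lowered = agent.lower()
    return any(hint in lowered for hint in BOT_HINTS)


if __name__ == "__main__":
    for agent, total in bandwidth_by_agent(SAMPLE_LOG).most_common():
        label = "bot?" if looks_like_bot(agent) else "other"
        print(f"{total:>8} bytes  [{label}]  {agent}")
```

The user agent string can be faked, so treat the "bot?" label as a hint for where to look in the logs rather than as proof.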
