
Crawl Control - Google Sitemap Crawl Rate Control

Extra, extra: read all about it! Bigmouthmedia in Google Sitemap XML Alpha Testing for Website Crawl Rate Control! Wait, Google Website 'Crawl Rate Control' - how can this be? Controlling the Googlebot?

This project is so early-stage alpha that there is hardly an article on the internet about Crawl Control - that is to say, about 'Google Crawl Speed Control' in Google's Sitemap XML Console. So what is Crawl Rate Control, and why is crawl speed important? Well, firstly it would be useful to have a little background on Google's Sitemap XML...

Bigmouthmedia has always been riding the Sitemap XML wave. We produced our Google Sitemap XML white paper a day after the beta project was unveiled, on the 2nd of June. Within a few months we had a raft of clients using and raving about the Sitemap XML project. The advantages of Google's Sitemap XML have long been speculated upon in forums and the like. Our in-house XML tests have shown that using the Sitemap controls has a range of benefits.

And the Google Crawl Rate Control? Well, just hang on, I'm getting to that part... Firstly, to recap:

A hosted XML file lists all the URLs from a site. The technique favours sites where URLs come and go quickly, as the sitemap can be updated and Google told of the change. The sitemap is a great asset in overcoming spider accessibility issues too.
The Google Sitemap XML does nothing extra to promote page positions within Google. Webmasters can set a priority level for each URL, but this simply "suggests" an importance for inclusion (would it be terrible if this page wasn't in Google?) and does not set an order of importance for search results. The Google Sitemap XML was released under the Creative Commons license, which will make it easier for other search engines to tap into the technology and make use of the sitemap XML files too.
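
To make the recap concrete, here is a minimal sketch of how such a file could be generated. It is an illustration only, not Bigmouthmedia's or Google's tooling: the function name build_sitemap and the example.com URLs are made up, and the namespace shown is the one from the current sitemaps.org protocol, which may differ from the schema Google used during the beta.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace; the early Google beta schema may differ.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as a string for (url, lastmod, priority) tuples.

    build_sitemap is a hypothetical helper written for this example.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, priority in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
        # Priority is only a hint about importance for inclusion (0.0 to 1.0);
        # it does not set an order for search results.
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical pages on a placeholder domain.
    print(build_sitemap([
        ("http://www.example.com/", "2006-01-01", 1.0),
        ("http://www.example.com/offers/today", "2006-01-02", 0.3),
    ]))
```

When URLs come and go, the file is simply regenerated from the current list of pages and Google is told of the change.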

Over time the Sitemap XML console has been moving towards complete integration with the Google Webmaster Console. A verification file became the key to unlocking Google data for your site: a Sitemap XML verification file belongs to a Sitemap XML account/email combination, and the file name acts something like a password. For verified sites, Google passes on crawl reports to whoever accesses the Sitemap XML account.

About five weeks ago, the Sitemap XML interface was updated again and took us closer to the unofficial Google Webmaster Console. The additions to the Google Sitemap XML Console saw:

Vanessa Fox from Google Engineering said: "Since the Google Sitemaps program is built on the idea of two-way communication between Google and webmasters, we hope this update gives you as much information as possible to help you debug your site and help ensure it is crawled and indexed as effectively as possible."

Matt Cutts, the Google guru, added: "In the early days when Google had 200-300 people there was no way we could do everything we wanted to do. But as Google grows, we get more of a chance to 'go back and fix things,' to build the ideal search engine. And part of doing that is having more and better communication with webmasters. I believe the ideal search engine would help site owners debug and diagnose crawl problems, and the Sitemaps team has made great strides with that in Google's webmaster console. But I think the ideal search engine would also tell legitimate site owners when they risk not doing well in Google."

So the Google Sitemap XML console continues to grow, and the latest addition - Google Sitemap Crawl Rate Control - is a very subtle one.

"We are testing an alpha version of our new tool with a small percentage of webmasters who use Sitemaps. You should leave this control at the Normal setting unless you are having trouble with the speed at which Googlebot is crawling your server.

Simply select the rate at which you would like the Googlebot to crawl your server and click save. During this stage of testing, we will evaluate requests to determine the best way of using this data and providing this tool to everyone."

Fastest - A faster crawl may enable Google to crawl more of your site's pages, but it may also use more network bandwidth and computational resources.

Normal - The recommended crawl rate for your site.

Slowest - A slower crawl will reduce the Googlebot's traffic on your server, but Google may not be able to crawl as many site pages.

Crawl speed has always been a sticking point between Google and webmasters. It has much to do with the bandwidth a crawl uses and the time Google waits before requesting the next page. Controlling this was never going to be easy: crawling fast and hard can cause problems for sites, and the effect is felt by the servers and their users. Crawling too slowly means Google may not be able to crawl all of your pages.
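
The trade-off can be put into rough numbers. The sketch below assumes a crawler that fetches one page at a time and waits a fixed delay between requests. The delay values are invented purely for illustration; Google has not published what Slowest, Normal and Fastest mean in seconds, and the helper names are hypothetical.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: made-up delays between requests, NOT Google's actual settings.
ASSUMED_DELAY_SECONDS = {"Slowest": 10.0, "Normal": 2.0, "Fastest": 0.5}


def max_pages_per_day(delay_seconds):
    """Upper bound on pages fetched in a day with one request per delay."""
    return int(SECONDS_PER_DAY // delay_seconds)


def days_to_crawl(total_pages, delay_seconds):
    """Whole days needed to request every page once at the given delay."""
    return math.ceil(total_pages / max_pages_per_day(delay_seconds))


if __name__ == "__main__":
    site_size = 100_000  # hypothetical number of URLs on the site
    for setting, delay in ASSUMED_DELAY_SECONDS.items():
        print(setting, max_pages_per_day(delay), days_to_crawl(site_size, delay))
```

Under these assumed figures, a 100,000-page site would take 12 days to cover at the slowest delay but could be covered within a day at the fastest, at the cost of many more requests hitting the server per minute.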

The Google Crawl Rate Control system has only been live for a number of days, and we have started reviewing a number of Crawl Rate Control experiments to see what's what. We will keep you posted on the Crawl Rate developments as we ride the crest of the Google Sitemap XML Webmaster Console Crawl Rate Control wave.
bigmouthmedia - the engine listings analysis experts
© bigmouthmedia 2010