23 October 2008
Crawling efficiency - make it easier for engines to search
Search engines have a hard time crawling the entire web - it's a pretty large playing field after all - so making your site as easy to crawl as possible can give you a definite advantage.
A search engine like Google, MSN or Yahoo! will typically crawl a one-year-old website every few days to check whether anything has changed or new content has been added since its last visit. These frequent crawls came about because users wanted fresher content readily available in search engine indexes, and they mean that your site could be visited multiple times in a week.
Therefore, one of the key factors of organic search is crawling efficiency; if your website is not efficient to crawl, a search engine may not be able to, or may simply choose not to, crawl all of your content. This could then result in valuable parts of your website not getting indexed (and we can't have that now, can we?).
However, you'll be pleased to know that there are a number of solutions to aid crawling efficiency. Two of the most popular are:
- rel="nofollow" attribute - A microformat used by Google to prevent crawling of, or the passing of PageRank value to, the destination page.
Example use of rel="nofollow" - You have a shopping website, and at the foot of the "electronics" category page you use pagination to let the user view the next page of content, e.g. Page 1, 2, 3, 4. Often where pagination is used, page 1 is the same page as the category or landing page but, because of the way the website is built, the two reside on different URLs, e.g. /category/electronics/ and /category/electronics/page-1/. Rather than having Google crawl and index both URLs, which are in fact duplicates of one another, add the rel="nofollow" attribute to the page 1 link. This prevents Google crawling or passing any value to that page, which increases the crawling efficiency of the website. This technique is also known within the digital world as "PageRank sculpting".
- Sitemap XML - An XML-formatted file that allows a website to tell supported search engines about all the URLs that exist within the site.
Example use of sitemap XML - You have a large website with thousands of pages. Generating a list of all its URLs, and allowing search engines to download and crawl that list, improves the chance that the entire website will be indexed. The Sitemap Protocol also allows you to set additional attributes for each page, e.g. "changefreq", "priority" and "lastmod". By providing such details, you can help search engines understand which areas of your website are important to you. This may then affect how they treat and index your URLs.
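To make the two examples above a little more concrete, here is a minimal Python sketch of both. It is an illustration under assumptions rather than a recommended implementation: the function names `pagination_links` and `build_sitemap`, the `page-N/` URL pattern and the example.com addresses are all hypothetical, and your own platform will almost certainly generate its links and sitemap differently.

```python
from html import escape
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap Protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def pagination_links(category_url, page_count):
    """Render pagination links for a category page.

    Assumes page 1 duplicates the category page itself, so only
    that link is given rel="nofollow".
    """
    links = []
    for page in range(1, page_count + 1):
        href = f"{category_url}page-{page}/"
        rel = ' rel="nofollow"' if page == 1 else ""
        links.append(f'<a href="{escape(href, quote=True)}"{rel}>{page}</a>')
    return " ".join(links)


def build_sitemap(pages):
    """Build a sitemap XML string from a list of dicts.

    Each dict needs a 'loc' key and may carry the optional
    'lastmod', 'changefreq' and 'priority' attributes.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(pagination_links("/category/electronics/", 4))
    print(build_sitemap([
        {"loc": "https://www.example.com/", "changefreq": "daily", "priority": 0.8},
        {"loc": "https://www.example.com/category/electronics/", "lastmod": "2008-10-23"},
    ]))
```

In practice the sitemap would be written to a file such as sitemap.xml and submitted to the search engines you care about; the sketch only shows the shape of the data.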
But these aren't the only options - other areas of search engine marketing can impact the crawling efficiency of your website too. For example, when trying to improve the usability of your website, you might use an analytics package to track particular user routes or click-through areas on your site. This requires you to append tracking codes to URLs. By appending these tracking codes, you provide an alternative route for a search engine crawler to find and index your content. This can generate duplicate versions of the same URL and therefore decrease the crawling efficiency of your website. Instead of appending tracking codes to URLs, you could investigate alternative solutions, e.g. attaching the tracking to the links with a form of onclick JavaScript. This method works towards the same end but allows you to maintain the crawling efficiency of your site in the eyes of search engines.
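If you want a feel for how many duplicate URLs your tracking codes are creating, a rough Python sketch like the one below can group crawled URLs by their tracking-free form. The parameter names in `TRACKING_PARAMS` are assumptions (swap in whatever your analytics package actually appends), and `strip_tracking` and `group_duplicates` are hypothetical helper names, not part of any analytics tool.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameter names; replace with the ones your
# analytics package really appends to links.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref"}


def strip_tracking(url):
    """Return the URL with tracking parameters removed.

    All other query parameters are kept in their original order.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def group_duplicates(urls):
    """Map each tracking-free URL to the variants that point at it."""
    groups = {}
    for url in urls:
        groups.setdefault(strip_tracking(url), []).append(url)
    return groups


if __name__ == "__main__":
    crawled = [
        "https://www.example.com/category/electronics/",
        "https://www.example.com/category/electronics/?ref=homepage",
        "https://www.example.com/category/electronics/?utm_source=banner",
    ]
    for clean, variants in group_duplicates(crawled).items():
        print(clean, "has", len(variants), "crawlable variants")
```

Any group with more than one variant is a page that a crawler can reach, and potentially index, under several different addresses.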
Sophisticated search engines like Google use a wide variety of heuristic techniques to analyse pages, but that doesn't mean you shouldn't make your pages as crawl-efficient as possible - after all, what you really want is for people to see them!