SEO Technology
The first commercial search engines were directories, such as Yahoo! and Galaxy, so the technology behind the sites in their index was, by and large, not a real issue, except for aesthetic and site-quality considerations. However, with the introduction of the major early spider-based search properties, such as Lycos, AltaVista and Inktomi, the ability of "robots" to investigate websites became a major consideration.
Robots, colloquially known as spiders, are pieces of software that some search engines use to investigate the content of websites and report their findings to the search engine database. An algorithm then ranks the results by attaching priorities to certain aspects of the database, and this determines the order in which sites appear in the search engine results pages.
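The idea of an algorithm attaching priorities to aspects of the database can be sketched as a weighted score. This is only an illustration: the feature names (`title_match`, `body_matches`, `inbound_links`) and the weights are assumptions made up for the example, since search engines do not publish their ranking formulas.

```python
# Hypothetical priorities; real engines keep their weights secret.
WEIGHTS = {
    "title_match": 3.0,    # query term appears in the page title (0 or 1)
    "body_matches": 1.0,   # number of times the term appears in the body
    "inbound_links": 0.5,  # number of other sites linking to the page
}


def score(page: dict) -> float:
    """Combine a page's recorded features into a single weighted score."""
    return sum(weight * page.get(name, 0) for name, weight in WEIGHTS.items())


def rank(pages: list) -> list:
    """Order pages for the results page, highest score first."""
    return sorted(pages, key=score, reverse=True)


# Toy stand-in for what a robot might have stored in the database.
database = [
    {"url": "a.example", "title_match": 0, "body_matches": 4, "inbound_links": 2},
    {"url": "b.example", "title_match": 1, "body_matches": 2, "inbound_links": 6},
]

for page in rank(database):
    print(page["url"], score(page))
```

Changing a weight changes the order without changing the stored data, which is why the same site can rank differently from one engine to the next.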
However, many aspects of site coding may present barriers to search engine robots. Many robots were programmed at around the time of the first few search engines, so their understanding of HTML is in many ways stuck in the mid-to-late 1990s. Many have difficulty parsing HTML that webmasters take for granted, such as framesets, nested tables, image links and image maps, and JavaScript/DHTML. Some robots have evolved well, Googlebot (the robot used by Google) being a notable example of one that moves with the times, but just as many have not really changed in the six years or so that they have been in existence. An SEO company must therefore fully understand both how intricate site coding may present barriers to search engine robots and how simple technical changes to the site may present opportunities to improve rank.
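A rough way to audit a page for the constructs listed above is to walk its markup and flag them. The sketch below uses Python's standard `html.parser` module; the class name `BarrierScanner`, the helper `scan` and the wording of the findings are assumptions for illustration, and the checks are a heuristic rather than a model of how any particular robot actually behaves.

```python
from html.parser import HTMLParser


class BarrierScanner(HTMLParser):
    """Collects constructs that older robots may struggle to follow."""

    def __init__(self):
        super().__init__()
        self.findings = []
        self._table_depth = 0
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("frameset", "frame"):
            self.findings.append("frames")
        elif tag == "table":
            self._table_depth += 1
            if self._table_depth > 1:
                self.findings.append("nested table")
        elif tag == "a":
            self._in_anchor = True
            # A valueless href is reported as None, so fall back to "".
            if (attrs.get("href") or "").lower().startswith("javascript:"):
                self.findings.append("javascript link")
        elif tag == "img" and self._in_anchor and not attrs.get("alt"):
            self.findings.append("image link without alt text")
        elif tag in ("map", "area"):
            self.findings.append("image map")
        elif tag == "script":
            self.findings.append("script")

    def handle_endtag(self, tag):
        if tag == "table" and self._table_depth > 0:
            self._table_depth -= 1
        elif tag == "a":
            self._in_anchor = False


def scan(html: str) -> list:
    """Return the sorted, de-duplicated list of potential barriers."""
    scanner = BarrierScanner()
    scanner.feed(html)
    scanner.close()
    return sorted(set(scanner.findings))


print(scan('<frameset><frame src="menu.html"></frameset>'))
print(scan('<a href="products.html"><img src="button.gif"></a>'))
```

An empty result does not prove a page is robot-friendly; it only means none of these particular constructs were found.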
It should be remembered, however, that site coding should not be abused to inflate rank artificially: search engines will treat this as spam and may penalise or bar the site.
One complex and fairly crafty way of fooling search engines is a technique commonly called cloaking. It involves recognising site visitors by their user agent (browser or robot name) or by their IP address. This makes it possible to present pages optimised for a specific search engine, so that the site can be optimised for each engine individually. In principle this sounds like quite a good idea, but it is obviously open to abuse, and many search engines have banned it as a result.
Google, for example, takes the line that what its spider indexes must be what the users of your site will see. IP and user agent delivery has been used in the past to fool robots into thinking they were indexing popular sites such as Hotmail or Microsoft, when in fact they were indexing pornographic sites. This is obviously not in the interests of search engines, and it is easy to see their point of view.
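The rule that a spider must see what users see can be expressed as a simple consistency check: request the same page as a robot and as a browser, then compare the results. The sketch below is a toy model under stated assumptions. The two page handlers, the sample user agent strings and the `ROBOT_TOKENS` list are all hypothetical, and recognising robots by substring is only illustrative, not a complete or authoritative method.

```python
# Illustrative substrings only; not a complete list of robot names.
ROBOT_TOKENS = ("googlebot", "slurp", "scooter")


def looks_like_robot(user_agent: str) -> bool:
    """Guess from the user agent string whether the visitor is a robot."""
    ua = user_agent.lower()
    return any(token in ua for token in ROBOT_TOKENS)


def honest_page(user_agent: str) -> str:
    """Hypothetical handler that ignores who is asking."""
    return "Welcome to our shop."


def cloaking_page(user_agent: str) -> str:
    """Hypothetical handler doing user agent delivery, as described above."""
    if looks_like_robot(user_agent):
        return "cheap widgets cheap widgets cheap widgets"
    return "Welcome to our shop."


def serves_same_content(handler,
                        robot_ua="Googlebot/2.1",
                        browser_ua="Mozilla/4.0 (compatible; MSIE 6.0)") -> bool:
    """True if the robot and the browser receive identical content."""
    return handler(robot_ua) == handler(browser_ua)


print(serves_same_content(honest_page))
print(serves_same_content(cloaking_page))
```

A check like this catches only user agent delivery. Cloaking by IP address would look identical from an ordinary connection, which is part of why it is harder to detect.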
Robots Exclusion
This article was first published on 03 June 2002 and does not necessarily match current events or the current opinions and views of bigmouthmedia ltd.