Google has announced that scanned documents will now be indexed in its search results, thanks to the use of Optical Character Recognition (OCR) technology. The news comes in the same week that Google's book-scanning lawsuit was settled. Pending approval from a US federal court judge in Manhattan, the settlement means the search engine specialist will be able to continue the Book Search programme it started four years ago.
The change was announced on the official Google blog this week. Previously, scanned documents were indexed only if they had been converted to PDFs containing text, which meant they rarely appeared in search results; when they did, it was solely because of their metadata rather than their content.
The stumbling block is that a scanned document is generally just a picture of text: easy for a human to read, but difficult for a computer to interpret. For example, should a circular character be read as a zero, a letter O, or merely a circle? It might even be something to ignore entirely, such as a stain or wrinkle that was accidentally scanned along with the page. Whilst some software has been able to recognise text like this accurately over the past decade, it has been nearly impossible to apply it to every PDF that Google finds on the web.
There have been a couple of other changes to Google this week too. A new line has been added to the homepage advertising the G1 phone, the first handset to run Google's Android software.

Google's robots.txt file has also been temporarily updated for Halloween to ward off any online zombie invasions.