The obscured-text systems that serve as a handy anti-spam tool are about to become a little harder for unethical types to fool. Most internet users will be familiar with them: the visitor is asked to transcribe obscured words or characters shown on screen, which allows the site to verify that a person, rather than a program, is trying to gain access. These systems are known as CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart).
CAPTCHAs were initially designed as a type of challenge-response test to ensure that responses are not generated by a computer. They have since become more commonly used to prevent spammers from exploiting websites, for example by sending out junk mail or attempting to obtain individual email addresses.
For the most part, random words have been used as the text to be transcribed. However, many sites now intend to replace these random words with text scanned from old books and documents, with the help of ReCAPTCHA, a project created by Luis von Ahn at Carnegie Mellon University in Pittsburgh, the BBC reports.
The words that the free anti-bot service ReCAPTCHA takes from texts and documents are ones that optical character recognition (OCR) software has marked as unreadable. These account for about 20% of the words in documents where either the ink has faded or the paper quality has significantly deteriorated. The words are then shared out to the websites that have signed up to use ReCAPTCHA, each paired with a control word whose correct reading is already known, to ensure that the response received comes from a human.
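The pairing of an unreadable word with a control word can be illustrated with a short Python sketch. It is not ReCAPTCHA's actual code: the function names, the in-memory `votes` store and the three-vote threshold are all assumptions made for illustration. The idea is simply that a visitor's reading of the unknown word is only counted if the control word was typed correctly.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory store: unknown-word id -> tally of human transcriptions.
votes = defaultdict(Counter)


def normalise(text):
    """Compare answers case-insensitively and ignore surrounding whitespace."""
    return text.strip().lower()


def check_response(control_answer, unknown_id, typed_control, typed_unknown):
    """Return True if the visitor transcribed the control word correctly.

    Only then is the transcription of the unknown word recorded as a vote.
    """
    if normalise(typed_control) != normalise(control_answer):
        return False
    votes[unknown_id][normalise(typed_unknown)] += 1
    return True


def agreed_reading(unknown_id, min_votes=3):
    """Return the most common transcription once it has min_votes, else None.

    The threshold of three matching votes is an assumed value.
    """
    if not votes[unknown_id]:
        return None
    word, count = votes[unknown_id].most_common(1)[0]
    return word if count >= min_votes else None
```

In this sketch a site would call `check_response` for each submitted form and let the visitor through when it returns `True`; `agreed_reading` shows how several matching human answers could eventually settle the reading of a word the scanning software could not make out.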
The project has been described by the ReCAPTCHA team as almost completely accurate, and a colossal number of websites have already signed up to use the system. In fact, such CAPTCHA schemes are used up to 100 million times a day by websites attempting to deter scammers.
The ReCAPTCHA team have recently digitised the entire archive of the New York Times from 1908, so you can be sure they'll never be short of words, despite the rapid take-up of the new system.
















