Google has announced that its spiders will now include scanned documents in its search results. In the past, scanned documents were rarely included. Google says it can now perform Optical Character Recognition (OCR) on any scanned documents it finds stored in Adobe’s PDF format. This technology lets it convert a picture (of a thousand words) into a thousand words: words that can be searched and indexed, so that these valuable documents are more easily found.
In the past, Google has indexed documents saved as PDFs, but scanned documents are a lot more difficult for a computer to read. From now on, Google searches will include the text within these scanned images in normal search results. When you encounter a scanned document, you’ll be able to view it in its original form as a PDF or as converted text (click “View as HTML”).
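For the curious, here is a rough sketch of why OCR output matters for search: once a scanned page has been turned into plain text, it can go into a word index like any other document. This is only an illustration and not how Google actually does it. The file names, the sample text, and the `build_index` and `search` helpers are all made up for the example, and the OCR step itself is assumed to have already happened.

```python
from collections import defaultdict
import re


def build_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    matches = [index.get(word, set()) for word in words]
    return set.intersection(*matches)


# Hypothetical OCR output: text already recovered from two scanned PDFs.
ocr_text = {
    "wiring.pdf": "Repairing aluminum wiring in older homes",
    "locks.pdf": "Spin lock performance on multiprocessor systems",
}

index = build_index(ocr_text)
print(search(index, "aluminum wiring"))
```

Before OCR, a scanned page is just pixels, so there is nothing to put into an index like this. After OCR, a query only has to match words in the recovered text.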
To see the new system at work, click on the search queries below. Note the document excerpt in the search results, along with the full text presented after the “View as HTML” link:
[repairing aluminum wiring]
[spin lock performance]
[Mumps and Severe Neutropenia]
[Steady success in a volatile world]