Webicy Blog

October 31, 2008

Google Spiders to Index Scanned Documents

Filed under: Google, Industry News. By SticKer @ 6:32 am

Google has announced that its spiders will now index scanned documents and include them in its search results. In the past, scanned documents rarely appeared in search results. Google says it can now perform Optical Character Recognition (OCR) on scanned documents that it finds stored in Adobe’s PDF format. OCR lets Google convert a picture (of a thousand words) into a thousand words: words that can be searched and indexed, so that these valuable documents are more easily found.
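As a rough illustration of why having the words as text matters, here is a minimal sketch of a searchable word index built over text that OCR has already produced. This is not Google's method. The file names, the sample text, and the helper functions `tokenize`, `build_index`, and `search` are all made up for the example, and the OCR step itself is assumed to have happened elsewhere.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


# Hypothetical OCR output for two scanned PDFs (names and text are invented).
ocr_text = {
    "wiring.pdf": "Repairing aluminum wiring in older homes",
    "locks.pdf": "Spin lock performance on multiprocessor systems",
}

index = build_index(ocr_text)
print(search(index, "aluminum wiring"))  # ['wiring.pdf']
print(search(index, "spin lock"))        # ['locks.pdf']
```

Before OCR, a scanned page is only pixels, so there would be no words to put into an index like this one.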

In the past, Google has indexed documents saved as PDFs, but scanned documents are a lot more difficult for a computer to read because each page is stored as an image rather than as text. From now on, Google searches will include the text within these scanned images in normal search results. When you encounter a scanned document, you’ll be able to view it in its original form as a PDF or as converted text (click “View as HTML”).

To see the new system at work, try the search queries below. Note the document excerpt in the search results, along with the full text presented after the “View as HTML” link:

[repairing aluminum wiring]
[spin lock performance]
[Mumps and Severe Neutropenia]
[Steady success in a volatile world]


6 Comments

  1. 1

    Google is everywhere!!!

    Comment by Commodity Trading Account (2 comments) — October 31, 2008 @ 5:02 pm

  2. 2

    This is a good development, especially if these documents appear as normal search results and can be viewed as converted text files, because the files we are searching for are sometimes uploaded as scanned documents.

    Comment by Nordvestjylland (1 comments) — November 2, 2008 @ 4:19 pm

  3. 3

    Hello,
    Nice post. Thanks for sharing the article. Lots of knowledge here.

    Comment by Complete Designing solutions (1 comments) — November 3, 2008 @ 4:28 am

  4. 4

    Great to know about this. It is good news for everyone. Images can convey the exact meaning of what you want to deliver, and the words are the backup that makes it more understandable.

    Comment by horse racing results (1 comments) — November 25, 2008 @ 7:03 am

  5. 5

    and what about my privacy?

    Comment by SpadesCardGames (1 comments) — March 9, 2009 @ 8:34 pm

  6. 6

    This is really cool, I didn’t know that they could use OCR on scanned documents. I wonder if it also works with handwritten documents.

    Comment by Tampa Movers (1 comments) — June 26, 2009 @ 7:44 pm
