INFOGRAPHIC: 300dpi vs 200dpi – Resolution Is Important For OCR Success

By Optiform Blogger, October 11, 2016


What is the best resolution for document scanning?

In a recent call with a prospective client, we learned that he had been scanning documents into his OCR platform at 200dpi, based on a recommendation from someone in the industry. When we brought up the recommendation (or rather, requirement) that he use a document scanner that produces 300dpi images, he asked us to walk him through the pros and cons of 200dpi versus 300dpi. We saw this as the perfect opportunity to turn what we shared with him into a new blog post for you.

Highlighted Text

This gentleman said that his company receives a lot of documents with highlighted text on them. He said they actually used to scan at 300dpi, but since moving to 200dpi, highlighted text was no longer being recognized. In fact, the highlighted text now appeared in the image not as highlighted, but as if it had been crossed out with a black permanent marker. When that happens, the OCR engine (ABBYY FlexiCapture in our infographic examples below) moves on to the next field it thinks it should be recognizing in order to populate the corresponding field. You can imagine what this does to your verification operators' workload and to the overall accuracy of your scans. They may as well be manually entering data again, right?

Resolution and Color vs Black & White

We speak with many professionals who are doing the discovery required for their company to implement a data capture solution that saves time and money and improves data entry accuracy. Their questions about scanning typically come back to two topics:

    1. In which resolution should I be scanning my documents to have the best possible OCR results for my projects?
    2. Can I scan color documents or do I have to scan black and white only?

Resolution

As you have probably gathered from the above scenario, 300dpi images are the best choice when using a scanner for document processing. ABBYY FlexiCapture software comes with built-in image cleanup, deskewing, despeckling, and orientation correction, but to set yourself up for success, start with 300dpi images. If you scan at 200dpi, whether because you can’t produce 300dpi or because you choose to go against industry-recommended standards, just know that the OCR software’s image cleanup will add extra pixels to the 200dpi characters to make them appear larger. While that can at times result in successful OCR, a lot of the time similar-looking characters get confused, such as the letter “B” being read as the number “8”. Make less work for yourself by going with 300dpi from the start.
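To put rough numbers on the difference, here is a minimal Python sketch of the arithmetic. It is an illustration under our own assumptions, not how ABBYY FlexiCapture works internally: the function names are hypothetical, font size in points is used only as an approximation of printed text height, and the page width is assumed to be US Letter (8.5 inches).

```python
# Rough arithmetic only: real glyph sizes depend on the typeface,
# and OCR engines differ in how they resample images.

POINTS_PER_INCH = 72  # typographic points per inch


def text_height_px(font_size_pt: float, dpi: int) -> float:
    """Approximate pixel height of text of the given point size at a scan resolution."""
    return font_size_pt * dpi / POINTS_PER_INCH


def upscale_factor(source_dpi: int, target_dpi: int) -> float:
    """Linear enlargement needed to bring a source_dpi image up to target_dpi."""
    return target_dpi / source_dpi


def effective_dpi(width_px: int, page_width_in: float) -> float:
    """Estimate scan resolution from image width in pixels and paper width in inches."""
    return width_px / page_width_in


if __name__ == "__main__":
    for dpi in (200, 300):
        height = text_height_px(10, dpi)
        print(f"10pt text at {dpi}dpi: about {height:.1f} px tall")

    factor = upscale_factor(200, 300)
    print(f"Linear upscale 200 -> 300dpi: {factor}x ({factor ** 2}x the pixels)")

    # Assumed example: an image 1700 px wide from an 8.5 inch wide page
    dpi = effective_dpi(1700, 8.5)
    print(f"Estimated resolution: {dpi:.0f}dpi", "(below 300)" if dpi < 300 else "")
```

In this sketch, 10pt text comes out at roughly 28 pixels tall at 200dpi versus roughly 42 pixels at 300dpi. Enlarging a 200dpi image to 300dpi means a 1.5x linear upscale, so more than half of the resulting pixels are interpolated rather than captured by the scanner, which is one plausible reason why similar shapes such as “B” and “8” become harder to tell apart.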

Color or Black & White

Again, this comes back to the resolution at which you scan. If you are importing 300dpi images into your OCR software, color documents will typically process just fine thanks to the built-in image cleanup features. We put together an infographic to represent the scenarios we have mentioned in this article.

Infographic details:

Example 1: This particular document had the name highlighted in yellow. The combination of the 300dpi scan and the software’s image cleanup made this scan a success. It stripped out the highlighter and imported the full name, without error, into the appropriate field.

Example 2: We also included a 200dpi scan in greyscale so you could see how the image would scan. You’ll see that the highlighter was perceived as a black marker crossing out the name rather than highlighting it. The OCR engine skipped right past the name field and went down to the zip code and phone number fields instead.

Example 3: And finally, see the infographic below for a demonstration of what a color document looks like when scanned at 200dpi. The document that we used as an example produced results that were only 56% accurate.

p.s. We also liked this article with detail on why OCR at 300dpi is a standard. 

See the differences: 300dpi vs 200dpi

[Infographic: 300dpi vs 200dpi OCR comparison]

Position your team for success

Optiform offers a broad spectrum of data processing solutions customized to fit any business need. From human resource departments to the information produced through clinical trials, Optiform has a long history of maximizing the efficiency of all its clients. Optiform is here to provide personalized support and recommendations both before and after your project, to ensure your long-term satisfaction and paper-free results. Contact us today to set up a personalized one-on-one consultation for software and business process recommendations. Already a valued Optiform client? Our customer support is ready to help.