Optical character recognition (OCR) is a technology that allows computer software to convert text found in a scanned document or image into machine-readable text.
Anyone who has ever been to the airport, sent a letter in the mail, or deposited a check at an ATM has used OCR technology.
The most common use of OCR technology is the extraction of printed or handwritten text from physical documents to be used and understood by computer software.
By converting image data into machine-encoded text, scanned documents become significantly more functional, giving the user of the digital version the ability to search, view, and edit its contents, retrieve information, and more.
For this reason, the popularity of OCR technology has grown immensely, and it can frequently be found in both professional and consumer-grade scanning software.
OCR is an absolutely essential technology for anyone who wants to be able to digitize text-heavy documents, and make immediate use of the information they contain.
How does OCR work?
Optical character recognition software extracts text found in an image using a combination of computer vision, pattern recognition, and artificial intelligence.
For the sake of simplicity, we are going to look at OCR relative to the document scanning services we provide, but the concepts are basically the same in any OCR application.
OCR allows us to convert paper documents into digital files that can be searched by any text they contain, edited in word processing software, and accessed remotely from the cloud.
We follow a simple four-part procedure to complete this process.
1. The scanning process
The first, and arguably most important, part of the process is the initial scanning of the document. It is critical that the resulting image is an accurate representation of the original document, clear and free of any defects that could interfere with the OCR process.
Documents should be scanned at the highest resolution the scanner allows, giving the OCR software the best chance of accurately identifying the text.
Ideally, the scanner should be calibrated against a sample document, and in the case of bulk scanning, re-calibrated several times throughout the process.
2. Image Processing
In the next step, the OCR software will process the scanned image to facilitate the optimal conditions for character recognition.
First, the software will correct any alignment issues introduced during the scanning process, rotating the image to ensure the document is properly oriented.
Imperfections such as dust, stray marks, and digital artifacts are removed and edges are smoothed.
Next, the color information is discarded and the contrast of the resulting grayscale image is increased, producing a high-contrast black-and-white image (a process referred to as binarization). This maximizes the separation between the foreground (the text) and the background, reducing the chance of misidentified characters.
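To make the binarization step more concrete, here is a minimal sketch in Python. It assumes the scan is already available as rows of (red, green, blue) pixel values, and it uses a fixed threshold of 128, which is our own simplification; real OCR software typically picks the threshold adaptively (Otsu's method is one well-known approach). The function names are illustrative and do not come from any particular OCR product.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def binarize(gray_image, threshold=128):
    """Map each pixel to 0 (black, foreground) or 255 (white, background).

    The fixed threshold is an assumption made for this sketch.
    """
    return [
        [0 if pixel < threshold else 255 for pixel in row]
        for row in gray_image
    ]


# A tiny 2x3 "scan": dark ink on a light, slightly tinted page.
scan = [
    [(250, 245, 230), (40, 40, 45), (245, 240, 228)],
    [(30, 30, 30), (248, 244, 231), (60, 55, 50)],
]
binary = binarize(to_grayscale(scan))
print(binary)  # [[255, 0, 255], [0, 255, 0]]
```

After this step every pixel is either pure black or pure white, so the later stages only have to decide which black shapes are characters.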
3. Character Recognition
It is during the character recognition process that the OCR software converts the text found in the document into its machine-readable equivalent.
First, the document is analyzed for layout, identifying the locations of text blocks and paragraphs. Then, each location is broken down further by line and then by individual word.
Finally, each individual character is isolated (called “segmentation”) to be translated.
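One simple way to picture segmentation is to look for blank columns between characters in a binarized line of text. The sketch below assumes a line image stored as rows of 1 (ink) and 0 (background) and splits it wherever a column contains no ink. This is only an illustration of the idea; it would not cope with touching or overlapping characters, which production software handles with more advanced techniques.

```python
def segment_characters(line_image):
    """Split a binary text-line image into per-character column ranges.

    line_image is a list of rows; 1 marks ink and 0 marks background.
    Returns (start, end) column pairs, with end exclusive.
    """
    width = len(line_image[0])
    # Count ink pixels in every column (a vertical projection profile).
    ink_per_column = [sum(row[x] for row in line_image) for x in range(width)]

    segments = []
    start = None
    for x, ink in enumerate(ink_per_column):
        if ink and start is None:
            start = x                    # a character begins
        elif not ink and start is not None:
            segments.append((start, x))  # a blank column ends it
            start = None
    if start is not None:                # character touching the right edge
        segments.append((start, width))
    return segments


line = [
    [1, 1, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 0, 1, 1],
]
print(segment_characters(line))  # [(0, 2), (4, 5), (6, 8)]
```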
In simple OCR applications, the raw pixel data of each character is compared directly against a database of known alphanumeric shapes to identify the closest match.
However, most modern OCR applications generally use one of two methods for character identification:
- Pattern recognition: Pattern recognition works by analyzing each character as a whole, comparing it against a matrix of characters stored within the software. The drawback of this method is that it relies on the input characters and the stored characters being a similar shape and scale.
- Feature extraction: Feature extraction is a more sophisticated and versatile method of character recognition that more closely emulates the way the human mind processes text. An algorithm breaks down each character into its individual features, identifying straight lines, curves, angles, and intersections. Then, it matches the presence of these physical features with the corresponding letter. The advantage of this method is that it does not rely on a particular font or set of fonts for identification.
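The pattern recognition approach described above can be sketched in a few lines of Python. The example assumes every character has already been scaled to the same 3x3 grid as the stored templates, which is exactly the limitation mentioned in the list: if the input is a different shape or scale, a pixel-by-pixel comparison stops working. The three templates and the function names are made up for illustration.

```python
# Hypothetical 3x3 templates; "1" is ink and "0" is background.
TEMPLATES = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
    "T": ["111", "010", "010"],
}


def match_score(glyph, template):
    """Return the fraction of pixels on which glyph and template agree."""
    total = 0
    agree = 0
    for glyph_row, template_row in zip(glyph, template):
        for g, t in zip(glyph_row, template_row):
            total += 1
            agree += g == t
    return agree / total


def recognize(glyph):
    """Pick the stored character whose template best matches the glyph."""
    return max(TEMPLATES, key=lambda ch: match_score(glyph, TEMPLATES[ch]))


# A "T" with one stray pixel in the bottom-right corner.
noisy_t = ["111", "010", "011"]
print(recognize(noisy_t))  # T
```

Even with one pixel of noise, the glyph agrees with the "T" template on 8 of 9 pixels, more than with any other template, so it is still identified correctly.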
4. Post-processing
After each character has been identified, the resulting text is cross-referenced against internal dictionaries and known lexicons to improve the overall accuracy of the final output.
Using near neighbor analysis, OCR software looks for letters and words that are commonly seen together, and uses those “rules” in order to identify errors and make corrections.
For example, common digraphs (a pair of letters representing a single speech sound) including “qu”, “ea”, and “ch” can be reliably corrected when a misidentification occurs based on these guidelines.
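As a rough illustration of dictionary-based correction, the sketch below checks each recognized word against a small lexicon and swaps in the closest known word when there is a near match. It relies on `difflib.get_close_matches` from Python's standard library; the lexicon, the 0.6 similarity cutoff, and the function name are assumptions chosen for the example and are not taken from any specific OCR engine.

```python
import difflib

# A tiny stand-in for the dictionaries a real OCR engine would ship with.
LEXICON = ["quick", "each", "check", "invoice", "total"]


def correct_word(word, lexicon=LEXICON, cutoff=0.6):
    """Return the closest lexicon word, or the input if nothing is close."""
    lowered = word.lower()
    if lowered in lexicon:
        return word
    matches = difflib.get_close_matches(lowered, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word


print(correct_word("qvick"))    # quick
print(correct_word("lnvoice"))  # invoice
print(correct_word("xyz"))      # xyz
```

In the first case the misread "qv" is replaced with the common digraph "qu" because "quick" is the only lexicon entry that closely resembles the input. Words with no close match are left untouched so that names and codes are not "corrected" by mistake.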
What are the different types of OCR?
There are a few different types of OCR technology out in the wild. A few examples of OCR you may encounter include:
- Simple optical character recognition converts printed text into machine-readable characters by comparing it against a stored database of font and text images, which pattern-matching algorithms use to identify individual characters within a document.
- Zonal recognition is a type of OCR typically used to extract information from specific areas of a document, which is often useful when digitizing forms.
- Optical mark recognition can be used to process information marked in fields, on surveys or tests. It can also be used to identify logos, watermarks, and other symbols that may appear within a document.
- Barcode readers use lasers to extract information stored within a barcode, whereas barcode OCR extracts this information from within a digital image of a barcode. This automates the process of extracting barcode data from documents during the digitization process.
- Intelligent word recognition can be used on its own or in combination with other types of OCR to recognize entire words from within a document. This can add to the accuracy of the OCR process, as characters can be identified and verified based on the context in which they appear.
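To show how zonal recognition might work in practice, the sketch below assumes the OCR step has already produced a list of words along with the x and y position of each one on the page. Fields are then pulled out by checking which words fall inside predefined rectangles. The zone coordinates, field names, and data layout are all hypothetical.

```python
# Hypothetical zones for one form layout: (left, top, right, bottom) in pixels.
ZONES = {
    "invoice_number": (400, 40, 600, 80),
    "total": (400, 700, 600, 740),
}


def words_in_zone(words, zone):
    """Join the text of every word whose position falls inside the zone."""
    left, top, right, bottom = zone
    return " ".join(
        text
        for text, x, y in words
        if left <= x <= right and top <= y <= bottom
    )


def extract_fields(words, zones=ZONES):
    """Build a field-name to text mapping from recognized words."""
    return {name: words_in_zone(words, zone) for name, zone in zones.items()}


# Each entry is (text, x, y), as an OCR engine might report it.
page_words = [
    ("INV-1042", 420, 60),
    ("Acme", 50, 60),
    ("$1,250.00", 450, 720),
]
print(extract_fields(page_words))
# {'invoice_number': 'INV-1042', 'total': '$1,250.00'}
```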
How accurate is OCR?
For all intents and purposes, OCR is an extremely accurate and efficient way to digitize text.
Most professional applications are able to recognize characters with 98-99% accuracy. For example, a 2,000 character document analyzed by OCR could contain anywhere from 20 to 40 misidentified characters.
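The arithmetic behind that example is straightforward, and the short sketch below can be used to estimate the number of errors for any document length. The function name is ours.

```python
def expected_errors(total_characters, accuracy_percent):
    """Number of characters likely to be misread at a given accuracy."""
    return total_characters * (100 - accuracy_percent) / 100


print(expected_errors(2000, 99))  # 20.0
print(expected_errors(2000, 98))  # 40.0
```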
For this reason, it is important to proofread and correct OCR output, especially in cases where a highly accurate transcription is required.
How is the accuracy of OCR measured?
In order to measure the accuracy of a particular OCR solution, a comparison needs to be made between the original document and the digitized output. Each error needs to be documented and tallied in one of two methods:
- Character-level accuracy: (Total correctly identified characters / Total characters scanned) * 100
- Word-level accuracy: (Total correctly identified words / Total words) * 100
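Both formulas can be sketched in Python as shown below. Counting "correctly identified" characters requires lining up the OCR output with the original text, and here we assume that `difflib.SequenceMatcher` from the standard library is a good enough way to do that alignment. Dedicated OCR evaluation tools may align text differently, so treat the numbers as approximate.

```python
import difflib


def _matched(reference, output):
    """Count items that line up between the reference and the OCR output."""
    matcher = difflib.SequenceMatcher(None, reference, output, autojunk=False)
    return sum(block.size for block in matcher.get_matching_blocks())


def character_accuracy(reference, output):
    """(Correctly identified characters / total characters) * 100."""
    return _matched(reference, output) / len(reference) * 100


def word_accuracy(reference, output):
    """(Correctly identified words / total words) * 100."""
    reference_words = reference.split()
    return _matched(reference_words, output.split()) / len(reference_words) * 100


original = "Total due: 100 USD"
ocr_text = "Tota1 due: 1O0 USD"  # "l" read as "1", and "0" read as "O"

print(round(character_accuracy(original, ocr_text), 1))  # 88.9
print(word_accuracy(original, ocr_text))                 # 50.0
```

Note how two misread characters out of 18 cost about 11 points of character-level accuracy but half of the word-level accuracy, since each error spoils an entire word.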
What are the benefits of OCR?
Optical character recognition is the fastest, cheapest, and most efficient method of digitizing the content contained in a physical paper document. OCR technology provides a variety of benefits including:
- Enhancing the efficiency of manual data entry
- Reducing both operational and labor costs
- Saving space
- Enabling the automation of time consuming business processes
- Reducing the probability of filing errors
- Easing the management of manually completed forms
- Enabling automatic translation of documents into other languages
- Converting read-only documents
- Improving customer service and communication
- Improving data security
Enhance the efficiency of manual data entry
OCR can be used to supplement and support an existing data entry team by eliminating the initial effort of manually keying information into systems.
OCR saves businesses a significant amount of time and money that would otherwise be spent on manual data entry. What might take a data entry specialist an hour to transcribe, OCR can do near instantaneously.
OCR also reduces operational costs. By converting paper documents into text-searchable digital files, employees are able to easily locate the information they need, creating more efficient workflows and improving turnaround times.
OCR eliminates the need for storing large volumes of paper documents. Digital storage is cheaper and more efficient, helping businesses free up valuable office space for more important purposes.
Automate business processes
Businesses that consistently process large volumes of paper documents can save substantial time and resources with OCR-enabled document scanning services. OCR allows important data to be extracted from documents during the scanning process and easily transferred to the relevant systems, enabling businesses to implement highly efficient automated workflows.
Reduce filing errors
OCR can be used to automatically tag and categorize documents, reducing the probability of lost, misplaced, or misfiled documents.
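As a simple illustration of automatic tagging, the sketch below assigns a category to a document by counting how many category keywords appear in its OCR text. The categories, keywords, and function name are invented for the example; a real system would likely use a richer set of rules or a trained classifier.

```python
# Hypothetical categories and the keywords that suggest each one.
CATEGORY_KEYWORDS = {
    "invoice": {"invoice", "amount", "due"},
    "intake_form": {"patient", "allergies", "insurance"},
}


def categorize(text):
    """Return the category with the most keyword hits in the OCR text."""
    words = set(text.lower().split())
    scores = {
        category: len(words & keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"


print(categorize("Invoice 1042 amount due in 30 days"))  # invoice
print(categorize("Meeting notes"))                       # uncategorized
```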
Manage manually completed forms
Businesses that utilize handwritten forms and questionnaires can leverage OCR to instantly convert customer responses into searchable, actionable data.
Translate documents into other languages
OCR-processed documents can be easily translated into another language, either by the OCR software itself or with automated text-translation tools such as Google Translate.
Convert read-only documents
OCR can be used to convert any non-editable digital document into editable digital text, not just scanned documents.
Improve Customer Service
OCR can help to facilitate better communication with customers, efficient data retrieval, and better organization, allowing businesses to quickly respond to customer inquiries.
Enhance Data Security
Paper is an incredibly insecure way to store important data. This is because paper documents are susceptible to theft, loss, and damage. Extracting data from paper documents via OCR allows you to store important data digitally, enabling enhanced access controls, data encryption, and automated backup and recovery.
What are the limitations of OCR?
While OCR provides numerous advantages over manual data entry, it also has a few important drawbacks that should be noted, including:
- Structuring the data involves more than just OCR
- OCR only works well with high quality scans
- Specialized software is needed for handwritten content
- Proofreading is almost always required
- OCR can have difficulty with complicated images
OCR doesn’t help structure important data
While OCR is excellent at digitizing written text, it has no ability to actually understand it at the macro level. Documents processed with OCR still need to be tagged, categorized, and organized by some other manual process to become fully useful for professional purposes.
OCR only works well with high quality scans
In order for OCR to properly recognize the text in a document, the original image created during the scanning process should be as clear as possible.
To improve the chance of success, it is important to ensure that documents are free of smudges, blurred text, or marks that could lead to errors during the scanning process.
The scanner should be properly calibrated against a sample taken from the source material, and should be checked periodically throughout the scanning process to ensure the optimal digital output.
The resulting image should be saved at a high resolution, ideally 150 DPI (dots per inch) or more, with a high text-to-background contrast ratio.
Specialized software is needed for handwritten content
Traditional OCR is built on the principle of “studying” predefined fonts and symbols enough to identify individual letters from similarly shaped text.
While early OCR systems were generally only capable of recognizing a single font, most modern systems are able to apply a basic set of rules that allows the software to match characters from almost any standard serif or sans-serif character set.
However, this does not apply to handwritten text.
Handwriting presents a number of challenges for OCR software, as it varies far more from writer to writer than printed text does from font to font. While there is software capable of digitizing handwritten text, anyone who requires extremely accurate transcription will find that manual review is still required.
Proofreading is almost always required
While the output of OCR software is usually pretty accurate, it’s important not to rely on it for critical data. Each document processed through OCR software needs to be carefully reviewed for errors, and manually corrected before the data can be fully trusted.
For example, in a situation in which a scanned invoice is processed by OCR, relying on an invoice amount that contained an uncaught error could result in inaccurate records or charges.
OCR can have difficulty with complicated images
Text positioned over an intricate background can make it difficult for OCR software to properly isolate individual characters, resulting in inaccurate results.
What are common use cases for OCR technology?
OCR has many practical uses for businesses and consumers alike. OCR can be used as a stand-in wherever manual data entry is performed, automating the process of extracting important data from a set of printed documents into the electronic system where the information will be stored.
Some of the most common use cases for OCR are found in travel, banking, healthcare, and government.
OCR in Travel
OCR is used heavily in the travel industry to provide a more seamless and convenient customer experience. Airports, train stations, and subways all leverage OCR technology for both security and data storage purposes.
OCR reduces the time consuming processes involved with manually entering customer details, looking up long ticket or order ID numbers, and sorting baggage.
OCR in Banking
The banking industry is one of the largest consumers of OCR technology. OCR not only helps banks enhance customer experience, it also reduces manual data entry.
Optical character recognition technology is used by ATMs to verify deposited checks, scanning and extracting handwritten amount information and confirming the presence of a valid signature. OCR is often implemented in mobile banking applications as well, enabling customers to deposit checks simply by uploading an image.
OCR in Healthcare
OCR technology has been extremely beneficial for the healthcare industry, enabling healthcare providers and medical professionals to more easily process and store data.
It’s not uncommon for medical offices to deal with large volumes of physical documents, such as customer intake forms, handwritten doctor’s records, invoices, receipts, and more. OCR helps to reduce the manual labor of moving this data into the relevant systems, improving both customer service and quality of care.
OCR in Government
Government agencies are one of the largest sources of paper-based data. OCR technology allows these agencies to modernize their record systems, combining the convenience of paper with the security and efficiency of digital data storage.
OCR has many practical applications for government agencies. Critical information stored inside large historical paper archives can be extracted and stored digitally, reducing unnecessary paper storage expenses. OCR also enables the government to provide convenient service options to their customers.