Federal Agencies Digitization Guidelines Initiative

Term: OCR


Optical Character Recognition (OCR) is a technology that converts the dots or pixels representing machine-generated characters in a raster image into digitally coded text. In addition to recognizing and coding text, OCR programs attempt to recognize and code the structural elements of a document page, such as columns and non-text graphical elements. Intelligent Character Recognition (ICR) is a related technology designed to recognize handwritten characters.
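To make the idea of turning pixels into coded text concrete, here is a deliberately simplified sketch of one classic approach, template matching: each character image is compared pixel by pixel against stored bitmaps, and the closest match is emitted as a text character. The 3x5 glyph templates and the function names are hypothetical and chosen only for illustration, and the sketch assumes the characters have already been segmented into separate bitmaps. Production OCR software uses considerably more robust techniques than this.

```python
# Hypothetical 3x5 glyph templates: "1" = dark pixel, "0" = background.
TEMPLATES = {
    "I": ["111",
          "010",
          "010",
          "010",
          "111"],
    "L": ["100",
          "100",
          "100",
          "100",
          "111"],
}


def mismatch(glyph, template):
    """Count the pixels that differ between a glyph bitmap and a template."""
    return sum(
        a != b
        for row_g, row_t in zip(glyph, template)
        for a, b in zip(row_g, row_t)
    )


def recognize(glyph):
    """Return the character whose template differs from the glyph in the fewest pixels."""
    return min(TEMPLATES, key=lambda ch: mismatch(glyph, TEMPLATES[ch]))


def recognize_line(glyphs):
    """Convert a sequence of already-segmented glyph bitmaps into a text string."""
    return "".join(recognize(g) for g in glyphs)
```

For example, an "L" with one stray dark pixel, `["100", "100", "110", "100", "111"]`, differs from the "L" template in 1 pixel and from the "I" template in 7, so `recognize` still returns `"L"`. Choosing the nearest template rather than requiring an exact match is what lets even this toy tolerate some scanning noise.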

OCR is generally part of a workflow that begins with scanning documents. The scanned images may be further processed or "cleaned" (for example, see contrast stretching) before OCR to improve the accuracy of the recognition process. Modern OCR applications can produce multiple output formats, such as ASCII text, RTF, Microsoft Word, or PDF. Although some hardware implementations of OCR exist, the vast majority of OCR is performed by software.
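As an illustration of the "cleaning" step, the sketch below applies a simple linear (min-max) contrast stretch to a grayscale image before it would be handed to an OCR engine. This is one common form of contrast stretching, not necessarily the one any particular OCR product uses. The function name, the representation of the image as a list of rows of 8-bit values, and the default 0 to 255 output range are assumptions made for the example.

```python
def contrast_stretch(image, out_min=0, out_max=255):
    """Linearly rescale grayscale pixel values so that the darkest pixel in
    the image maps to out_min and the brightest maps to out_max.

    `image` is assumed to be a list of rows of integer gray levels (0-255).
    """
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A perfectly flat image has no contrast to stretch; return a copy.
        return [list(row) for row in image]
    scale = (out_max - out_min) / (hi - lo)
    return [[round(out_min + (p - lo) * scale) for p in row] for row in image]
```

For a faded scan whose values only span 100 to 180, `contrast_stretch([[100, 150], [180, 120]])` returns `[[0, 159], [255, 64]]`, spreading the same tonal differences across the full 0 to 255 range so that dark text and light paper are easier for the recognition step to separate.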

See also: