OCR Xpress delivers fast and accurate full-page optical character recognition (OCR) to software developers in .NET and ActiveX COM toolkits. Use OCR Xpress to add full-page text recognition, auto rotate, and searchable document creation to your application. This software development kit (SDK) also supports deskew, binarization, character position information, and segmentation of documents into image and text elements. It supports output to multiple text and text-plus-image formats including Microsoft® Word®-compatible RTF files and standard Adobe® PDF files.
Recognize text in thirteen languages: English, French, German, Italian, Spanish, Portuguese, Danish, Dutch, Swedish, Norwegian, Hungarian, Polish, and Finnish. OCR Xpress provides a dictionary for each language, and also supports a user-defined dictionary for words that are application-specific.
The auto rotate feature in OCR Xpress detects the correct orientation of the text in an image, and rotates the entire page accordingly. It can also deskew documents that become skewed during the scanning process.
Character position information allows users of OCR Xpress to redact or highlight text in the original image using the included NotateXpress component. Users can also build their own PDF files, using the position information to place the hidden text in the correct location. Because OCR Xpress reports a recognition confidence for each character, it can also be used in conjunction with other OCR engines, such as SmartZone, to perform voting and thereby improve recognition accuracy.
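The voting idea can be sketched as follows. This is a minimal illustration in Python and not the OCR Xpress or SmartZone API: the RecognizedChar type, the vote function, the confidence scale (higher means more certain), and the assumption that both engines return the same characters in the same order are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecognizedChar:
    """Hypothetical per-character result: a character and its confidence."""
    char: str
    confidence: float  # assumed: higher means more certain


def vote(engine_a: List[RecognizedChar], engine_b: List[RecognizedChar]) -> str:
    """Pick, position by position, the character reported with higher confidence.

    Assumes the two result lists are already aligned one-to-one.
    """
    if len(engine_a) != len(engine_b):
        raise ValueError("results must be aligned and of equal length")
    chosen = []
    for a, b in zip(engine_a, engine_b):
        # Ties go to engine A.
        chosen.append(a.char if a.confidence >= b.confidence else b.char)
    return "".join(chosen)
```

In practice the outputs of two engines rarely line up character for character, so a real application would first need an alignment step (for example, matching on the reported character positions) before voting.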
OCR Xpress flags characters recognized with low confidence, allowing developers to easily build text proofing and character replacement functions into their applications. This enables users to review and make corrections to text prior to output.
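A proofing step built on such confidence flags might look like the sketch below. It is written in Python for illustration only; the (character, confidence) tuples, the threshold value, and both function names are assumptions and do not reflect the actual OCR Xpress interface.

```python
from typing import List, Tuple

# Hypothetical result format: (character, confidence), higher confidence = more certain.
CharResult = Tuple[str, float]


def flag_low_confidence(chars: List[CharResult], threshold: float) -> List[int]:
    """Return the indices of characters whose confidence is below the threshold."""
    return [i for i, (_, conf) in enumerate(chars) if conf < threshold]


def mark_for_review(chars: List[CharResult], threshold: float, marker: str = "?") -> str:
    """Build a proofing string with low-confidence characters replaced by a marker."""
    flagged = set(flag_low_confidence(chars, threshold))
    return "".join(marker if i in flagged else ch for i, (ch, _) in enumerate(chars))
```

The flagged indices could drive a review screen in which the user confirms or replaces each uncertain character before the text is written to the output file.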
OCR Xpress includes advanced segmentation to locate regions of the input image and identify them as either images (whose color can be preserved) or areas containing recognizable text. The various regions can be accessed for individualized processing, or automatically recombined into fully formatted documents. The binarization function can convert color documents to black and white to improve recognition without affecting non-text regions, which may be retained in full color for reinsertion into the output document.
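The principle of binarizing only the text regions can be illustrated with a small Python sketch. It assumes a grayscale image stored as rows of 0-255 integers, rectangular text regions, and a fixed threshold; none of this is taken from OCR Xpress, whose segmentation and binarization methods are not described here.

```python
from typing import List, Tuple

# Hypothetical region format: (top, left, bottom, right), bottom/right exclusive.
Region = Tuple[int, int, int, int]


def binarize_text_regions(
    gray: List[List[int]], text_regions: List[Region], threshold: int = 128
) -> List[List[int]]:
    """Return a copy of the image with pixels inside text regions set to 0 or 255.

    Pixels outside the listed regions (for example, picture areas) are left unchanged.
    """
    out = [row[:] for row in gray]  # copy so the input image is not modified
    for top, left, bottom, right in text_regions:
        for y in range(top, bottom):
            for x in range(left, right):
                out[y][x] = 255 if gray[y][x] >= threshold else 0
    return out
```

A fixed threshold is the simplest possible choice; an adaptive method would generally cope better with uneven lighting or colored backgrounds.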
OCR Xpress complements the Pegasus Imaging product line by offering full-page OCR, auto rotate, and searchable text output capabilities. Pegasus Imaging’s SmartZone product is recommended for recognition of English-language text in zones on structured forms (zonal OCR). OCR Xpress can also be used for European-language recognition in zonal OCR applications.
Included Components
Both editions of OCR Xpress use the same set of .NET and COM controls. Access to specific functions is determined by the edition.
OCR Xpress Professional - Includes the OCR Xpress v1 component, plus ImagXpress Document v8, NotateXpress v8, ThumbnailXpress v1, TwainPRO v4, and PrintPRO v3 components.
OCR Xpress Standard - All features of OCR Xpress Professional except for PDF output.