Description
This is a pretrained pipeline designed to extract printed text from document images, enabling the accurate and efficient conversion of printed content into machine-readable digital text. The model leverages advanced Optical Character Recognition (OCR) techniques to recognize and transcribe text from a wide range of document types, including printed books, forms, and scanned papers.
Predicted Entities
Live Demo Open in Colab Download
How to use
img_pipeline = PretrainedPipeline('image_printed_transformer_extraction', 'en', 'clinical/ocr')
img_path = '/content/images/'
img_example_df = spark.read.format("binaryFile").load(img_path).cache()
result = img_pipeline.transform(img_example_df)
val img_pipeline = new PretrainedPipeline("image_printed_transformer_extraction", "en", "clinical/ocr")
val img_path = "/content/images/"
val img_example_df = spark.read.format("binaryFile").load(img_path).cache()
val result = img_pipeline.transform(img_example_df)
Example
Input
Output
STARBUCKS Store #19208
11902 Euclid Avenue
Cleveland, OH (216) 229-U749
CHK 664250
12/07/2014 06:43 PM
112003. Drawers 2. Reg: 2
¥t Pep Mocha 4.5
Sbux Card 495
AMXARKERARANG 228
Subtotal $4.95
Total $4.95
Change Cue BO LOO
- Check Closed ~
"49/07/2014 06:43 py
oBUX Card «3228 New Balance: 37.45
Card is registertd
Model Information
Model Name: | image_printed_transformer_extraction |
Type: | pipeline |
Compatibility: | Visual NLP 5.0.2+ |
License: | Licensed |
Edition: | Official |
Language: | en |