Pretrained Pipeline for Reading Printed Text in Image Documents

Description

A pretrained pipeline that extracts printed text from document images and returns it as digital text, for use in text recognition tasks.

How to use

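# Python (assumes an active SparkSession `spark`; the import path below is Spark NLP's and is an assumption here)
from sparknlp.pretrained import PretrainedPipeline

# Load the pretrained OCR pipeline.
img_pipeline = PretrainedPipeline('image_printed_transformer_extraction', 'en', 'clinical/ocr')

# Read every file under img_path as binary data.
img_path = '/content/images/'
img_example_df = spark.read.format("binaryFile").load(img_path).cache()

# Extract the printed text from each image.
result = img_pipeline.transform(img_example_df)

// Scala (same assumptions as the Python example)
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val img_pipeline = new PretrainedPipeline("image_printed_transformer_extraction", "en", "clinical/ocr")

val img_path = "/content/images/"
val img_example_df = spark.read.format("binaryFile").load(img_path).cache()

val result = img_pipeline.transform(img_example_df)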

Example

Input

Screenshot

Output

STARBUCKS Store #19208
11902 Euclid Avenue
Cleveland, OH (216) 229-U749

CHK 664250
12/07/2014 06:43 PM
112003. Drawers 2. Reg: 2

¥t Pep Mocha 4.5
Sbux Card 495
AMXARKERARANG 228
Subtotal $4.95
Total $4.95
Change Cue BO LOO
- Check Closed ~

"49/07/2014 06:43 py

oBUX Card «3228 New Balance: 37.45
Card is registertd
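
The extracted text is unstructured and can contain recognition errors, so downstream fields usually have to be picked out by pattern matching. The following is a minimal sketch of that step, assuming the output has already been collected into a plain Python string. The function name `extract_amounts` and the regular expression are hypothetical and not part of the pipeline.

```python
import re

# Matches a line made of a label followed by a dollar amount, e.g. "Total $4.95".
# Hypothetical pattern for illustration; real receipts may need a looser match.
AMOUNT_LINE = re.compile(r"^(?P<label>[A-Za-z ]+?)\s+\$(?P<amount>\d+\.\d{2})$")


def extract_amounts(ocr_text):
    """Return {label: amount} for lines shaped like '<label> $<amount>'."""
    amounts = {}
    for line in ocr_text.splitlines():
        match = AMOUNT_LINE.match(line.strip())
        if match:
            amounts[match.group("label")] = float(match.group("amount"))
    return amounts


# A few lines copied from the example output above.
sample = """Sbux Card 495
Subtotal $4.95
Total $4.95
Change Cue BO LOO"""

print(extract_amounts(sample))
```

Lines that the OCR step garbled, such as the change line in the sample, simply do not match and are skipped. A stricter or looser pattern may suit other document layouts.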

Model Information

Model Name: image_printed_transformer_extraction
Type: pipeline
Compatibility: Visual NLP 5.0.2+
License: Licensed
Edition: Official
Language: en