Description
Pretrained pipeline that extracts printed text from PDF documents. It converts the printed content of each page into digital text and is intended for text recognition tasks.
Predicted Entities
How to use
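# Python. Assumes an active Visual NLP session named `spark`; the import path below is an assumption, since the original snippet omits it.
from sparknlp.pretrained import PretrainedPipeline
pdf_pipeline = PretrainedPipeline('pdf_printed_transformer_extraction', 'en', 'clinical/ocr')
# Directory of input PDFs; the binaryFile reader loads each file as one row.
pdf_path = '/content/pdfs/'
pdf_example_df = spark.read.format("binaryFile").load(pdf_path).cache()
result = pdf_pipeline.transform(pdf_example_df)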
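// Scala equivalent. Assumes an active Visual NLP session named `spark`; the import path below is an assumption, since the original snippet omits it.
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
val pdf_pipeline = new PretrainedPipeline("pdf_printed_transformer_extraction", "en", "clinical/ocr")
val pdf_path = "/content/pdfs/"
val pdf_example_df = spark.read.format("binaryFile").load(pdf_path).cache()
val result = pdf_pipeline.transform(pdf_example_df)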
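The binaryFile reader loads every file under the input path, so it can help to check beforehand that the directory holds only PDFs. The following is a minimal sketch in plain Python, assuming the path is on the local filesystem (as with the `/content/pdfs/` example). The helper name `find_pdfs` is hypothetical and not part of Visual NLP. It checks only the leading `%PDF-` header bytes, not whether the file is a valid PDF.

```python
from pathlib import Path
from typing import List

PDF_MAGIC = b"%PDF-"  # leading bytes of a PDF file header


def find_pdfs(directory: str) -> List[Path]:
    """Return the files in `directory` whose first bytes look like a PDF header."""
    pdfs = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        with path.open("rb") as handle:
            if handle.read(len(PDF_MAGIC)) == PDF_MAGIC:
                pdfs.append(path)
    return pdfs
```

Calling `find_pdfs('/content/pdfs/')` before the Spark load gives a quick list of the files the pipeline is expected to process.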
Example
Input
Output
STARBUCKS Store #19208
11902 Euclid Avenue
Cleveland, OH (216) 229-U749
CHK 664250
12/07/2014 06:43 PM
112003. Drawers 2. Reg: 2
¥t Pep Mocha 4.5
Sbux Card 495
AMXARKERARANG 228
Subtotal $4.95
Total $4.95
Change Cue BO LOO
- Check Closed ~
"49/07/2014 06:43 py
oBUX Card «3228 New Balance: 37.45
Card is registertd
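The output above shows that recognized text can contain character-level errors, so downstream parsing should match conservatively. The following is a minimal sketch of one way to pull labeled dollar amounts out of such text with a strict pattern. The function name `extract_amounts` and the pattern are illustrative assumptions, not part of the pipeline.

```python
import re
from typing import Dict

# A label made of letters and spaces, then a dollar amount, e.g. "Total $4.95".
AMOUNT_LINE = re.compile(r"^(?P<label>[A-Za-z ]+?)\s+\$(?P<amount>\d+\.\d{2})$")


def extract_amounts(ocr_text: str) -> Dict[str, float]:
    """Collect label -> amount pairs from lines that match the strict pattern."""
    amounts = {}
    for line in ocr_text.splitlines():
        match = AMOUNT_LINE.match(line.strip())
        if match:
            amounts[match.group("label")] = float(match.group("amount"))
    return amounts


# Lines copied from the example output above.
sample = """Sbux Card 495
Subtotal $4.95
Total $4.95
Change Cue BO LOO"""

print(extract_amounts(sample))  # prints {'Subtotal': 4.95, 'Total': 4.95}
```

Lines that do not match, such as `Sbux Card 495` or `Change Cue BO LOO`, are skipped instead of being guessed at, which keeps misread characters from producing wrong values.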
Model Information
| Model Name: | pdf_printed_transformer_extraction |
| Type: | pipeline |
| Compatibility: | Visual NLP 5.0.2+ |
| License: | Licensed |
| Edition: | Official |
| Language: | en |