Pretrained Pipeline for Reading Handwritten Text from PDF Documents

Description

This pretrained pipeline extracts handwritten text from PDF documents. It applies text recognition to convert handwritten content into digital text, which makes it useful for transcribing handwritten notes, forms, and similar documents. The model is optimized specifically for handwriting, where letter shapes, spacing, and slant vary far more than in printed text, with the goal of high accuracy and few conversion errors.

How to use

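Python

from sparknlp.pretrained import PretrainedPipeline

# Requires an active Spark session (`spark`) with licensed Visual NLP loaded
pdf_pipeline = PretrainedPipeline('pdf_handwritten_transformer_extraction', 'en', 'clinical/ocr')

# Read the files in the folder as raw bytes, one row per file
pdf_path = '/content/pdfs/'
pdf_example_df = spark.read.format("binaryFile").load(pdf_path).cache()

# Extract the handwritten text from each PDF
result = pdf_pipeline.transform(pdf_example_df)

Scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

// Requires an active Spark session (`spark`) with licensed Visual NLP loaded
val pdf_pipeline = new PretrainedPipeline("pdf_handwritten_transformer_extraction", "en", "clinical/ocr")

// Read the files in the folder as raw bytes, one row per file
val pdf_path = "/content/pdfs/"
val pdf_example_df = spark.read.format("binaryFile").load(pdf_path).cache()

// Extract the handwritten text from each PDF
val result = pdf_pipeline.transform(pdf_example_df)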
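Spark's binaryFile reader picks up every file it finds under pdf_path, whatever its extension. As a quick pre-flight check before calling the pipeline, here is a minimal standard-library sketch; list_pdfs is a hypothetical helper, not part of Visual NLP, and it assumes the PDFs sit directly in the folder rather than in subfolders.

```python
from pathlib import Path


def list_pdfs(folder):
    """Return the .pdf files directly inside `folder` (non-recursive), sorted by path."""
    # Case-insensitive suffix check keeps both "a.pdf" and "B.PDF";
    # directories are skipped even if their names end in ".pdf".
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

For example, calling list_pdfs('/content/pdfs/') and checking that the result is non-empty catches an empty or mistyped folder before Spark is involved. To keep non-PDF files out of pdf_example_df itself, Spark's file sources (3.0+) also accept a pathGlobFilter option, for example .option("pathGlobFilter", "*.pdf") before .load(pdf_path).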

Example

Input

[Screenshot of the handwritten input PDF]

Output

"This is an example of handwritten
text .
Let's # check the performance !
I hope it will be awesome ."
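The sample output separates sentence-final punctuation from the preceding word ("text .", "performance !"). If downstream steps need tidier text, a minimal post-processing sketch follows; tidy_punctuation is a hypothetical helper, not part of the pipeline. It only removes spaces or tabs before a punctuation mark, so line breaks are preserved, and it does not touch recognition errors such as the stray "#".

```python
import re

# Spaces or tabs (not newlines) that directly precede . , ! ? ; or :
_SPACE_BEFORE_PUNCT = re.compile(r"[ \t]+([.,!?;:])")


def tidy_punctuation(text: str) -> str:
    """Drop horizontal whitespace immediately before a punctuation mark."""
    return _SPACE_BEFORE_PUNCT.sub(r"\1", text)
```

Applied to the sample above, "text ." becomes "text." and "performance !" becomes "performance!". Note that the same rule would also change something like "3 : 00" to "3: 00", so review it against your own documents before relying on it.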

Model Information

Model Name: pdf_handwritten_transformer_extraction
Type: pipeline
Compatibility: Visual NLP 5.0.2+
License: Licensed
Edition: Official
Language: en