Extract Text from Documents - Live Demos & Notebooks
PDF to Text
Extract text from generated/selectable PDF documents and keep the original structure of the document by using our out-of-the-box Spark OCR library. (...)
DICOM to Text
Recognize text in DICOM documents. This feature extracts both the text in the image and the text from the metadata. (...)
Image to Text
Recognize text in images and scanned PDF documents by using our out-of-the-box Spark OCR library. (...)
DOCX to Text
Extract text from Word documents with Spark OCR (...)
Extract text from PowerPoint slides
This demo shows how to extract text from PPTX files using Spark OCR. (...)
Detect Text in Document Images
This demo detects text in documents using our pre-trained Spark OCR model. (...)
Recognize Printed
This demo shows how to recognize printed information in documents using our pre-trained Spark OCR models. (...)
Detect Text in Document Images
This model detects text in documents using our pre-trained Spark OCR model. (...)
Pretrained pipeline for reading on printed documents
Pretrained pipeline, based on our pre-trained Spark OCR models, for transformer-based OCR on printed text. It ensures precise and efficient text extraction from printed images of various origins and formats, improving overall OCR accuracy. (...)
Pretrained pipeline for reading and removing noise on mixed scanned and digital PDF documents
Pretrained pipeline, based on our pre-trained Spark OCR models, for transformer-based OCR on printed text. It ensures precise and efficient text extraction from printed images of various origins and formats, improving overall OCR accuracy. (...)
Pretrained pipeline for reading on printed PDF documents
Pretrained pipeline, based on our pre-trained Spark OCR models, for transformer-based OCR on printed text. It ensures precise and efficient text extraction from printed PDFs of various origins and formats, improving overall OCR accuracy. (...)
Pretrained pipeline for reading on mixed scanned and digital PDF documents
Pretrained pipeline based on our pre-trained Spark OCR models, for conducting Optical Character Recognition (OCR) on mixed scanned and digital PDF documents. It ensures precise and efficient text extraction from PDFs of various origins and formats, improving the overall OCR accuracy. (...)