Extract Text from Documents - Visual NLP Demos & Notebooks

PDF to Text

Extract text from digitally generated PDF documents (with selectable text) while preserving the original document structure, using our out-of-the-box Spark OCR library. (...)

DICOM to Text

Recognize text in DICOM-format documents. This feature extracts both the text rendered in the image and the text stored in the metadata. (...)

Image to Text

Recognize text in images and scanned PDF documents by using our out-of-the-box Spark OCR library. (...)

DOCX to Text

Extract text from Word documents with Spark OCR. (...)

Extract text from PowerPoint slides

This demo shows how text can be extracted from PPTX files using Spark OCR. (...)

Detect Text in Document Images

This demo detects text in documents using our pre-trained Spark OCR model. (...)

Recognize Printed Text

This demo shows how to recognize printed text in documents using our pre-trained Spark OCR models. (...)

Detect Text in Document Images

This model detects text in documents using our pre-trained Spark OCR model. (...)

Pretrained pipeline for reading printed documents

A pretrained pipeline, built on our pre-trained Spark OCR models, for transformer-based OCR on printed text. It is designed for precise and efficient text extraction from printed images of various origins and formats, improving overall OCR accuracy. (...)

Pretrained pipeline for reading and removing noise from mixed scanned and digital PDF documents

Pretrained pipeline for reading printed PDF documents

A pretrained pipeline, built on our pre-trained Spark OCR models, for transformer-based OCR on printed text. It is designed for precise and efficient text extraction from printed PDFs of various origins and formats, improving overall OCR accuracy. (...)

Pretrained pipeline for reading mixed scanned and digital PDF documents

A pretrained pipeline, built on our pre-trained Spark OCR models, for performing optical character recognition (OCR) on mixed scanned and digital PDF documents. It is designed for precise and efficient text extraction from PDFs of various origins and formats, improving overall OCR accuracy. (...)