Visual Document Understanding - Visual NLP Demos & Notebooks

Run 300+ live demos and notebooks

Visual Document Understanding - Live Demos & Notebooks

Visual Document Classification
Classify documents using text and layout data with the new features offered by Spark OCR. (...)
Extract Data from FoundationOne Sequencing Reports
Extract patient, genomic and biomarker information from FoundationOne Sequencing Reports. (...)
Recognize entities in scanned PDFs
End-to-end example of a regular NER pipeline: import scanned images from cloud storage, preprocess them to improve their quality, recognize text using Spark OCR, correct spelling mistakes to improve the OCR results, and finally run NER to extract entities. (...)
Extract brands from visual documents
This demo shows how brands in an image can be detected using Spark OCR. (...)
Visual NER Key-Values v2
This demo extracts the main key points of a document using our pre-trained Spark OCR model. (...)
Visual Question Answering
This demo infers the answer from a given image and a text-based question using our pre-trained Spark OCR models. (...)
Chart to Text
Obtain a description of the charts in an input image document using our Spark OCR model. (...)
Chart to Text powered by LLM
Obtain a deeper interpretation of the charts in an input image document using our Spark OCR model powered by an LLM. (...)
Infographic Visual Question Answering
Infer the answer from a given infographic-related image and a text-based question using our pre-trained Spark OCR model. (...)
Checkbox Detection
This model detects and classifies checkboxes in document images using our pre-trained Spark OCR model. (...)
Deidentify DICOM documents
Deidentify DICOM documents by masking PHI on the image and by either masking or obfuscating PHI in the metadata. (...)
Deidentify Images
Deidentify images by masking sensitive information on the image and by either masking or obfuscating. (...)
De-identify PDF documents - GDPR Compliance
Deidentify PDF documents following GDPR guidelines by anonymizing PHI with out-of-the-box Spark NLP models. (...)
De-identify PDF documents - HIPAA Compliance
Deidentify PDF documents following HIPAA guidelines by masking PHI with out-of-the-box Spark NLP models. (...)
Image Classifier in Document Images
This model classifies document images using our pre-trained Spark OCR model. (...)
Document Layout Analysis
Identify and structure the visual elements in a document by using our pre-trained Spark OCR models. (...)
Chart to Text
Obtain a deeper interpretation of the charts in an input PDF document using our Spark OCR model powered by an LLM. (...)
HOCR Table Structure Recognition in Document Images
This model obtains the table structure of document images using our pre-trained HOCR Spark OCR model. (...)
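
The "Recognize entities in scanned PDFs" entry above describes a chain of stages: preprocess, recognize text, correct spelling, run NER. The following is a minimal sketch of that stage ordering in plain Python. It does not use the Spark OCR or Spark NLP APIs; every name in it (`Document`, `preprocess`, `recognize_text`, `correct_spelling`, `extract_entities`, `CORRECTIONS`, `ENTITY_PATTERNS`) is a hypothetical stand-in, and the "OCR" step simply passes text through so the example stays runnable without any licensed library.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Document:
    """Hypothetical container that carries one scanned page through the stages."""
    path: str
    raw_text: str  # stand-in for the pixels a real OCR engine would read
    text: str = ""
    entities: List[Tuple[str, str]] = field(default_factory=list)


def preprocess(doc: Document) -> Document:
    # Stand-in for image cleanup: collapse runs of whitespace "noise".
    doc.raw_text = re.sub(r"\s+", " ", doc.raw_text).strip()
    return doc


def recognize_text(doc: Document) -> Document:
    # Stand-in for OCR: a real engine would turn an image into text here.
    doc.text = doc.raw_text
    return doc


# Assumed lookup table of typical OCR confusions (digit 1 read for letter i).
CORRECTIONS = {"pat1ent": "patient", "d1abetes": "diabetes"}


def correct_spelling(doc: Document) -> Document:
    # Replace known misreadings word by word; unknown words pass through.
    doc.text = " ".join(CORRECTIONS.get(w.lower(), w) for w in doc.text.split())
    return doc


# Assumed toy patterns; a real NER model would be learned, not regex-based.
ENTITY_PATTERNS = {
    "DATE": r"\b\d{4}-\d{2}-\d{2}\b",
    "CONDITION": r"\bdiabetes\b",
}


def extract_entities(doc: Document) -> Document:
    # Collect (matched text, label) pairs for every pattern hit.
    for label, pattern in ENTITY_PATTERNS.items():
        for match in re.finditer(pattern, doc.text, flags=re.IGNORECASE):
            doc.entities.append((match.group(0), label))
    return doc


# Stage order mirrors the description: preprocess, OCR, spell check, NER.
STAGES = [preprocess, recognize_text, correct_spelling, extract_entities]


def run_pipeline(doc: Document) -> Document:
    for stage in STAGES:
        doc = stage(doc)
    return doc


page = Document(
    path="scan_001.png",  # hypothetical file name
    raw_text="The pat1ent   was diagnosed with d1abetes on 2021-03-04",
)
result = run_pipeline(page)
print(result.text)
print(result.entities)
```

Spell correction sits before entity extraction on purpose: the `CONDITION` pattern would miss `d1abetes` if the order were swapped, which is the reason the demo description gives for correcting OCR output before running NER.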