Pretrained Pipeline for Table Extraction

Description

Pretrained pipeline for table extraction from mixed scanned and digital PDF documents. It works in two stages: it first detects tables in the input documents and then extracts the structure of each detected table, which lets it handle PDFs of various origins and formats.

How to use

Python

from sparknlp.pretrained import PretrainedPipeline

# Assumes an active Spark session (`spark`) started with Visual NLP.
pipeline = PretrainedPipeline('digital_pdf_table_extractor', 'en', 'clinical/ocr')

pdf_path = '/content/pdfs/'
pdf_example_df = spark.read.format("binaryFile").load(pdf_path).cache()

result = pipeline.transform(pdf_example_df)

Scala

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("digital_pdf_table_extractor", "en", "clinical/ocr")

val pdf_path = "/content/pdfs/"
val pdf_example_df = spark.read.format("binaryFile").load(pdf_path).cache()

val result = pipeline.transform(pdf_example_df)
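The binaryFile reader used above loads every file under the given directory, so it can help to confirm what the directory contains before running the pipeline. The following is a minimal sketch of such a check, assuming the PDFs sit on a local filesystem. The helper name `list_pdfs` is hypothetical and is not part of Visual NLP or Spark.

```python
from pathlib import Path


def list_pdfs(directory):
    """Return the sorted PDF paths found directly inside `directory`.

    Hypothetical helper for illustration; it only inspects a local folder.
    """
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(f"Input directory not found: {directory}")
    # Compare the suffix case-insensitively so 'scan.PDF' is kept as well.
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    pdf_path = "/content/pdfs/"  # same path as in the example above
    try:
        pdfs = list_pdfs(pdf_path)
    except FileNotFoundError as err:
        print(err)
    else:
        print(f"{len(pdfs)} PDF file(s) found in {pdf_path}")
```

If the directory also holds non-PDF files, one option is to filter at read time with Spark's `pathGlobFilter` reader option, for example `spark.read.format("binaryFile").option("pathGlobFilter", "*.pdf").load(pdf_path)`. For remote storage such as S3 or DBFS, the local check above does not apply.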

Example

Input image

Screenshot

Output image

Screenshot

Model Information

Model Name: digital_pdf_table_extractor
Type: ocr
Compatibility: Visual NLP 5.4.0+
License: Licensed
Edition: Official
Language: en