Pretrained Pipeline for Table Structure Extraction

Description

Pretrained pipeline for table structure extraction from PDF documents, whether scanned or digital (text-based). It is designed to handle PDFs of varying origin and format within a single pipeline.

How to use

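# Python
# Assumes `spark` is an active SparkSession with Visual NLP loaded.
from sparknlp.pretrained import PretrainedPipeline

pipeline = PretrainedPipeline('basic_table_extractor', 'en', 'clinical/ocr')

pdf_path = '/content/pdfs/'
pdf_example_df = spark.read.format("binaryFile").load(pdf_path).cache()

result = pipeline.transform(pdf_example_df)

// Scala
// Assumes `spark` is an active SparkSession with Visual NLP loaded.
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val pipeline = new PretrainedPipeline("basic_table_extractor", "en", "clinical/ocr")

val pdf_path = "/content/pdfs/"
val pdf_example_df = spark.read.format("binaryFile").load(pdf_path).cache()

val result = pipeline.transform(pdf_example_df)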
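
Because the input directory is loaded as raw binary files, it can help to confirm beforehand that it contains only PDFs. The following is a minimal sketch of such a pre-flight check using only the Python standard library. The helper name `find_pdfs` and the constant `PDF_MAGIC` are hypothetical and not part of Visual NLP. The check assumes well-formed PDFs that begin with the `%PDF-` header and that the files sit on a filesystem local to the Python process.

```python
from pathlib import Path
from typing import List

# Well-formed PDF files begin with this header (assumption: no leading junk bytes).
PDF_MAGIC = b"%PDF-"


def find_pdfs(directory: str) -> List[str]:
    """Return sorted paths of files under `directory` that start with the PDF header.

    Hypothetical helper for illustration; it is not part of the pipeline.
    """
    pdfs = []
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        # Read only the first few bytes instead of the whole file.
        with path.open("rb") as handle:
            if handle.read(len(PDF_MAGIC)) == PDF_MAGIC:
                pdfs.append(str(path))
    return pdfs
```

Calling `find_pdfs('/content/pdfs/')` returns a list of paths, and anything missing from that list is a file that may be worth checking before running the pipeline.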

Example

Input image: [Screenshot]

Output image: [Screenshot]

Model Information

Model Name: basic_table_extractor
Type: ocr
Compatibility: Visual NLP 5.4.0+
License: Licensed
Edition: Official
Language: en