Description
Pretrained pipeline for table extraction from mixed scanned and digital PDF documents. It works in two stages: it first detects the tables in each input document, then extracts the structure of every detected table. This lets it handle PDFs of various origins and formats.
How to use
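# Python: assumes an active Spark session `spark` with Visual NLP available and PretrainedPipeline imported
# Load the pretrained table extraction pipeline
pipeline = PretrainedPipeline('digital_pdf_table_extractor', 'en', 'clinical/ocr')
# Read the files under pdf_path as binary records
pdf_path = '/content/pdfs/'
pdf_example_df = spark.read.format("binaryFile").load(pdf_path).cache()
# Run table detection and table structure extraction
result = pipeline.transform(pdf_example_df)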
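// Scala: assumes an active SparkSession `spark` with Visual NLP available and PretrainedPipeline imported
// Load the pretrained table extraction pipeline
val pipeline = new PretrainedPipeline("digital_pdf_table_extractor", "en", "clinical/ocr")
// Read the files under pdf_path as binary records
val pdf_path = "/content/pdfs/"
val pdf_example_df = spark.read.format("binaryFile").load(pdf_path).cache()
// Run table detection and table structure extraction
val result = pipeline.transform(pdf_example_df)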
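The `binaryFile` reader used above loads the files it finds under `pdf_path` regardless of their extension, so it can help to check the folder contents before running the pipeline. The following is a minimal sketch of such a check using only the Python standard library. It is not part of the pipeline or of Visual NLP: `find_pdfs` and `PDF_MAGIC` are hypothetical names introduced here for illustration, and the check assumes the PDFs sit on a local filesystem and begin with the standard `%PDF-` header.

```python
from pathlib import Path

# A well-formed PDF begins with this header (hypothetical constant name).
PDF_MAGIC = b"%PDF-"


def find_pdfs(folder):
    """Return the sorted paths of files directly under `folder` that start with the PDF header.

    Hypothetical helper for illustration only; subdirectories are skipped.
    """
    pdfs = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # ignore subdirectories and other non-file entries
        with path.open("rb") as fh:
            # Read only the first few bytes; the rest of the file is not needed.
            if fh.read(len(PDF_MAGIC)) == PDF_MAGIC:
                pdfs.append(str(path))
    return pdfs
```

Calling `find_pdfs('/content/pdfs/')` would list the files that look like PDFs. On the Spark side, the file reader also accepts a `pathGlobFilter` option (for example `.option("pathGlobFilter", "*.pdf")` before `.load(pdf_path)`), which filters by file name rather than by content.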
Example
Model Information
| Model Name: | digital_pdf_table_extractor |
| Type: | ocr |
| Compatibility: | Visual NLP 5.4.0+ |
| License: | Licensed |
| Edition: | Official |
| Language: | en |