Description
Pretrained pipeline for table structure extraction from PDF documents. It handles both scanned (image-based) and digital (text-based) PDFs, including collections that mix the two.
How to use
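# Python
# Assumes an active Spark session named `spark` with Visual NLP available.
from sparknlp.pretrained import PretrainedPipeline  # assumed import path; not shown in the original
pipeline = PretrainedPipeline('basic_table_extractor', 'en', 'clinical/ocr')
pdf_path = '/content/pdfs/'  # folder containing the input PDFs
pdf_example_df = spark.read.format("binaryFile").load(pdf_path).cache()
result = pipeline.transform(pdf_example_df)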
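// Scala
// Assumes an active SparkSession named `spark` with Visual NLP available.
import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline // assumed import path; not shown in the original
val pipeline = new PretrainedPipeline("basic_table_extractor", "en", "clinical/ocr")
val pdf_path = "/content/pdfs/" // folder containing the input PDFs
val pdf_example_df = spark.read.format("binaryFile").load(pdf_path).cache()
val result = pipeline.transform(pdf_example_df)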
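The snippets above load everything under the input folder. As a minimal sketch, assuming the PDFs sit on a local filesystem path, a small pre-flight check can fail early when the folder holds no PDFs and restrict the read to `*.pdf` files. The helper names `list_pdfs` and `load_pdfs` are hypothetical and are not part of Visual NLP. `pathGlobFilter` is a standard Spark file-source option.

```python
from pathlib import Path


def list_pdfs(folder):
    """Return the files directly inside `folder` whose name ends in '.pdf'.

    The match is case-sensitive so it agrees with the '*.pdf' glob used below.
    """
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix == ".pdf"
    )


def load_pdfs(spark, folder):
    """Read the PDFs in `folder` as a binaryFile DataFrame.

    `spark` is an existing SparkSession supplied by the caller.
    Raises FileNotFoundError if the folder contains no '.pdf' files.
    """
    if not list_pdfs(folder):
        raise FileNotFoundError(f"no PDF files found in {folder}")
    return (
        spark.read.format("binaryFile")
        .option("pathGlobFilter", "*.pdf")  # skip non-PDF files in the folder
        .load(str(folder))
    )
```

With these helpers, `pdf_example_df = load_pdfs(spark, '/content/pdfs/')` could stand in for the plain `load` call before `pipeline.transform(pdf_example_df)`. This page does not list the pipeline's output columns, so calling `result.printSchema()` is a reasonable first step for inspecting the result.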
Example
Model Information
| Model Name: | basic_table_extractor |
| Type: | ocr |
| Compatibility: | Visual NLP 5.4.0+ |
| License: | Licensed |
| Edition: | Official |
| Language: | en |