Table and Form Detection

Description

This model detects tables and forms in documents. It uses an object detection approach tailored to document structures, combining text detection with post-processing to improve accuracy and precision.

The model identifies and locates tables and forms by analyzing the layout of the document, even when the formatting is complex or inconsistent. Post-processing refines the initial detections so that the detected regions are classified and aligned more accurately, which supports automated extraction of tabular and form-based data from scanned documents.

This makes the model suited to document analysis, data extraction, and automated processing workloads that involve large volumes of structured documents.

Predicted Entities

table, form.


How to use

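# Python
from pyspark.ml import PipelineModel
from sparkocr.transformers import *
from sparkocr.enums import *

# Convert the binary file content into an image
binary_to_image = BinaryToImage() \
    .setImageType(ImageType.TYPE_3BYTE_BGR)

# Detect table and form regions with the pretrained model
region_detector = ImageDocumentRegionDetector.pretrained("tabform_v1", "en", "clinical/ocr") \
    .setInputCol("image") \
    .setOutputCol("regions") \
    .setScoreThreshold(0.25)

# Draw the detected regions on top of the input image
draw_regions = ImageDrawRegions() \
    .setInputCol("image") \
    .setInputRegionsCol("regions") \
    .setOutputCol("image_with_regions") \
    .setRectColor(Color.red)

pipeline = PipelineModel(stages=[
    binary_to_image,
    region_detector,
    draw_regions
])

# `spark` is an existing SparkSession with Visual NLP loaded
imagePath = "data/tabform_images/irs_sp_1.jpg"
image_df = spark.read.format("binaryFile").load(imagePath)

result = pipeline.transform(image_df)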

// Scala
import org.apache.spark.ml.Pipeline

// Convert the binary file content into an image
val binary_to_image = new BinaryToImage()
    .setImageType(ImageType.TYPE_3BYTE_BGR)

// Load the pretrained detector through the companion object
val region_detector = ImageDocumentRegionDetector.pretrained("tabform_v1", "en", "clinical/ocr")
    .setInputCol("image")
    .setOutputCol("regions")
    .setScoreThreshold(0.25)

// Draw the detected regions on top of the input image
val draw_regions = new ImageDrawRegions()
    .setInputCol("image")
    .setInputRegionsCol("regions")
    .setOutputCol("image_with_regions")
    .setRectColor(Color.red)

val pipeline = new Pipeline().setStages(Array(
    binary_to_image,
    region_detector,
    draw_regions))

val imagePath = "data/tabform_images/irs_sp_1.jpg"
val image_df = spark.read.format("binaryFile").load(imagePath)

// All stages are transformers, so fit() only assembles the PipelineModel
val result = pipeline.fit(image_df).transform(image_df)
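
To make the effect of the score threshold of 0.25 used above more concrete, here is a minimal, library-independent sketch of threshold filtering. It does not call Visual NLP: the `Detection` class, its field names, the sample values, and the inclusive comparison are all assumptions made for illustration and do not describe the detector's actual output schema or internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """Hypothetical detection record; not the Visual NLP output schema."""
    label: str    # predicted entity: "table" or "form"
    score: float  # detector confidence in [0, 1]
    x: int        # left edge of the bounding box, in pixels
    y: int        # top edge of the bounding box, in pixels
    width: int
    height: int


def filter_by_score(detections: List[Detection], threshold: float = 0.25) -> List[Detection]:
    """Keep detections whose score is at or above the threshold (assumed inclusive)."""
    return [d for d in detections if d.score >= threshold]


# Made-up sample values, for illustration only.
sample = [
    Detection("table", 0.91, 40, 120, 500, 300),
    Detection("form", 0.18, 30, 450, 520, 200),
    Detection("form", 0.64, 35, 700, 510, 260),
]

kept = filter_by_score(sample, threshold=0.25)
for d in kept:
    print(d.label, d.score)
```

In general, raising the threshold trades recall for precision: fewer low-confidence regions are kept, at the risk of dropping faint or partially visible tables and forms.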

Example

Input image

Output image

Model Information

Model Name:	tabform_v1
Type:	ocr
Compatibility:	Visual NLP 4.3.0+
License:	Licensed
Edition:	Official
Language:	en
