Description
Model for table and form detection in documents. It is based on a text detection model with additional post-processing.
Predicted Entities
table, form
How to use
from pyspark.ml import PipelineModel
from sparkocr.transformers import *
from sparkocr.enums import *

# Convert the binary file content into an image
binary_to_image = BinaryToImage() \
    .setImageType(ImageType.TYPE_3BYTE_BGR)

# Detect table and form regions
region_detector = ImageDocumentRegionDetector.pretrained("tabform_v1", "en", "clinical/ocr") \
    .setInputCol("image") \
    .setOutputCol("regions") \
    .setScoreThreshold(0.25)

# Draw the detected regions on the image
draw_regions = ImageDrawRegions() \
    .setInputCol("image") \
    .setInputRegionsCol("regions") \
    .setOutputCol("image_with_regions") \
    .setRectColor(Color.red)

pipeline = PipelineModel(stages=[
    binary_to_image,
    region_detector,
    draw_regions
])

imagePath = "data/tabform_images/irs_sp_1.jpg"
image_df = spark.read.format("binaryFile").load(imagePath)

result = pipeline.transform(image_df)
import org.apache.spark.ml.Pipeline

val binary_to_image = new BinaryToImage()
  .setImageType(ImageType.TYPE_3BYTE_BGR)

// pretrained() is called on the companion object, so no `new` here
val region_detector = ImageDocumentRegionDetector.pretrained("tabform_v1", "en", "clinical/ocr")
  .setInputCol("image")
  .setOutputCol("regions")
  .setScoreThreshold(0.25)

val draw_regions = new ImageDrawRegions()
  .setInputCol("image")
  .setInputRegionsCol("regions")
  .setOutputCol("image_with_regions")
  .setRectColor(Color.red)

// PipelineModel has no public constructor in Scala; build a Pipeline and fit it
val pipeline = new Pipeline().setStages(Array(
  binary_to_image,
  region_detector,
  draw_regions))

val imagePath = "data/tabform_images/irs_sp_1.jpg"
val image_df = spark.read.format("binaryFile").load(imagePath)

// All stages are transformers, so fit() only assembles the PipelineModel
val result = pipeline.fit(image_df).transform(image_df)
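To illustrate what the score threshold of 0.25 does, here is a minimal sketch in plain Python. It does not use the library: the `Region` class, its field names, and the sample detections are hypothetical stand-ins, and the assumption that a region is kept when its score is greater than or equal to the threshold is ours, not confirmed by the model documentation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """Hypothetical stand-in for one detected region (not the library's schema)."""
    label: str    # "table" or "form", the model's predicted entities
    score: float  # detection confidence in [0, 1]
    x: int
    y: int
    width: int
    height: int


def filter_by_score(regions: List[Region], threshold: float = 0.25) -> List[Region]:
    """Keep only regions whose confidence is at least `threshold` (assumed rule)."""
    return [r for r in regions if r.score >= threshold]


# Made-up candidate detections for a single page
candidates = [
    Region("table", 0.91, 40, 120, 500, 300),
    Region("form", 0.18, 30, 450, 520, 200),
    Region("form", 0.67, 35, 700, 510, 260),
]

kept = filter_by_score(candidates, threshold=0.25)
for r in kept:
    print(f"{r.label}: score={r.score:.2f}, box=({r.x}, {r.y}, {r.width}, {r.height})")
```

Raising the threshold trades recall for precision: fewer low-confidence regions survive, at the risk of dropping faint tables or forms.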
Model Information
| Model Name: | tabform_v1 |
| Type: | ocr |
| Compatibility: | Visual NLP 4.3.0+ |
| License: | Licensed |
| Edition: | Official |
| Language: | en |