Description
This is a ViT (Vision Transformer) model for binary classification of images: it predicts whether a picture or photo shows a ticket or not. The model was trained in-house on several corpora, including:
- CORD
- COCO
- In-house annotated receipts
You can use this model to filter non-tickets out of a folder of images or mobile photos, and then run Visual NLP on the remaining tickets to extract information from their layout and text features.
Predicted Entities
ticket, no_ticket
How to use
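from johnsnowlabs import nlp

# Assumes an active Spark session is available as `spark`.

# Convert the raw image column into the annotation format the classifier expects.
image_assembler = nlp.ImageAssembler() \
    .setInputCol("image") \
    .setOutputCol("image_assembler")

# Load the pretrained ticket / no_ticket classifier.
image_classifier = nlp.ViTForImageClassification.pretrained("finvisualclf_vit_tickets", "en", "finance/models") \
    .setInputCols(["image_assembler"]) \
    .setOutputCol("class")

pipeline = nlp.Pipeline().setStages([
    image_assembler,
    image_classifier
])

# Read the image with Spark's image data source, skipping files that cannot be decoded.
test_image = spark.read \
    .format("image") \
    .option("dropInvalid", True) \
    .load("./ticket.JPEG")

result = pipeline.fit(test_image).transform(test_image)

# Show the predicted label without truncating the column.
result.select("class.result").show(1, False)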
Results
+--------+
|result  |
+--------+
|[ticket]|
+--------+
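The filtering use case mentioned in the description can be sketched as a small post-processing step on the predictions. This is a minimal illustration, not part of the published pipeline: `split_tickets` and `sample_rows` are hypothetical names, and the rows are assumed to be shaped like what `result.select("image.origin", "class.result").collect()` would return, which in turn assumes the `image.origin` field of Spark's image data source is still present after the transform.

```python
from typing import Iterable, List, Sequence, Tuple


def split_tickets(
    rows: Iterable[Tuple[str, Sequence[str]]],
    positive_label: str = "ticket",
) -> Tuple[List[str], List[str]]:
    """Partition (origin, labels) pairs into ticket and non-ticket paths."""
    tickets: List[str] = []
    others: List[str] = []
    for origin, labels in rows:
        # `class.result` is an array column; only its first label is checked here.
        if labels and labels[0] == positive_label:
            tickets.append(origin)
        else:
            others.append(origin)
    return tickets, others


# Hypothetical rows standing in for collected predictions (assumption, see above).
sample_rows = [
    ("file:///data/img/a.jpg", ["ticket"]),
    ("file:///data/img/b.jpg", ["no_ticket"]),
    ("file:///data/img/c.jpg", ["ticket"]),
]

tickets, others = split_tickets(sample_rows)
print(tickets)
print(others)
```

Only the paths in `tickets` would then need to be passed on to a Visual NLP extraction step. Rows with an empty prediction are treated as non-tickets, which is a conservative choice for a filter.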
Model Information
| Model Name:    | finvisualclf_vit_tickets |
| Compatibility: | Finance NLP 1.0.0+       |
| License:       | Licensed                 |
| Edition:       | Official                 |
| Input Labels:  | [image_assembler]        |
| Output Labels: | [class]                  |
| Language:      | en                       |
| Size:          | 321.9 MB                 |
References
CORD, RVL-CDIP, Visual Genome, and an external receipt dataset.
Benchmarking
label            score
training_loss    0.0006
validation_loss  0.0044
f1               0.9997