Receipts Binary Classification

Description

This is a ViT (Vision Transformer) model for binary image classification: it labels each input image as either a ticket (receipt) or not a ticket. The model was trained in-house on several corpora, including:

  • CORD
  • COCO
  • In-house annotated receipts

You can use this model to filter non-ticket images out of a folder of images or mobile photos, and then apply Visual NLP to the remaining images to extract information from their layout and text features.

Predicted Entities

ticket, no_ticket


How to use

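from johnsnowlabs import nlp

# Start a Spark session (assumes a licensed John Snow Labs environment is already installed)
spark = nlp.start()

# Wrap the raw Spark image column into the annotation format the classifier expects
image_assembler = nlp.ImageAssembler() \
    .setInputCol("image") \
    .setOutputCol("image_assembler")

# Load the pretrained ticket / no_ticket classifier
image_classifier = nlp.ViTForImageClassification.pretrained("finvisualclf_vit_tickets", "en", "finance/models") \
    .setInputCols(["image_assembler"]) \
    .setOutputCol("class")

pipeline = nlp.Pipeline().setStages([
    image_assembler,
    image_classifier
])

# Read the image with Spark's image data source, skipping files that cannot be decoded
test_image = spark.read \
    .format("image") \
    .option("dropInvalid", value=True) \
    .load("./ticket.JPEG")

result = pipeline.fit(test_image).transform(test_image)

# Show the predicted label without truncating the column
result.select("class.result").show(1, False)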

Results

+--------+
|result  |
+--------+
|[ticket]|
+--------+
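
To apply the filtering use case described above to more than one image, the predictions can be collected and split by label. The following is a minimal sketch in plain Python, not part of the library: the helper name split_by_prediction and the example file paths are hypothetical, and it assumes the rows were obtained with something like result.select("image.origin", "class.result").collect(), so that each row pairs an image path with a list of predicted labels.

```python
from typing import Iterable, List, Sequence, Tuple

# One of the two labels listed under Predicted Entities
TICKET_LABEL = "ticket"


def split_by_prediction(
    rows: Iterable[Tuple[str, Sequence[str]]]
) -> Tuple[List[str], List[str]]:
    """Split (path, labels) pairs into ticket paths and non-ticket paths.

    Hypothetical helper: an image counts as a ticket only when its first
    predicted label equals TICKET_LABEL. Rows with no prediction are
    treated as non-tickets.
    """
    tickets: List[str] = []
    others: List[str] = []
    for path, labels in rows:
        if labels and labels[0] == TICKET_LABEL:
            tickets.append(path)
        else:
            others.append(path)
    return tickets, others


# Made-up rows standing in for collected pipeline output (assumption)
example_rows = [
    ("file:/data/img_001.jpg", ["ticket"]),
    ("file:/data/img_002.jpg", ["no_ticket"]),
    ("file:/data/img_003.jpg", []),
]

tickets, others = split_by_prediction(example_rows)
print("tickets:", tickets)
print("others:", others)
```

Only the paths in the first list would then be passed on to the Visual NLP extraction step. For large folders, filtering the DataFrame in Spark instead of collecting it to the driver is likely the better choice.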

Model Information

Model Name: finvisualclf_vit_tickets
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [image_assembler]
Output Labels: [class]
Language: en
Size: 321.9 MB

References

CORD, RVL-CDIP, Visual Genome, and an external receipt dataset

Benchmarking

metric           value
training_loss    0.0006
validation_loss  0.0044
f1               0.9997