Speed Benchmarks

PDF De-identification Benchmark Experiment

  • Dataset: 1000 scanned PDF pages.
  • Instance types:
    • m5n.4xlarge (16 vCPUs, 64 GiB memory)
    • m5n.8xlarge (32 vCPUs, 128 GiB memory)
  • AMI: ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20240411
  • Versions:
    • spark-nlp: v5.4.0
    • visual-nlp: v5.3.2
    • spark-nlp-jsl: v5.3.2
    • Spark: v3.4.1
  • Visual NLP Pipeline: ‘pdf_deid_subentity_context_augmented_pipeline’

Benchmark Table

Instance     Memory   vCPUs  Input pages  Partitions  Seconds per page  Total time
m5n.4xlarge  64 GiB   16     1000         10          0.24              4 min
m5n.8xlarge  128 GiB  32     1000         32          0.15              2.5 min
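
The per-page figures in the table follow directly from the total run time divided by the page count. The sketch below reproduces that arithmetic and derives the relative speedup between the two instances. The dictionary layout and function names are illustrative assumptions, not part of any Visual NLP API; only the numbers come from the table. Because the two runs also used different partition counts (10 vs. 32), the efficiency figure reflects the whole configuration, not the vCPU count alone.

```python
# Figures copied from the benchmark table; the structure and names are hypothetical.
runs = {
    "m5n.4xlarge": {"vcpus": 16, "pages": 1000, "total_seconds": 4 * 60},
    "m5n.8xlarge": {"vcpus": 32, "pages": 1000, "total_seconds": 2.5 * 60},
}


def seconds_per_page(run):
    """Average wall-clock seconds spent per page."""
    return run["total_seconds"] / run["pages"]


def pages_per_minute(run):
    """Throughput expressed as pages processed per minute."""
    return run["pages"] / (run["total_seconds"] / 60)


small = runs["m5n.4xlarge"]
large = runs["m5n.8xlarge"]

# Speedup: how many times faster the larger instance finished the same workload.
speedup = small["total_seconds"] / large["total_seconds"]

# Scaling efficiency: speedup divided by the increase in vCPUs (1.0 would be linear).
scaling_efficiency = speedup / (large["vcpus"] / small["vcpus"])

for name, run in runs.items():
    print(f"{name}: {seconds_per_page(run):.2f} s/page, "
          f"{pages_per_minute(run):.0f} pages/min")
print(f"speedup: {speedup:.2f}x, scaling efficiency: {scaling_efficiency:.0%}")
```

With the table's numbers, doubling the vCPUs gives a 1.6x speedup, or about 80% scaling efficiency for this workload.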