Speed Benchmarks
PDF De-identification Benchmark Experiment
- Dataset: 1000 scanned PDF pages.
- Instance :
- m5n.4xlarge (16 vCPUs, 64 GiB memory)
- m5n.8xlarge (32 vCPUs, 128 GiB memory)
- AMI: ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20240411
- Versions:
- spark-nlp Version: v5.4.0
- visual-nlp Version: v5.3.2
- spark-nlp-jsl Version : v5.3.2
- Spark Version : v3.4.1
- Visual NLP Pipeline: ‘pdf_deid_subentity_context_augmented_pipeline’
Benchmark Table
Instance | memory | cores | input_data_pages | partition | second per page | timing |
---|---|---|---|---|---|---|
m5n.4xlarge | 64 GB | 16 | 1000 | 10 | 0.24 | 4 mins |
m5n.8xlarge | 128 GB | 32 | 1000 | 32 | 0.15 | 2.5 mins |
PREVIOUSStructures and helpers
NEXTRelease notes