# GPU vs CPU benchmark

This section includes a benchmark for MedicalNerApproach(), comparing its performance when running in m5.8xlarge CPU vs a Tesla V100 SXM2 GPU, as described in the Machine Specs section below.

Big improvements have been carried out from version 3.3.4, so please, make sure you use at least that version to fully levearge Spark NLP capabilities on GPU.

### Machine specs

#### CPU

An AWS m5.8xlarge machine was used for the CPU benchmarking. This machine consists of 32 vCPUs and 128 GB of RAM, as you can check in the official specification webpage available here

#### GPU

A Tesla V100 SXM2 GPU with 32GB of memory was used to calculate the GPU benchmarking.

### Versions

The benchmarking was carried out with the following Spark NLP versions:

Spark version: 3.0.2

Hadoop version: 3.2.0

SparkNLP version: 3.3.4

SparkNLP for Healthcare version: 3.3.4

Spark nodes: 1

### Benchmark on MedicalNerDLApproach()

This experiment consisted of training a Name Entity Recognition model (token-level), using our class NerDLApproach(), using Bert Word Embeddings and a Char-CNN-BiLSTM Neural Network. Only 1 Spark node was used for the training.

We used the Spark NLP class MedicalNer and it’s method Approach() as described in the documentation.

The pipeline looks as follows:

#### Dataset

The size of the dataset was small (17K), consisting of:

Training (rows): 14041

Test (rows): 3250

#### Training params

Different batch sizes were tested to demonstrate how GPU performance improves with bigger batches compared to CPU, for a constant number of epochs and learning rate.

Epochs: 10

Learning rate: 0.003

Batch sizes: 32, 64, 256, 512, 1024, 2048

#### Results

Even for this small dataset, we can observe that GPU is able to beat the CPU machine by a 62% in training time and a 68% in inference times. It’s important to mention that the batch size is very relevant when using GPU, since CPU scales much worse with bigger batch sizes than GPU.

#### Training times depending on batch (in minutes)

Batch size CPU GPU
32 9.5 10
64 8.1 6.5
256 6.9 3.5
512 6.7 3
1024 6.5 2.5
2048 6.5 2.5

#### Inference times (in minutes)

Although CPU times in inference remain more or less constant regardless the batch sizes, GPU time experiment good improvements the bigger the batch size is.

CPU times: ~29 min

Batch size GPU
32 10
64 6.5
256 3.5
512 3
1024 2.5
2048 2.5

#### Performance metrics

A macro F1-score of about 0.92 (0.90 in micro) was achieved, with the following charts extracted from the MedicalNerApproach() logs:

You will experiment big GPU improvements in the following cases:

1. Embeddings and Transformers are used in your pipeline. Take into consideration that GPU will performance very well in Embeddings / Transformer components, but other components of your pipeline may not leverage as well GPU capabilities;
2. Bigger batch sizes get the best of GPU, while CPU does not scale with bigger batch sizes;
3. Bigger dataset sizes get the best of GPU, while may be a bottleneck while running in CPU and lead to performance drops;

### MultiGPU Inference on Databricks

In this part, we will give you an idea on how to choose appropriate hardware specifications for Databricks. Here is a few different hardwares, their prices, as well as their performance:

Apparently, GPU hardware is the cheapest among them although it performs the best. Let’s see how overall performance looks like:

Figure above clearly shows us that GPU should be the first option of ours.

In conclusion, please find the best specifications for your use case since these benchmarks might depend on dataset size, inference batch size, quickness, pricing and so on.

### MultiGPU training

Currently, we don’t support multiGPU training, meaning training 1 model in different GPUs in parallel. However, you can train different models in different GPUs.

### MultiGPU inference

Spark NLP can carry out MultiGPU inference if GPUs are in different cluster nodes. For example, if you have a cluster with different GPUs, you can repartition your data to match the number of GPU nodes and then coalesce to retrieve the results back to the master node.

Currently, inference on multiple GPUs on the same machine is not supported.