Detect Clinical Entities (ner_jsl_slim)

Description

This is a pretrained deep learning model for named entity recognition (NER) on clinical text. It is based on the ner_jsl model, but uses more generalised entity labels.

Predicted Entities

Death_Entity, Medical_Device, Vital_Sign, Alergen, Drug, Clinical_Dept, Lifestyle, Symptom, Body_Part, Physical_Measurement, Admission_Discharge, Date_Time, Age, Birth_Entity, Header, Oncological, Substance_Quantity, Test_Result, Test, Procedure, Treatment, Disease_Syndrome_Disorder, Pregnancy_Newborn, Demographics


How to use

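# Python
from pyspark.ml import Pipeline
from sparknlp.annotator import WordEmbeddingsModel
from sparknlp_jsl.annotator import MedicalNerModel

# Clinical word embeddings consumed by the NER model
embeddings_clinical = WordEmbeddingsModel.pretrained('embeddings_clinical', 'en', 'clinical/models') \
    .setInputCols(['sentence', 'token']) \
    .setOutputCol('embeddings')

clinical_ner = MedicalNerModel.pretrained("ner_jsl_slim", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")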
...
nlpPipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings_clinical, clinical_ner, ner_converter])

# Fit on an empty DataFrame: the models are pretrained, so fit() does no training here
model = nlpPipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

results = model.transform(spark.createDataFrame([["HISTORY: 30-year-old female presents for digital bilateral mammography secondary to a soft tissue lump palpated by the patient in the upper right shoulder. The patient has a family history of breast cancer within her mother at age 58. Patient denies personal history of breast cancer."]], ["text"]))
...
// Scala
val embeddings_clinical = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")
val ner = MedicalNerModel.pretrained("ner_jsl_slim", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")
...
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings_clinical, ner, ner_converter))
val data = Seq("HISTORY: 30-year-old female presents for digital bilateral mammography secondary to a soft tissue lump palpated by the patient in the upper right shoulder. The patient has a family history of breast cancer within her mother at age 58. Patient denies personal history of breast cancer.").toDF("text")
val result = pipeline.fit(data).transform(data)

Results

|    | chunk            | entity       |
|---:|:-----------------|:-------------|
|  0 | HISTORY:         | Header       |
|  1 | 30-year-old      | Age          |
|  2 | female           | Demographics |
|  3 | mammography      | Test         |
|  4 | soft tissue lump | Symptom      |
|  5 | shoulder         | Body_Part    |
|  6 | breast cancer    | Oncological  |
|  7 | her mother       | Demographics |
|  8 | age 58           | Age          |
|  9 | breast cancer    | Oncological  |
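
The NER stage emits one IOB tag per token (the B- and I- prefixed labels listed under Benchmarking), and a converter stage then merges consecutive tags into the chunks shown above. The following is a minimal pure-Python sketch of that merging idea, not the library's implementation. The function name `iob_to_chunks` and the example tags are hypothetical and were chosen only to mirror two rows of the table.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge token-level IOB tags into (chunk_text, entity) pairs."""
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_entity: Optional[str] = None

    def flush() -> None:
        # Close the chunk that is currently open, if there is one.
        nonlocal current_tokens, current_entity
        if current_tokens and current_entity is not None:
            chunks.append((" ".join(current_tokens), current_entity))
        current_tokens, current_entity = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new chunk.
            flush()
            current_tokens, current_entity = [token], tag[2:]
        elif tag.startswith("I-") and current_entity == tag[2:]:
            # An I- tag extends the open chunk of the same entity type.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open chunk.
            flush()
    flush()
    return chunks


# Hypothetical token-level tags for part of the example sentence.
tokens = ["soft", "tissue", "lump", "palpated", "in", "the", "upper", "right", "shoulder"]
tags = ["B-Symptom", "I-Symptom", "I-Symptom", "O", "O", "O", "O", "O", "B-Body_Part"]
chunks = iob_to_chunks(tokens, tags)
```

In this sketch an I- tag with no matching open chunk is dropped; a real converter may treat that case differently.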

Model Information

Model Name: ner_jsl_slim
Compatibility: Spark NLP for Healthcare 3.2.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en

Data Source

Trained on data annotated by John Snow Labs (JSL).

Benchmarking

|label                      |tp  |fp  |fn  |prec      |rec       |f1        |
|---------------------------|----|----|----|----------|----------|----------|
|B-Medical_Device           |2696|444 |282 |0.8585987 |0.90530556|0.88133377|
|I-Physical_Measurement     |220 |16  |34  |0.9322034 |0.86614174|0.8979592 |
|B-Procedure                |1800|239 |281 |0.8827857 |0.8649688 |0.8737864 |
|B-Drug                     |1865|218 |237 |0.89534324|0.88725024|0.8912784 |
|I-Test_Result              |289 |203 |292 |0.58739835|0.49741825|0.5386766 |
|I-Pregnancy_Newborn        |150 |41  |104 |0.7853403 |0.5905512 |0.6741573 |
|B-Admission_Discharge      |255 |35  |6   |0.87931037|0.9770115 |0.92558986|
|B-Demographics             |4609|119 |123 |0.9748308 |0.9740068 |0.97441864|
|I-Lifestyle                |71  |49  |20  |0.59166664|0.7802198 |0.67298573|
|B-Header                   |2463|53  |122 |0.9789348 |0.9528046 |0.965693  |
|I-Date_Time                |928 |184 |191 |0.8345324 |0.8293119 |0.83191395|
|B-Test_Result              |866 |198 |262 |0.81390977|0.7677305 |0.79014593|
|I-Treatment                |114 |37  |46  |0.7549669 |0.7125    |0.733119  |
|B-Clinical_Dept            |688 |83  |76  |0.8923476 |0.90052354|0.8964169 |
|B-Test                     |1920|333 |313 |0.85219705|0.85982984|0.8559965 |
|B-Death_Entity             |36  |9   |2   |0.8       |0.94736844|0.8674699 |
|B-Lifestyle                |268 |58  |50  |0.8220859 |0.8427673 |0.8322981 |
|B-Date_Time                |823 |154 |176 |0.8423746 |0.8238238 |0.83299595|
|I-Age                      |136 |34  |49  |0.8       |0.73513514|0.7661972 |
|I-Oncological              |345 |41  |19  |0.8937824 |0.9478022 |0.91999996|
|I-Body_Part                |3717|720 |424 |0.8377282 |0.8976093 |0.8666356 |
|B-Pregnancy_Newborn        |153 |51  |104 |0.75      |0.5953307 |0.6637744 |
|B-Treatment                |169 |41  |58  |0.8047619 |0.74449337|0.7734553 |
|I-Procedure                |2302|326 |417 |0.8759513 |0.8466348 |0.8610435 |
|B-Birth_Entity             |6   |5   |7   |0.54545456|0.46153846|0.5       |
|I-Vital_Sign               |639 |197 |93  |0.76435405|0.8729508 |0.815051  |
|I-Header                   |4451|111 |216 |0.97566855|0.9537176 |0.9645682 |
|I-Death_Entity             |2   |0   |0   |1         |1         |1         |
|I-Clinical_Dept            |621 |54  |39  |0.92      |0.9409091 |0.9303371 |
|I-Test                     |1593|378 |353 |0.8082192 |0.81860226|0.81337756|
|B-Age                      |472 |43  |51  |0.91650486|0.90248567|0.9094413 |
|I-Symptom                  |4227|1271|1303|0.76882505|0.7643761 |0.7665941 |
|I-Demographics             |321 |53  |53  |0.85828876|0.85828876|0.85828876|
|B-Body_Part                |6312|912 |809 |0.87375414|0.88639235|0.8800279 |
|B-Physical_Measurement     |91  |10  |17  |0.9009901 |0.8425926 |0.8708134 |
|B-Disease_Syndrome_Disorder|2817|336 |433 |0.8934348 |0.86676925|0.8799001 |
|B-Symptom                  |4522|830 |747 |0.8449178 |0.8582274 |0.8515206 |
|I-Disease_Syndrome_Disorder|2814|386 |530 |0.879375  |0.8415072 |0.86002445|
|I-Drug                     |3737|612 |517 |0.859278  |0.8784673 |0.8687667 |
|I-Medical_Device           |1825|331 |131 |0.84647495|0.9330266 |0.8876459 |
|B-Oncological              |276 |28  |27  |0.90789473|0.9108911 |0.9093904 |
|B-Vital_Sign               |429 |97  |79  |0.81558937|0.8444882 |0.8297872 |

tp: 62038 fp: 9340 fn: 9110 labels: 46
Macro-average prec: 0.76782775, rec: 0.7648211, f1: 0.76632154
Micro-average prec: 0.86914736, rec: 0.87195706, f1: 0.87055
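
The prec, rec and f1 columns follow from the tp, fp and fn counts using the standard definitions. The short sketch below recomputes the micro-average and one per-label row from the counts reported above. The helper name `precision_recall_f1` is hypothetical. The last lines check an observation rather than a documented fact: the reported macro f1 appears to equal the harmonic mean of the macro-averaged precision and recall, not the mean of the per-label f1 scores.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) computed from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Micro-average: pool the counts over all labels (totals line of the benchmark).
micro_p, micro_r, micro_f1 = precision_recall_f1(tp=62038, fp=9340, fn=9110)

# One per-label row for comparison (B-Medical_Device).
dev_p, dev_r, dev_f1 = precision_recall_f1(tp=2696, fp=444, fn=282)

# Harmonic mean of the reported macro precision and recall.
macro_p, macro_r = 0.76782775, 0.7648211
macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r)

print(f"micro: prec={micro_p:.5f} rec={micro_r:.5f} f1={micro_f1:.5f}")
print(f"B-Medical_Device: prec={dev_p:.5f} rec={dev_r:.5f} f1={dev_f1:.5f}")
print(f"macro f1 from macro prec/rec: {macro_f1:.5f}")
```

The recomputed values agree with the reported figures to about five decimal places; the remaining difference is consistent with the table having been produced in single precision.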