Detect Clinical Entities (Slim version, BertForTokenClassifier)

Description

This is a pretrained deep learning model for named entity recognition (NER) in clinical text. It is based on the bert_token_classifier_ner_jsl model, but predicts more generalized entity types. The model was trained with BertForTokenClassification from the transformers library and then imported into Spark NLP.

Predicted Entities

Death_Entity, Medical_Device, Vital_Sign, Alergen, Drug, Clinical_Dept, Lifestyle, Symptom, Body_Part, Physical_Measurement, Admission_Discharge, Date_Time, Age, Birth_Entity, Header, Oncological, Substance_Quantity, Test_Result, Test, Procedure, Treatment, Disease_Syndrome_Disorder, Pregnancy_Newborn, Demographics


How to use

...

tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_ner_jsl_slim", "en", "clinical/models")\
  .setInputCols(["token", "sentence"])\
  .setOutputCol("ner")\
  .setCaseSensitive(True)

ner_converter = NerConverter()\
        .setInputCols(["sentence","token","ner"])\
        .setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[documentAssembler, sentence_detector, tokenizer, tokenClassifier, ner_converter])

p_model = pipeline.fit(spark.createDataFrame(pd.DataFrame({'text': ['']})))

test_sentence = """HISTORY: 30-year-old female presents for digital bilateral mammography secondary to a soft tissue lump palpated by the patient in the upper right shoulder. The patient has a family history of breast cancer within her mother at age 58. Patient denies personal history of breast cancer."""

result = p_model.transform(spark.createDataFrame(pd.DataFrame({'text': [test_sentence]})))
...

val tokenClassifier = BertForTokenClassification.pretrained("bert_token_classifier_ner_jsl_slim", "en", "clinical/models")
  .setInputCols("token", "sentence")
  .setOutputCol("ner")
  .setCaseSensitive(true)

val ner_converter = NerConverter()
        .setInputCols(Array("sentence","token","ner"))
        .setOutputCol("ner_chunk")

val pipeline =  new Pipeline().setStages(Array(documentAssembler, sentence_detector, tokenizer, tokenClassifier, ner_converter))

val data = Seq("HISTORY: 30-year-old female presents for digital bilateral mammography secondary to a soft tissue lump palpated by the patient in the upper right shoulder. The patient has a family history of breast cancer within her mother at age 58. Patient denies personal history of breast cancer.").toDF("text")

val result = pipeline.fit(data).transform(data)

Results

+----------------+------------+
|chunk           |ner_label   |
+----------------+------------+
|HISTORY:        |Header      |
|30-year-old     |Age         |
|female          |Demographics|
|mammography     |Test        |
|soft tissue lump|Symptom     |
|shoulder        |Body_Part   |
|breast cancer   |Oncological |
|her mother      |Demographics|
|age 58          |Age         |
|breast cancer   |Oncological |
+----------------+------------+
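
The benchmark for this model is reported per IOB tag (B-Symptom, I-Symptom, and so on), while the table above lists merged chunks. The following is a minimal pure-Python sketch of how token-level IOB tags can be grouped into chunks. It is a simplified illustration, not the NerConverter implementation: the function name merge_iob is hypothetical, and the token tags in the example are assumed values chosen to be consistent with the results table, not actual model output.

```python
from typing import List, Optional, Tuple


def merge_iob(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group token-level IOB tags into (chunk_text, label) pairs."""
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Close the open chunk, if any, and reset the accumulator.
        nonlocal current_tokens, current_label
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new chunk.
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag extends the open chunk of the same label.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open chunk
            # (dropped here for simplicity).
            flush()
    flush()
    return chunks


# Assumed tags for a fragment of the example sentence (illustrative only).
tokens = ["a", "soft", "tissue", "lump", "in", "the", "upper", "right", "shoulder"]
tags = ["O", "B-Symptom", "I-Symptom", "I-Symptom", "O", "O", "O", "O", "B-Body_Part"]
chunks = merge_iob(tokens, tags)
```

With these assumed tags, chunks holds one Symptom chunk ("soft tissue lump") and one Body_Part chunk ("shoulder"), matching the corresponding rows of the results table.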

Model Information

Model Name: bert_token_classifier_ner_jsl_slim
Compatibility: Spark NLP for Healthcare 3.2.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token]
Output Labels: [ner]
Language: en
Case sensitive: true
Max sentence length: 256

Data Source

Trained on data annotated by John Snow Labs (JSL).

Benchmarking

                             precision    recall  f1-score   support

      B-Admission_Discharge       0.82      0.99      0.90       282
                      B-Age       0.88      0.83      0.85       576
                  B-Alergen       0.17      0.11      0.13         9
             B-Birth_Entity       0.33      0.29      0.31         7
                B-Body_Part       0.84      0.91      0.87      8582
            B-Clinical_Dept       0.86      0.94      0.90       909
                B-Date_Time       0.82      0.77      0.79      1062
             B-Death_Entity       0.66      0.98      0.79        43
             B-Demographics       0.97      0.98      0.98      5285
B-Disease_Syndrome_Disorder       0.84      0.89      0.86      4259
                     B-Drug       0.88      0.87      0.87      2555
                   B-Header       0.97      0.66      0.78      3911
                B-Lifestyle       0.77      0.83      0.80       371
           B-Medical_Device       0.84      0.87      0.85      3605
              B-Oncological       0.86      0.91      0.89       408
     B-Physical_Measurement       0.84      0.81      0.82       135
        B-Pregnancy_Newborn       0.66      0.71      0.68       245
                B-Procedure       0.82      0.88      0.85      2654
       B-Substance_Quantity       0.00      0.00      0.00         1
                  B-Symptom       0.83      0.86      0.85      6545
                     B-Test       0.82      0.83      0.83      2448
              B-Test_Result       0.76      0.81      0.78      1280
                B-Treatment       0.70      0.76      0.73       275
               B-Vital_Sign       0.85      0.87      0.86       627
      I-Admission_Discharge       0.00      0.00      0.00         1
                      I-Age       0.84      0.90      0.87       166
                  I-Alergen       0.00      0.00      0.00         5
                I-Body_Part       0.86      0.89      0.88      4946
            I-Clinical_Dept       0.92      0.93      0.93       806
                I-Date_Time       0.82      0.91      0.86      1173
             I-Death_Entity       1.00      0.29      0.44         7
             I-Demographics       0.89      0.84      0.86       416
I-Disease_Syndrome_Disorder       0.87      0.85      0.86      4385
                     I-Drug       0.83      0.86      0.85      5199
                   I-Header       0.85      0.97      0.90      6763
                I-Lifestyle       0.77      0.69      0.73       134
           I-Medical_Device       0.86      0.86      0.86      2341
              I-Oncological       0.85      0.94      0.89       515
     I-Physical_Measurement       0.88      0.94      0.91       329
        I-Pregnancy_Newborn       0.66      0.70      0.68       273
                I-Procedure       0.87      0.86      0.87      3414
       I-Substance_Quantity       0.00      0.00      0.00         1
                  I-Symptom       0.79      0.75      0.77      6485
                     I-Test       0.82      0.77      0.79      2283
              I-Test_Result       0.67      0.56      0.61       649
                I-Treatment       0.69      0.72      0.70       194
               I-Vital_Sign       0.88      0.90      0.89       918
                          O       0.97      0.97      0.97    210520

                   accuracy                           0.94    297997
                  macro avg       0.74      0.74      0.73    297997
               weighted avg       0.94      0.94      0.94    297997
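
As a reading aid for the table above, here is a short sketch that recomputes the f1-score column from the precision and recall columns, assuming the standard definition F1 = 2PR / (P + R). The helper name f1_score and the choice of rows are illustrative. Because the table shows values rounded to two decimals, recomputed scores can differ from the reported ones by about 0.01.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1) copied from a few rows of the table.
rows = {
    "B-Admission_Discharge": (0.82, 0.99, 0.90),
    "B-Age": (0.88, 0.83, 0.85),
    "B-Death_Entity": (0.66, 0.98, 0.79),
    "B-Substance_Quantity": (0.00, 0.00, 0.00),
}

# Recompute F1 from the rounded precision and recall values.
recomputed = {
    label: round(f1_score(p, r), 2) for label, (p, r, _) in rows.items()
}

for label, (p, r, reported) in rows.items():
    print(f"{label}: reported={reported:.2f} recomputed={recomputed[label]:.2f}")
```

Under the usual definitions, the macro average weights every tag equally, so tags with very small support and scores near zero (for example B-Substance_Quantity, support 1) lower it to 0.73, while the weighted average of 0.94 is dominated by the O tag and other high-support tags.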