Detect Problems, Tests and Treatments (ner_clinical) in German

Description

Pretrained named entity recognition (NER) deep learning model for clinical terms in German. The Spark NLP model (MedicalNerModel) is inspired by a former state-of-the-art NER architecture: Chiu & Nichols, Named Entity Recognition with Bidirectional LSTM-CNNs.

Predicted Entities

PROBLEM, TEST, TREATMENT


How to use

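# Python
# Assumes `spark` is an active session started with Spark NLP for Healthcare
# (for example via sparknlp_jsl.start(...)).
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel
from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal

# Wrap the raw text column in a document annotation
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into sentences with the multilingual ("xx") detector
sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# German word embeddings consumed by the NER model
word_embeddings = WordEmbeddingsModel.pretrained("w2v_cc_300d", "de", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

# Token-level tagger for PROBLEM, TEST and TREATMENT
clinical_ner = MedicalNerModel.pretrained("ner_clinical", "de", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge token-level tags into entity chunks
ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = Pipeline(stages=[document_assembler,
                               sentence_detector,
                               tokenizer,
                               word_embeddings,
                               clinical_ner,
                               ner_converter])

# All stages are pretrained, so fitting on an empty DataFrame only assembles the pipeline
model = nlpPipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

sample_text = """Verschlechterung von Schmerzen oder Schwäche in den Beinen , Verlust der Darm - oder Blasenfunktion oder andere besorgniserregende Symptome.
Der Patient erhielt empirisch Ampicillin , Gentamycin und Flagyl sowie Narcan zur Umkehrung von Fentanyl .
ALT war 181 , AST war 156 , LDH war 336 , alkalische Phosphatase war 214 und Bilirubin war insgesamt 12,7 ."""

results = model.transform(spark.createDataFrame([[sample_text]], ["text"]))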

// Scala
// Assumes an active SparkSession named `spark` and that the Spark NLP and
// Healthcare NLP annotator classes are already imported.
import spark.implicits._  // required for .toDS()
import org.apache.spark.ml.Pipeline

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols("document")
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
    .setInputCols("sentence")
    .setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("w2v_cc_300d", "de", "clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")

val ner = MedicalNerModel.pretrained("ner_clinical", "de", "clinical/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")

val ner_converter = new NerConverterInternal()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, word_embeddings, ner, ner_converter))

val data = Seq("""Verschlechterung von Schmerzen oder Schwäche in den Beinen , Verlust der Darm - oder Blasenfunktion oder andere besorgniserregende Symptome.
Der Patient erhielt empirisch Ampicillin , Gentamycin und Flagyl sowie Narcan zur Umkehrung von Fentanyl .
ALT war 181 , AST war 156 , LDH war 336 , alkalische Phosphatase war 214 und Bilirubin war insgesamt 12,7 .""").toDS().toDF("text")

val result = pipeline.fit(data).transform(data)
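
The pipeline writes its entities to the ner_chunk column, where each annotation carries the chunk text in `result` and the label under the `entity` key of `metadata`. The following is a minimal sketch of turning such annotations into (chunk, ner_label) rows. The `Annotation` namedtuple is a hypothetical stand-in that mimics only those two fields so the example runs without Spark, and `chunk_rows` and `format_table` are illustrative helper names, not part of the library.

```python
from collections import namedtuple

# Hypothetical stand-in for a Spark NLP annotation: only the two fields used here.
Annotation = namedtuple("Annotation", ["result", "metadata"])


def chunk_rows(annotations):
    """Return (chunk text, entity label) pairs from ner_chunk annotations."""
    return [(a.result, a.metadata["entity"]) for a in annotations]


def format_table(rows, headers=("chunk", "ner_label")):
    """Render the rows as a simple left-aligned text table."""
    # Column width is the longest cell in each column, header included.
    widths = [max(len(str(v)) for v in col) for col in zip(headers, *rows)]
    lines = ["|".join(str(v).ljust(w) for v, w in zip(row, widths))
             for row in [headers, *rows]]
    return "\n".join(lines)


# Hand-written annotations for illustration; not actual model output.
annotations = [
    Annotation("Schmerzen", {"entity": "PROBLEM"}),
    Annotation("Gentamycin", {"entity": "TREATMENT"}),
    Annotation("ALT", {"entity": "TEST"}),
]
print(format_table(chunk_rows(annotations)))
```

With a fitted pipeline, the same helper could presumably be fed from `LightPipeline(model).fullAnnotate(sample_text)[0]["ner_chunk"]`, assuming the annotations expose `result` and `metadata` as described.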

Results

+----------------------+---------+
|chunk                 |ner_label|
+----------------------+---------+
|Schmerzen             |PROBLEM  |
|Schwäche in den Beinen|PROBLEM  |
|Verlust der Darm      |PROBLEM  |
|Blasenfunktion        |PROBLEM  |
|Symptome              |PROBLEM  |
|empirisch Ampicillin  |TREATMENT|
|Gentamycin            |TREATMENT|
|Flagyl                |TREATMENT|
|Narcan                |TREATMENT|
|Fentanyl              |TREATMENT|
|ALT                   |TEST     |
|AST                   |TEST     |
|LDH                   |TEST     |
|alkalische Phosphatase|TEST     |
|Bilirubin             |TEST     |
+----------------------+---------+
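
The model tags individual tokens with B- (begin), I- (inside) and O (outside) labels, while the table above lists merged chunks. The sketch below shows one simple way such tags can be grouped into chunks. It is an illustration of the general IOB scheme, not the actual NerConverterInternal implementation, and the tag sequence is hand-written to match the table rather than taken from model output. The function name `iob_to_chunks` is hypothetical.

```python
def iob_to_chunks(tokens, tags):
    """Group IOB-tagged tokens into (chunk text, label) pairs."""
    chunks = []
    current_tokens, current_label = [], None

    def flush():
        # Emit the chunk collected so far, if any.
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A B- tag, or an I- tag that does not continue the open chunk,
            # starts a new chunk.
            flush()
            current_tokens, current_label = [token], label
        elif prefix == "I":
            current_tokens.append(token)
        else:
            # "O" closes any open chunk.
            flush()
            current_tokens, current_label = [], None
    flush()
    return chunks


# Illustrative tags for part of the second sample sentence.
tokens = ["Der", "Patient", "erhielt", "empirisch", "Ampicillin", ",", "Gentamycin"]
tags = ["O", "O", "O", "B-TREATMENT", "I-TREATMENT", "O", "B-TREATMENT"]
print(iob_to_chunks(tokens, tags))
```

Treating a stray I- tag as the start of a new chunk is a design choice made here for robustness; other converters may drop such tokens instead.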

Model Information

Model Name: ner_clinical
Compatibility: Healthcare NLP 4.4.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: de
Size: 2.0 MB

Benchmarking

       label  precision  recall  f1-score  support
   B-PROBLEM       0.85    0.71      0.78      512
      B-TEST       0.89    0.85      0.87      203
 B-TREATMENT       0.84    0.82      0.83      238
   I-PROBLEM       0.78    0.70      0.74      355
      I-TEST       0.90    0.83      0.87       66
 I-TREATMENT       0.62    0.71      0.66       75
           O       0.94    0.97      0.95     4141
    accuracy          -       -      0.91     5590
   macro avg       0.83    0.80      0.81     5590
weighted avg       0.91    0.91      0.91     5590
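
To make the two average rows concrete, the following sketch recomputes them from the per-label rows of the table: the macro average is the plain mean over the seven labels, and the weighted average weights each label by its support. The numbers are copied from the table as printed, so the inputs are already rounded to two decimals, and the helper names are illustrative.

```python
# (label, precision, recall, f1-score, support), copied from the benchmark table
ROWS = [
    ("B-PROBLEM",   0.85, 0.71, 0.78, 512),
    ("B-TEST",      0.89, 0.85, 0.87, 203),
    ("B-TREATMENT", 0.84, 0.82, 0.83, 238),
    ("I-PROBLEM",   0.78, 0.70, 0.74, 355),
    ("I-TEST",      0.90, 0.83, 0.87, 66),
    ("I-TREATMENT", 0.62, 0.71, 0.66, 75),
    ("O",           0.94, 0.97, 0.95, 4141),
]


def macro_avg(values):
    """Unweighted mean: every label counts equally."""
    return sum(values) / len(values)


def weighted_avg(values, weights):
    """Mean weighted by support: frequent labels count more."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


supports = [row[4] for row in ROWS]
for name, idx in (("precision", 1), ("recall", 2), ("f1-score", 3)):
    column = [row[idx] for row in ROWS]
    print(f"{name:>9}  macro={macro_avg(column):.2f}  "
          f"weighted={weighted_avg(column, supports):.2f}")
```

Rounded to two decimals, these values agree with the macro avg and weighted avg rows. The gap between the two comes mostly from the O label, which makes up 4141 of the 5590 tokens and so dominates the weighted figures. The support-weighted recall also coincides with overall accuracy, consistent with the 0.91 in the accuracy row.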