Detect Symptoms, Treatments and Other Entities in German

Description

This model can be used to detect symptoms, treatments and other entities in medical text in German language.

Predicted Entities

DIAGLAB_PROCEDURE, MEDICAL_SPECIFICATION, MEDICAL_DEVICE, MEASUREMENT, BIOLOGICAL_CHEMISTRY, BODY_FLUID, TIME_INFORMATION, LOCAL_SPECIFICATION, BIOLOGICAL_PARAMETER, PROCESS, MEDICATION, DOSING, DEGREE, MEDICAL_CONDITION, PERSON, TISSUE, STATE_OF_HEALTH, BODY_PART, TREATMENT

Live Demo Open in Colab Copy S3 URI

How to use


document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer() \
    .setInputCols("sentence") \
    .setOutputCol("token")

word_embeddings = WordEmbeddingsModel.pretrained("w2v_cc_300d","de","clinical/models")\
   .setInputCols(["sentence","token"])\
   .setOutputCol("embeddings")

clinical_ner = MedicalNerModel.pretrained("ner_healthcare", "de", "clinical/models") \
  .setInputCols(["sentence", "token", "embeddings"]) \
  .setOutputCol("ner")

clinical_ner_converter = NerConverter() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("entities")

nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, word_embeddings, clinical_ner, clinical_ner_converter])

light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))

annotations = light_pipeline.fullAnnotate("Das Kleinzellige Bronchialkarzinom (Kleinzelliger Lungenkrebs, SCLC) ist ein hochmalignes bronchogenes Karzinom")

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare","en","clinical/models")
    .setInputCols("document")
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
    .setInputCols("sentence")
    .setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("w2v_cc_300d","de","clinical/models")
    .setInputCols(Array("sentence","token"))
    .setOutputCol("embeddings")

val ner = MedicalNerModel.pretrained("ner_healthcare_slim", "de", "clinical/models") 
    .setInputCols(Array("sentence", "token", "embeddings")) 
    .setOutputCol("ner")

val clinical_ner_converter = new NerConverter()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("entities")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, word_embeddings, ner, clinical_ner_converter))

val data = Seq("Das Kleinzellige Bronchialkarzinom (Kleinzelliger Lungenkrebs, SCLC) ist ein hochmalignes bronchogenes Karzinom").toDF("text")

val result = pipeline.fit(data).transform(data)
import nlu
nlu.load("de.med_ner.healthcare").predict("""Das Kleinzellige Bronchialkarzinom (Kleinzelliger Lungenkrebs, SCLC) ist ein hochmalignes bronchogenes Karzinom""")

Results

+-----------------+---------------------+-----+---+
|chunk            |ner_label            |begin|end|
+-----------------+---------------------+-----+---+
|Kleinzellige     |MEASUREMENT          |4    |15 |
|Bronchialkarzinom|MEDICAL_CONDITION    |17   |33 |
|Kleinzelliger    |MEDICAL_SPECIFICATION|36   |48 |
|Lungenkrebs      |MEDICAL_CONDITION    |50   |60 |
|SCLC             |MEDICAL_CONDITION    |63   |66 |
|Karzinom         |MEDICAL_CONDITION    |103  |110|
+-----------------+---------------------+-----+---+

Model Information

Model Name: ner_healthcare
Compatibility: Healthcare NLP 3.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: de

Data Source

Trained on augmented version of 2010 i2b2/VA dataset on concepts, assertions, and relations in clinical text with w2v_cc_300d.

Benchmarking

|    | label               |     tp |    fp |   fn | precision|    recall|       f1 |
|---:|--------------------:|-------:|------:|-----:|---------:|---------:|---------:|
|  0 | BIOLOGICAL_PARAMETER|    103 |    52 |   57 | 0.6645   | 0.6438   |  0.654   |
|  1 | BODY_FLUID          |    166 |    16 |   24 | 0.9121   | 0.8737   | 0.8925   |
|  2 | PERSON              |    475 |    74 |  142 | 0.8652   | 0.7699   | 0.8148   |
|  3 | DOSING              |     38 |    14 |   31 | 0.7308   | 0.5507   | 0.6281   |
|  4 | DIAGLAB_PROCEDURE   |    236 |    58 |   68 | 0.8027   | 0.7763   | 0.7893   |
|  5 | BODY_PART           |    690 |    72 |   79 | 0.9055   | 0.8973   | 0.9014   |
|  6 | MEDICATION          |    391 |   117 |  167 | 0.7697   | 0.7007   | 0.7336   |
|  7 | STATE_OF_HEALTH     |    321 |    41 |   76 | 0.8867   | 0.8086   | 0.8458   |
|  8 | LOCAL_SPECIFICATION |     57 |    19 |   24 |   0.75   | 0.7037   | 0.7261   |
|  9 | MEASUREMENT         |    574 |   260 |  222 | 0.6882   | 0.7211   | 0.7043   |
| 10 | TREATMENT           |    476 |   131 |  135 | 0.7842   | 0.7791   | 0.7816   |
| 11 | MEDICAL_CONDITION   |   1741 |   442 |  271 | 0.7975   | 0.8653   |   0.83   |
| 12 | TIME_INFORMATION    |    651 |   126 |  161 | 0.8378   | 0.8017   | 0.8194   |
| 13 | BIOLOGICAL_CHEMISTRY|    192 |    55 |   60 | 0.7773   | 0.7619   | 0.7695   |