Description
Pretrained named entity recognition (NER) deep learning model for clinical terms in Danish. The Spark NLP deep learning model (MedicalNerModel) is inspired by a former state-of-the-art NER model: Chiu & Nichols, "Named Entity Recognition with Bidirectional LSTM-CNNs".
Predicted Entities
PROBLEM, TEST, TREATMENT
How to use
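# Python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel
from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal

# Assumes `spark` is an active session started with Spark NLP for Healthcare.
document_assembler = DocumentAssembler()\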
    .setInputCol("text")\
    .setOutputCol("document")
sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")
tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")
embeddings = WordEmbeddingsModel.pretrained("w2v_cc_300d","da") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")
ner_model = MedicalNerModel.pretrained("ner_clinical", "da", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")
ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")
pipeline = Pipeline(stages=[
    document_assembler, 
    sentence_detector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter   
    ])
sample_text = """Deanna, Tmax 102.8, BP fluktuerende 100-140/60s, HR i 80'erne, RR i 20'erne, Sat 75% på RA i triage, op til 100% på NRB. behov for smerte."""
data = spark.createDataFrame([[sample_text]]).toDF("text")
result = pipeline.fit(data).transform(data)
// Scala
val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")
val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols("document")
    .setOutputCol("sentence")
val tokenizer = new Tokenizer()
    .setInputCols("sentence")
    .setOutputCol("token")
val embeddings = WordEmbeddingsModel.pretrained("w2v_cc_300d","da")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")
val ner_model = MedicalNerModel.pretrained("ner_clinical", "da", "clinical/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")
val ner_converter = new NerConverterInternal()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")
val pipeline = new Pipeline().setStages(Array(
    document_assembler, 
    sentence_detector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter   
))
val sample_data = Seq("""Deanna, Tmax 102.8, BP fluktuerende 100-140/60s, HR i 80'erne, RR i 20'erne, Sat 75% på RA i triage, op til 100% på NRB. behov for smerte.""").toDS.toDF("text")
val result = pipeline.fit(sample_data).transform(sample_data)
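The last pipeline stage, NerConverterInternal, turns the token-level tags in the `ner` column into entity chunks in `ner_chunk`. The plain-Python sketch below illustrates that grouping step only. It assumes IOB-style tags (`B-`, `I-`, `O`); the function name `merge_iob`, the token offsets, and the tags are hypothetical illustrations, not the library's implementation or actual model output.

```python
def merge_iob(text, tokens, tags):
    """Group token spans into labeled chunks using IOB tags.

    Illustrative sketch only; not the NerConverterInternal implementation.
    tokens: list of (begin, end) character offsets, end inclusive.
    tags:   one tag per token, e.g. "B-TEST", "I-TEST", "O".
    """
    chunks = []
    current = None  # chunk being built: [begin, end, label]
    for (begin, end), tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        # An I- tag with the same label extends the open chunk.
        if prefix == "I" and current is not None and current[2] == label:
            current[1] = end
            continue
        # Anything else closes the open chunk, if there is one.
        if current is not None:
            chunks.append(current)
            current = None
        # B- (or a stray I-) opens a new chunk; O opens nothing.
        if prefix in ("B", "I"):
            current = [begin, end, label]
    if current is not None:
        chunks.append(current)
    # Chunk text is sliced from the source, so original spacing is kept.
    return [(text[b:e + 1], b, e, label) for b, e, label in chunks]


# Hypothetical tags for the first tokens of the sample sentence.
danish = "Deanna, Tmax 102.8, BP"
danish_tokens = [(0, 5), (6, 6), (8, 11), (13, 17), (18, 18), (20, 21)]
danish_tags = ["O", "O", "B-TEST", "O", "O", "B-TEST"]
single_token_chunks = merge_iob(danish, danish_tokens, danish_tags)

# Hypothetical multi-token entity, to show the I- continuation.
phrase = "severe chest pain"
phrase_tokens = [(0, 5), (7, 11), (13, 16)]
phrase_tags = ["B-PROBLEM", "I-PROBLEM", "I-PROBLEM"]
multi_token_chunks = merge_iob(phrase, phrase_tokens, phrase_tags)

print(single_token_chunks)
print(multi_token_chunks)
```

Slicing the chunk text from the source string, rather than joining token strings, keeps the begin and end offsets and the chunk text consistent with each other.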
Results
+------+-----+---+---------+
|chunk |begin|end|ner_label|
+------+-----+---+---------+
|Tmax  |8    |11 |TEST     |
|BP    |20   |21 |TEST     |
|HR    |49   |50 |TEST     |
|RR    |63   |64 |TEST     |
|Sat   |77   |79 |TEST     |
|RA    |88   |89 |TREATMENT|
|NRB   |116  |118|TREATMENT|
|smerte|131  |136|PROBLEM  |
+------+-----+---+---------+
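The begin and end columns above read as 0-based, inclusive character offsets into the input text. A short plain-Python check, using only the sample text and the table rows, makes this concrete; the variable names are illustrative.

```python
sample_text = """Deanna, Tmax 102.8, BP fluktuerende 100-140/60s, HR i 80'erne, RR i 20'erne, Sat 75% på RA i triage, op til 100% på NRB. behov for smerte."""

# Rows copied from the results table: (chunk, begin, end, ner_label).
rows = [
    ("Tmax", 8, 11, "TEST"),
    ("BP", 20, 21, "TEST"),
    ("HR", 49, 50, "TEST"),
    ("RR", 63, 64, "TEST"),
    ("Sat", 77, 79, "TEST"),
    ("RA", 88, 89, "TREATMENT"),
    ("NRB", 116, 118, "TREATMENT"),
    ("smerte", 131, 136, "PROBLEM"),
]

# `end` is inclusive, so the Python slice needs end + 1.
mismatches = [
    (chunk, begin, end)
    for chunk, begin, end, _label in rows
    if sample_text[begin:end + 1] != chunk
]
print(mismatches)  # prints []
```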
Model Information
| Model Name: | ner_clinical | 
| Compatibility: | Healthcare NLP 5.1.0+ | 
| License: | Licensed | 
| Edition: | Official | 
| Input Labels: | [sentence, token, embeddings] | 
| Output Labels: | [ner] | 
| Language: | da | 
| Size: | 2.9 MB | 
Benchmarking
       label  precision    recall  f1-score   support
        TEST       0.88      0.85      0.87       296
     PROBLEM       0.77      0.85      0.81       809
   TREATMENT       0.82      0.75      0.79       282
   micro-avg       0.80      0.83      0.81      1387
   macro-avg       0.82      0.82      0.82      1387
weighted-avg       0.80      0.83      0.81      1387
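As a reading aid, the support-weighted averages can be recomputed from the per-label rows. The sketch below does this for precision and recall in plain Python. It works from the rounded values printed in the table, so results derived this way can differ from the published figures in the last digit; the names `report` and `weighted_avg` are illustrative.

```python
# Per-label rows from the table: (precision, recall, f1, support).
report = {
    "TEST": (0.88, 0.85, 0.87, 296),
    "PROBLEM": (0.77, 0.85, 0.81, 809),
    "TREATMENT": (0.82, 0.75, 0.79, 282),
}

total_support = sum(row[3] for row in report.values())


def weighted_avg(col):
    """Average one metric column, weighting each label by its support."""
    return sum(row[col] * row[3] for row in report.values()) / total_support


weighted_precision = weighted_avg(0)
weighted_recall = weighted_avg(1)

print(f"{total_support} {weighted_precision:.2f} {weighted_recall:.2f}")
# prints: 1387 0.80 0.83
```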