Description
Pretrained named entity recognition (NER) deep learning model for clinical terms in Danish. The Spark NLP deep learning model (MedicalNerModel) is inspired by a former state-of-the-art NER model: Chiu & Nichols, Named Entity Recognition with Bidirectional LSTM-CNNs.
Predicted Entities
PROBLEM, TEST, TREATMENT
How to use
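Python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel
from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal
# Assumes an active Spark session `spark` started with the licensed Spark NLP for Healthcare library.
document_assembler = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")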
sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")
embeddings = WordEmbeddingsModel.pretrained("w2v_cc_300d","da") \
.setInputCols(["sentence", "token"]) \
.setOutputCol("embeddings")
ner_model = MedicalNerModel.pretrained("ner_clinical", "da", "clinical/models")\
.setInputCols(["sentence", "token", "embeddings"])\
.setOutputCol("ner")
ner_converter = NerConverterInternal()\
.setInputCols(["sentence", "token", "ner"])\
.setOutputCol("ner_chunk")
pipeline = Pipeline(stages=[
document_assembler,
sentence_detector,
tokenizer,
embeddings,
ner_model,
ner_converter
])
sample_text = """Deanna, Tmax 102.8, BP fluktuerende 100-140/60s, HR i 80'erne, RR i 20'erne, Sat 75% på RA i triage, op til 100% på NRB. behov for smerte."""
data = spark.createDataFrame([[sample_text]]).toDF("text")
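result = pipeline.fit(data).transform(data)
# Flatten the ner_chunk annotations into the columns shown under Results
from pyspark.sql import functions as F
result.select(F.explode("ner_chunk").alias("c")).select(
    F.col("c.result").alias("chunk"),
    F.col("c.begin").alias("begin"),
    F.col("c.end").alias("end"),
    F.col("c.metadata").getItem("entity").alias("ner_label"),
).show(truncate=False)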
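The `ner_converter` stage turns the per-token labels written by `ner_model` into the `ner_chunk` spans listed under Results. Assuming the token labels follow the common IOB scheme (`B-` starts an entity, `I-` continues it, `O` is outside), the merge step can be sketched in plain Python as below. This is an illustrative reimplementation, not the actual code of `NerConverterInternal`; `simple_tokenize`, `iob_to_chunks`, the sample sentence, and its hand-written tags are all hypothetical.

```python
import re
from typing import List, Optional, Tuple

Token = Tuple[str, int, int]       # (text, begin, end); end is inclusive
Chunk = Tuple[str, int, int, str]  # (chunk, begin, end, ner_label)


def simple_tokenize(text: str) -> List[Token]:
    """Hypothetical stand-in tokenizer (not Spark NLP's): words or single punctuation marks."""
    return [(m.group(), m.start(), m.end() - 1) for m in re.finditer(r"\w+|[^\w\s]", text)]


def iob_to_chunks(text: str, tokens: List[Token], tags: List[str]) -> List[Chunk]:
    """Merge token-level IOB tags (B-X, I-X, O) into labelled character spans."""
    spans: List[Tuple[int, int, str]] = []
    current: Optional[List] = None  # [begin, end, label] of the chunk being built

    for (_, begin, end), tag in zip(tokens, tags):
        label = tag[2:] if tag[:2] in ("B-", "I-") else None
        continues = tag.startswith("I-") and current is not None and current[2] == label
        if continues:
            current[1] = end  # extend the open chunk to this token
            continue
        if current is not None:  # any other tag closes the open chunk
            spans.append(tuple(current))
            current = None
        if label is not None:  # B-X, or an I-X with no matching open chunk
            current = [begin, end, label]

    if current is not None:
        spans.append(tuple(current))
    return [(text[b:e + 1], b, e, lab) for b, e, lab in spans]


if __name__ == "__main__":
    text = "Sat 75% på RA, smerte i brystet."
    # Hypothetical tags written by hand for illustration, one per token
    tags = ["B-TEST", "O", "O", "O", "B-TREATMENT", "O",
            "B-PROBLEM", "I-PROBLEM", "I-PROBLEM", "O"]
    print(iob_to_chunks(text, simple_tokenize(text), tags))
    # [('Sat', 0, 2, 'TEST'), ('RA', 11, 12, 'TREATMENT'), ('smerte i brystet', 15, 30, 'PROBLEM')]
```

Treating an `I-` tag with no matching open chunk as the start of a new chunk is one lenient design choice; a stricter converter could drop such tokens instead.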
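Scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
// MedicalNerModel and NerConverterInternal are provided by the licensed Spark NLP for Healthcare library, which must also be on the classpath.
val document_assembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")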
val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
.setInputCols("document")
.setOutputCol("sentence")
val tokenizer = new Tokenizer()
.setInputCols("sentence")
.setOutputCol("token")
val embeddings = WordEmbeddingsModel.pretrained("w2v_cc_300d","da")
.setInputCols(Array("sentence", "token"))
.setOutputCol("embeddings")
val ner_model = MedicalNerModel.pretrained("ner_clinical", "da", "clinical/models")
.setInputCols(Array("sentence", "token", "embeddings"))
.setOutputCol("ner")
val ner_converter = new NerConverterInternal()
.setInputCols(Array("sentence", "token", "ner"))
.setOutputCol("ner_chunk")
val pipeline = new Pipeline().setStages(Array(
document_assembler,
sentence_detector,
tokenizer,
embeddings,
ner_model,
ner_converter
))
import spark.implicits._
val sample_data = Seq("""Deanna, Tmax 102.8, BP fluktuerende 100-140/60s, HR i 80'erne, RR i 20'erne, Sat 75% på RA i triage, op til 100% på NRB. behov for smerte.""").toDF("text")
val result = pipeline.fit(sample_data).transform(sample_data)
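The stages above are wired together by column name: each stage reads the columns given to `setInputCols` and writes the one given to `setOutputCol`, so it must come after every stage whose output it consumes. The library-free sketch below checks that ordering for the six stages used here. `STAGES` and `missing_inputs` are hypothetical helpers for illustration, not part of Spark NLP, and they check only names, not annotator types.

```python
from typing import Dict, List, Sequence, Tuple

# (stage, input columns, output column), mirroring the setInputCol(s)/setOutputCol calls above
STAGES: List[Tuple[str, Sequence[str], str]] = [
    ("DocumentAssembler", ["text"], "document"),
    ("SentenceDetectorDLModel", ["document"], "sentence"),
    ("Tokenizer", ["sentence"], "token"),
    ("WordEmbeddingsModel", ["sentence", "token"], "embeddings"),
    ("MedicalNerModel", ["sentence", "token", "embeddings"], "ner"),
    ("NerConverterInternal", ["sentence", "token", "ner"], "ner_chunk"),
]


def missing_inputs(stages, source_columns=("text",)) -> Dict[str, List[str]]:
    """Map each stage to the input columns that no earlier stage (or the source DataFrame) provides."""
    available = set(source_columns)
    problems: Dict[str, List[str]] = {}
    for name, inputs, output in stages:
        absent = [col for col in inputs if col not in available]
        if absent:
            problems[name] = absent
        available.add(output)  # later stages may consume this column
    return problems


if __name__ == "__main__":
    print(missing_inputs(STAGES))  # {}
    # Moving the NER model before the embeddings stage breaks the wiring
    swapped = STAGES[:3] + [STAGES[4], STAGES[3]] + STAGES[5:]
    print(missing_inputs(swapped))  # {'MedicalNerModel': ['embeddings']}
```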
Results
+------+-----+---+---------+
|chunk |begin|end|ner_label|
+------+-----+---+---------+
|Tmax  |8    |11 |TEST     |
|BP    |20   |21 |TEST     |
|HR    |49   |50 |TEST     |
|RR    |63   |64 |TEST     |
|Sat   |77   |79 |TEST     |
|RA    |88   |89 |TREATMENT|
|NRB   |116  |118|TREATMENT|
|smerte|131  |136|PROBLEM  |
+------+-----+---+---------+
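The `begin` and `end` columns are character offsets into the input text, with `end` inclusive, so each chunk equals `text[begin:end + 1]`. The sketch below checks every row of the table against the sample text from the code above; `span_text` and `check_results` are hypothetical helper names.

```python
# Sample text copied verbatim from the code above
SAMPLE_TEXT = ("Deanna, Tmax 102.8, BP fluktuerende 100-140/60s, HR i 80'erne, "
               "RR i 20'erne, Sat 75% på RA i triage, op til 100% på NRB. behov for smerte.")

# (chunk, begin, end, ner_label) rows copied from the Results table
RESULTS = [
    ("Tmax", 8, 11, "TEST"),
    ("BP", 20, 21, "TEST"),
    ("HR", 49, 50, "TEST"),
    ("RR", 63, 64, "TEST"),
    ("Sat", 77, 79, "TEST"),
    ("RA", 88, 89, "TREATMENT"),
    ("NRB", 116, 118, "TREATMENT"),
    ("smerte", 131, 136, "PROBLEM"),
]


def span_text(text: str, begin: int, end: int) -> str:
    """Return the substring covered by an inclusive [begin, end] character span."""
    return text[begin:end + 1]


def check_results(text, rows):
    """Return the rows whose span does not reproduce the chunk text."""
    return [row for row in rows if span_text(text, row[1], row[2]) != row[0]]


if __name__ == "__main__":
    print("mismatches:", check_results(SAMPLE_TEXT, RESULTS))  # mismatches: []
```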
Model Information
Model Name:    ner_clinical
Compatibility: Healthcare NLP 5.1.0+
License:       Licensed
Edition:       Official
Input Labels:  [sentence, token, embeddings]
Output Labels: [ner]
Language:      da
Size:          2.9 MB
Benchmarking
label         precision  recall  f1-score  support
TEST          0.88       0.85    0.87      296
PROBLEM       0.77       0.85    0.81      809
TREATMENT     0.82       0.75    0.79      282
micro-avg     0.80       0.83    0.81      1387
macro-avg     0.82       0.82    0.82      1387
weighted-avg  0.80       0.83    0.81      1387
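The averaged rows follow from the per-label rows. The macro average is the unweighted mean over the three labels, the weighted average weights each label by its support, and the micro average pools counts across labels before dividing, which makes micro-averaged recall identical to support-weighted recall. The sketch below recomputes all three from the table, assuming true-positive and predicted counts can be approximated as recall × support and TP / precision. Because the table's inputs are rounded to two decimals, the results agree with the reported rows only to within about 0.01.

```python
# Per-label rows from the Benchmarking table: label -> (precision, recall, f1, support)
ROWS = {
    "TEST": (0.88, 0.85, 0.87, 296),
    "PROBLEM": (0.77, 0.85, 0.81, 809),
    "TREATMENT": (0.82, 0.75, 0.79, 282),
}


def macro_avg(rows):
    """Unweighted mean of precision, recall and F1 over labels."""
    n = len(rows)
    return tuple(sum(row[i] for row in rows.values()) / n for i in range(3))


def weighted_avg(rows):
    """Mean of precision, recall and F1 weighted by each label's support."""
    total = sum(row[3] for row in rows.values())
    return tuple(sum(row[i] * row[3] for row in rows.values()) / total for i in range(3))


def micro_avg(rows):
    """Pooled precision, recall and F1, using counts approximated from rounded values."""
    tp = {label: r * s for label, (p, r, f1, s) in rows.items()}  # TP = recall * support
    predicted = sum(tp[label] / rows[label][0] for label in rows)  # predicted = TP / precision
    support = sum(row[3] for row in rows.values())
    precision = sum(tp.values()) / predicted
    recall = sum(tp.values()) / support
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    for name, fn in (("micro", micro_avg), ("macro", macro_avg), ("weighted", weighted_avg)):
        p, r, f1 = fn(ROWS)
        print(f"{name}-avg  precision={p:.3f}  recall={r:.3f}  f1={f1:.3f}")
```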