Description
This model can be used to detect clinical events in medical text. It is the version of ner_events_clinical model augmented with langtest
library
Predicted Entities
DATE
, TIME
, PROBLEM
, TEST
, TREATMENT
, OCCURENCE
, CLINICAL_DEPT
, EVIDENTIAL
, DURATION
, FREQUENCY
, ADMISSION
, DISCHARGE
How to use
document_assembler = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
.setInputCols(["sentence", "token"])\
.setOutputCol("embeddings")
clinical_ner = MedicalNerModel.pretrained("ner_events_clinical_langtest", "en", "clinical/models") \
.setInputCols(["sentence", "token", "embeddings"]) \
.setOutputCol("ner")
ner_converter = NerConverter()\
.setInputCols(["sentence", "token", "ner"])\
.setOutputCol("ner_chunk")
nlp_pipeline = Pipeline(stages=[document_assembler,
sentence_detector,
tokenizer,
word_embeddings,
clinical_ner,
ner_converter])
model = nlp_pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))
result = model.transform(spark.createDataFrame([["The patient presented to the emergency room last evening"]], ["text"]))
val document_assembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val sentence_detector = sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")
.setInputCols("document")
.setOutputCol("sentence")
val tokenizer = new Tokenizer()
.setInputCols("sentence")
.setOutputCol("token")
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
.setInputCols(Array("sentence", "token"))
.setOutputCol("embeddings")
val ner = MedicalNerModel.pretrained("ner_events_clinical_langtest", "en", "clinical/models")
.setInputCols(Array("sentence", "token", "embeddings"))
.setOutputCol("ner")
val ner_converter = new NerConverter()
.setInputCols(Array("sentence", "token", "ner"))
.setOutputCol("ner_chunk")
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, word_embeddings, ner, ner_converter))
val data = Seq("""The patient presented to the emergency room last evening""").toDS().toDF("text")
val result = pipeline.fit(data).transform(data)
Results
+------------------+-------------+
|chunk |ner_label |
+------------------+-------------+
|presented |EVIDENTIAL |
|the emergency room|CLINICAL_DEPT|
|last evening |DATE |
+------------------+-------------+
Model Information
Model Name: | ner_events_clinical_langtest |
Compatibility: | Healthcare NLP 5.0.2+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [sentence, token, embeddings] |
Output Labels: | [ner] |
Language: | en |
Size: | 14.7 MB |
References
Trained on augmented version of i2b2 dataset with clinical_embeddings
.
Benchmarking
label precision recall f1-score support
B-TREATMENT 0.84 0.85 0.84 3280
I-TREATMENT 0.84 0.82 0.83 3115
B-DATE 0.82 0.84 0.83 985
I-DATE 0.74 0.83 0.78 1117
B-TEST 0.85 0.84 0.85 2171
B-DURATION 0.63 0.67 0.65 341
I-DURATION 0.64 0.75 0.69 465
B-PROBLEM 0.84 0.86 0.85 4309
I-PROBLEM 0.82 0.85 0.84 6063
B-OCCURRENCE 0.69 0.60 0.64 2493
I-OCCURRENCE 0.50 0.38 0.43 1612
B-DISCHARGE 0.00 0.00 0.00 117
B-EVIDENTIAL 0.76 0.70 0.73 595
I-EVIDENTIAL 0.00 0.00 0.00 18
I-TEST 0.86 0.83 0.85 2665
B-CLINICAL_DEPT 0.79 0.81 0.80 732
I-CLINICAL_DEPT 0.86 0.87 0.87 1410
B-ADMISSION 0.00 0.00 0.00 120
B-FREQUENCY 0.85 0.61 0.71 197
I-FREQUENCY 0.74 0.40 0.52 161
B-TIME 0.68 0.38 0.49 60
I-TIME 0.90 0.37 0.53 127
micro-avg 0.80 0.78 0.79 32153
macro-avg 0.67 0.60 0.62 32153
weighted-avg 0.79 0.78 0.79 32153