Description
This model detects clinical events in medical text, with a focus on admission-related entities.
Predicted Entities
DATE, TIME, PROBLEM, TEST, TREATMENT, OCCURRENCE, CLINICAL_DEPT, EVIDENTIAL, DURATION, FREQUENCY, ADMISSION, DISCHARGE.
How to use
...
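# Pretrained clinical word embeddings; the NER model expects them as input
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
.setInputCols(["sentence", "token"])\
.setOutputCol("embeddings")
# Pretrained admission-focused clinical events NER model (token-level tags in "ner")
clinical_ner = MedicalNerModel.pretrained("ner_events_admission_clinical", "en", "clinical/models") \
.setInputCols(["sentence", "token", "embeddings"]) \
.setOutputCol("ner")
...
nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, word_embeddings, clinical_ner, ner_converter])
# The embeddings and NER model are pretrained, so fitting on an empty DataFrame learns no weights;
# it only produces the PipelineModel that LightPipeline wraps
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))
# fullAnnotate returns annotations with their begin/end offsets and metadata
annotations = light_pipeline.fullAnnotate("The patient presented to the emergency room last evening")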
...
// Pretrained clinical word embeddings used as input to the NER model
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
.setInputCols(Array("sentence", "token"))
.setOutputCol("embeddings")
// Pretrained admission-focused clinical events NER model
val ner = MedicalNerModel.pretrained("ner_events_admission_clinical", "en", "clinical/models")
.setInputCols(Array("sentence", "token", "embeddings"))
.setOutputCol("ner")
...
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, word_embeddings, ner, ner_converter))
// Build a one-row DataFrame with a "text" column (toDS needs spark.implicits._)
import spark.implicits._
val data = Seq("The patient presented to the emergency room last evening").toDS.toDF("text")
val result = pipeline.fit(data).transform(data)
Results
+----+-----------------------------+---------+---------+-----------------+
| | chunk | begin | end | entity |
+====+=============================+=========+=========+=================+
| 0 | presented | 12 | 20 | EVIDENTIAL |
+----+-----------------------------+---------+---------+-----------------+
| 1 | the emergency room | 25 | 42 | CLINICAL_DEPT |
+----+-----------------------------+---------+---------+-----------------+
| 2 | last evening | 44 | 55 | DATE |
+----+-----------------------------+---------+---------+-----------------+
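Each Results row is a chunk rather than a single token: the NER model tags every token with an IOB label (the B- and I- labels listed under Benchmarking), and the ner_converter stage, whose definition is elided in the usage code, merges consecutive tags into chunks. The plain-Python sketch below illustrates that merge for the example sentence. It is not the Spark NLP implementation: the per-token tags are hypothetical, chosen to reproduce the table, and merge_iob is an illustrative name. It also shows that begin and end are inclusive character offsets, so a chunk's text is text[begin:end + 1].

```python
from typing import List, Optional, Tuple

# Example sentence from the usage code above.
SENTENCE = "The patient presented to the emergency room last evening"

# Hypothetical token-level output: (begin, end, tag), with inclusive offsets.
# The tags are chosen to reproduce the Results table; they are not real model output.
TOKEN_TAGS: List[Tuple[int, int, str]] = [
    (0, 2, "O"),                  # The
    (4, 10, "O"),                 # patient
    (12, 20, "B-EVIDENTIAL"),     # presented
    (22, 23, "O"),                # to
    (25, 27, "B-CLINICAL_DEPT"),  # the
    (29, 37, "I-CLINICAL_DEPT"),  # emergency
    (39, 42, "I-CLINICAL_DEPT"),  # room
    (44, 47, "B-DATE"),           # last
    (49, 55, "I-DATE"),           # evening
]


def merge_iob(text: str, token_tags: List[Tuple[int, int, str]]) -> List[Tuple[str, int, int, str]]:
    """Merge IOB-tagged tokens into (chunk, begin, end, entity) tuples (illustrative)."""
    spans: List[List] = []
    current: Optional[List] = None  # [begin, end, entity] of the chunk being built
    for begin, end, tag in token_tags:
        prefix, _, entity = tag.partition("-")
        if prefix == "I" and current is not None and current[2] == entity:
            current[1] = end  # an I- tag of the same entity extends the open chunk
            continue
        if current is not None:
            spans.append(current)  # any other tag closes the open chunk
        # B- (or a stray I-) opens a new chunk; O opens nothing
        current = [begin, end, entity] if prefix in ("B", "I") else None
    if current is not None:
        spans.append(current)
    # end is inclusive, so the chunk text is text[begin:end + 1]
    return [(text[b:e + 1], b, e, ent) for b, e, ent in spans]


for row in merge_iob(SENTENCE, TOKEN_TAGS):
    print(row)
# ('presented', 12, 20, 'EVIDENTIAL')
# ('the emergency room', 25, 42, 'CLINICAL_DEPT')
# ('last evening', 44, 55, 'DATE')
```

The sketch treats an I- tag that does not continue an open chunk as the start of a new chunk, a common lenient choice; the converter's own handling of such cases may differ.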
Model Information
Model Name: ner_events_admission_clinical
Compatibility: Spark NLP for Healthcare 3.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Data Source
Trained on augmented and enriched i2b2 events data using the embeddings_clinical word embeddings. The admission-related data was specifically enriched.
Benchmarking
label            tp    fp   fn   prec        rec         f1
I-TIME           42    6    9    0.875       0.8235294   0.8484849
I-EVIDENTIAL     2     1    18   0.6666667   0.1         0.17391305
I-TREATMENT      1134  111  312  0.9108434   0.7842324   0.8428094
B-OCCURRENCE     406   344  382  0.5413333   0.51522845  0.52795845
I-DURATION       160   42   71   0.7920792   0.6926407   0.73903
B-DATE           500   32   49   0.9398496   0.9107468   0.92506933
I-DATE           309   54   49   0.8512397   0.8631285   0.8571429
B-ADMISSION      206   1    2    0.9951691   0.99038464  0.9927711
I-PROBLEM        2394  390  412  0.85991377  0.85317177  0.8565295
B-CLINICAL_DEPT  327   64   77   0.8363171   0.8094059   0.8226415
B-TIME           44    12   15   0.78571427  0.7457627   0.76521736
I-CLINICAL_DEPT  597   62   78   0.90591806  0.8844444   0.8950525
B-PROBLEM        1643  260  252  0.86337364  0.86701846  0.86519223
I-FREQUENCY      35    21   39   0.625       0.47297296  0.5384615
I-OCCURRENCE     305   410  316  0.42657343  0.49114332  0.45658684
I-TEST           1082  171  117  0.86352754  0.9024187   0.8825449
B-TEST           781   125  127  0.8620309   0.86013216  0.86108047
B-TREATMENT      1283  176  202  0.87936944  0.8639731   0.87160325
B-DISCHARGE      155   0    1    1.0         0.99358976  0.99678457
B-EVIDENTIAL     269   25   75   0.914966    0.78197676  0.84326017
B-DURATION       97    43   44   0.69285715  0.6879433   0.6903914
B-FREQUENCY      70    16   33   0.81395346  0.6796116   0.7407407
tp: 11841 fp: 2366 fn: 2680 labels: 22
Macro-average prec: 0.8137135, rec: 0.7533389, f1: 0.7823631
Micro-average prec: 0.83346236, rec: 0.8154397, f1: 0.8243525
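The summary lines can be recomputed from the per-label counts. The plain-Python sketch below assumes the standard definitions precision = tp / (tp + fp), recall = tp / (tp + fn), and F1 as their harmonic mean. Micro-averaging pools the counts over all 22 labels before computing the metrics; macro-averaging takes the unweighted mean of per-label precision and recall. With these counts, the reported macro F1 (0.7823631) matches the harmonic mean of the macro precision and recall rather than the mean of the per-label F1 scores (about 0.772), so the sketch computes it that way. The helper names are illustrative and not part of Spark NLP.

```python
from typing import Dict, Tuple

# Per-label (tp, fp, fn) counts copied from the Benchmarking table above.
COUNTS: Dict[str, Tuple[int, int, int]] = {
    "I-TIME": (42, 6, 9),
    "I-EVIDENTIAL": (2, 1, 18),
    "I-TREATMENT": (1134, 111, 312),
    "B-OCCURRENCE": (406, 344, 382),
    "I-DURATION": (160, 42, 71),
    "B-DATE": (500, 32, 49),
    "I-DATE": (309, 54, 49),
    "B-ADMISSION": (206, 1, 2),
    "I-PROBLEM": (2394, 390, 412),
    "B-CLINICAL_DEPT": (327, 64, 77),
    "B-TIME": (44, 12, 15),
    "I-CLINICAL_DEPT": (597, 62, 78),
    "B-PROBLEM": (1643, 260, 252),
    "I-FREQUENCY": (35, 21, 39),
    "I-OCCURRENCE": (305, 410, 316),
    "I-TEST": (1082, 171, 117),
    "B-TEST": (781, 125, 127),
    "B-TREATMENT": (1283, 176, 202),
    "B-DISCHARGE": (155, 0, 1),
    "B-EVIDENTIAL": (269, 25, 75),
    "B-DURATION": (97, 43, 44),
    "B-FREQUENCY": (70, 16, 33),
}


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts (0.0 where undefined)."""
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1


def micro_average(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Pool tp/fp/fn over all labels, then compute the metrics once."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return prf(tp, fp, fn)


def macro_average(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Unweighted mean of per-label precision and recall; F1 is their harmonic mean."""
    per_label = [prf(*c) for c in counts.values()]
    prec = sum(p for p, _, _ in per_label) / len(per_label)
    rec = sum(r for _, r, _ in per_label) / len(per_label)
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1


# Both lines should agree with the Macro-average and Micro-average lines above
print("micro prec=%.7f rec=%.7f f1=%.7f" % micro_average(COUNTS))
print("macro prec=%.7f rec=%.7f f1=%.7f" % macro_average(COUNTS))
```

Because micro-averaging pools counts, frequent tags such as I-PROBLEM dominate it, while macro-averaging gives rare, low-scoring tags such as I-EVIDENTIAL equal weight; this is why the macro scores are lower here.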