Description
This model can be used to detect clinical events in medical text, with a focus on admission entities.
Predicted Entities
DATE, TIME, PROBLEM, TEST, TREATMENT, OCCURRENCE, CLINICAL_DEPT, EVIDENTIAL, DURATION, FREQUENCY, ADMISSION, DISCHARGE.
How to use
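# Python. Assumes `spark` is an active Spark session started with the licensed Healthcare NLP library.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer, WordEmbeddingsModel, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel

document_assembler = DocumentAssembler()\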
.setInputCol("text")\
.setOutputCol("document")
sentence_detector = SentenceDetector()\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
.setInputCols(["sentence", "token"])\
.setOutputCol("embeddings")
clinical_ner = MedicalNerModel.pretrained("ner_events_admission_clinical", "en", "clinical/models") \
.setInputCols(["sentence", "token", "embeddings"]) \
.setOutputCol("ner")
ner_converter = NerConverter()\
.setInputCols(["sentence", "token", "ner"])\
.setOutputCol("ner_chunk")
nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, word_embeddings, clinical_ner, ner_converter])
model = nlp_pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))
results = model.transform(spark.createDataFrame([["The patient presented to the emergency room last evening"]], ["text"]))
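// Scala. Assumes `spark` is an active Spark session started with the licensed Healthcare NLP library.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.annotators.ner.MedicalNerModel
import org.apache.spark.ml.Pipeline
import spark.implicits._

val document_assembler = new DocumentAssembler()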
.setInputCol("text")
.setOutputCol("document")
val sentence_detector = new SentenceDetector()
.setInputCols("document")
.setOutputCol("sentence")
val tokenizer = new Tokenizer()
.setInputCols("sentence")
.setOutputCol("token")
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
.setInputCols(Array("sentence", "token"))
.setOutputCol("embeddings")
val ner = MedicalNerModel.pretrained("ner_events_admission_clinical", "en", "clinical/models")
.setInputCols(Array("sentence", "token", "embeddings"))
.setOutputCol("ner")
val ner_converter = new NerConverter()
.setInputCols(Array("sentence", "token", "ner"))
.setOutputCol("ner_chunk")
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, word_embeddings, ner, ner_converter))
val data = Seq("""The patient presented to the emergency room last evening""").toDS().toDF("text")
val result = pipeline.fit(data).transform(data)
import nlu
nlu.load("en.med_ner.admission_events").predict("""The patient presented to the emergency room last evening""")
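The `ner` column carries one IOB tag per token (the B- and I- prefixed labels listed under Benchmarking), and the converter stage merges those tags into the chunks reported in the results. The following is a minimal pure-Python sketch of that grouping idea, not the library's implementation. The helper name `iob_to_chunks` and the per-token tags are assumptions made for illustration, chosen to be consistent with the example sentence.

```python
def iob_to_chunks(tokens, tags):
    """Group tokens into (chunk_text, entity) pairs based on their IOB tags."""
    chunks = []
    current_tokens, current_label = [], None

    def flush():
        # Emit the chunk collected so far, if any.
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")  # "B-DATE" -> ("B", "-", "DATE"); "O" -> ("O", "", "")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A B- tag (or an I- tag with a different label) starts a new chunk.
            flush()
            current_tokens, current_label = [token], label
        elif prefix == "I":
            # An I- tag with the same label continues the open chunk.
            current_tokens.append(token)
        else:
            # An O tag closes any open chunk.
            flush()
            current_tokens, current_label = [], None
    flush()
    return chunks


# Hypothetical tags for the example sentence (illustrative, not model output).
tokens = ["The", "patient", "presented", "to", "the", "emergency", "room", "last", "evening"]
tags = ["O", "O", "B-EVIDENTIAL", "O", "B-CLINICAL_DEPT", "I-CLINICAL_DEPT",
        "I-CLINICAL_DEPT", "B-DATE", "I-DATE"]

chunks = iob_to_chunks(tokens, tags)
for text, entity in chunks:
    print(f"{entity:15s} {text}")
```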
Results
+----+-----------------------------+---------+---------+-----------------+
| | chunk | begin | end | entity |
+====+=============================+=========+=========+=================+
| 0 | presented | 12 | 20 | EVIDENTIAL |
+----+-----------------------------+---------+---------+-----------------+
| 1 | the emergency room | 25 | 42 | CLINICAL_DEPT |
+----+-----------------------------+---------+---------+-----------------+
| 2 | last evening | 44 | 55 | DATE |
+----+-----------------------------+---------+---------+-----------------+
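In this table, `begin` and `end` are zero-based character offsets into the input text, and `end` is inclusive. A short sketch that checks this reading against the rows above; the helper name `chunk_at` is an assumption made for illustration.

```python
text = "The patient presented to the emergency room last evening"

# Rows copied from the results table: (chunk, begin, end, entity).
rows = [
    ("presented", 12, 20, "EVIDENTIAL"),
    ("the emergency room", 25, 42, "CLINICAL_DEPT"),
    ("last evening", 44, 55, "DATE"),
]


def chunk_at(text, begin, end):
    """Return the substring covered by inclusive offsets [begin, end]."""
    return text[begin:end + 1]


recovered = [chunk_at(text, begin, end) for _, begin, end, _ in rows]
for (chunk, begin, end, entity), span in zip(rows, recovered):
    print(f"{entity:15s} [{begin}, {end}] -> {span!r}")
```

Because `end` is inclusive, a Python slice needs `end + 1` as its upper bound.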
Model Information
| Model Name: | ner_events_admission_clinical |
| Compatibility: | Healthcare NLP 3.0.0+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [sentence, token, embeddings] |
| Output Labels: | [ner] |
| Language: | en |
Data Source
Trained on augmented and enriched i2b2 events data with the embeddings_clinical word embeddings. The admission data in particular was enriched.
Benchmarking
label tp fp fn prec rec f1
I-TIME 42 6 9 0.875 0.8235294 0.8484849
I-TREATMENT 1134 111 312 0.9108434 0.7842324 0.8428094
B-OCCURRENCE 406 344 382 0.5413333 0.51522845 0.52795845
I-DURATION 160 42 71 0.7920792 0.6926407 0.73903
B-DATE 500 32 49 0.9398496 0.9107468 0.92506933
I-DATE 309 54 49 0.8512397 0.8631285 0.8571429
B-ADMISSION 206 1 2 0.9951691 0.99038464 0.9927711
I-PROBLEM 2394 390 412 0.85991377 0.85317177 0.8565295
B-CLINICAL_DEPT 327 64 77 0.8363171 0.8094059 0.8226415
B-TIME 44 12 15 0.78571427 0.7457627 0.76521736
I-CLINICAL_DEPT 597 62 78 0.90591806 0.8844444 0.8950525
B-PROBLEM 1643 260 252 0.86337364 0.86701846 0.86519223
I-FREQUENCY 35 21 39 0.625 0.47297296 0.5384615
I-TEST 1082 171 117 0.86352754 0.9024187 0.8825449
B-TEST 781 125 127 0.8620309 0.86013216 0.86108047
B-TREATMENT 1283 176 202 0.87936944 0.8639731 0.87160325
B-DISCHARGE 155 0 1 1.0 0.99358976 0.99678457
B-EVIDENTIAL 269 25 75 0.914966 0.78197676 0.84326017
B-DURATION 97 43 44 0.69285715 0.6879433 0.6903914
B-FREQUENCY 70 16 33 0.81395346 0.6796116 0.7407407
tp: 11841 fp: 2366 fn: 2680 labels: 22
Macro-average prec: 0.8137135, rec: 0.7533389, f1: 0.7823631
Micro-average prec: 0.83346236, rec: 0.8154397, f1: 0.8243525
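The prec, rec, and f1 columns follow from the tp, fp, and fn counts using the standard definitions, and the micro-average applies the same formulas to the summed counts. A minimal sketch, assuming those standard definitions; the function name `prf` is hypothetical, and the counts are copied from the table above.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, f1) from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Single label: the B-ADMISSION row (tp=206, fp=1, fn=2).
adm_p, adm_r, adm_f1 = prf(206, 1, 2)

# Micro-average: the same formulas applied to the reported totals.
micro_p, micro_r, micro_f1 = prf(11841, 2366, 2680)

print(f"B-ADMISSION   prec={adm_p:.4f} rec={adm_r:.4f} f1={adm_f1:.4f}")
print(f"Micro-average prec={micro_p:.4f} rec={micro_r:.4f} f1={micro_f1:.4f}")
```

The macro-average is instead the unweighted mean of the per-label scores, which is why low-scoring labels such as B-OCCURRENCE and I-FREQUENCY pull it below the micro-average.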