Description
This model detects clinical events, such as problems, tests, treatments, and the dates and departments associated with them, in medical text.
Predicted Entities
DATE, TIME, PROBLEM, TEST, TREATMENT, OCCURRENCE, CLINICAL_DEPT, EVIDENTIAL, DURATION, FREQUENCY, ADMISSION, DISCHARGE
How to use
# Imports assume Spark NLP for Healthcare (sparknlp_jsl) is installed and licensed.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer, WordEmbeddingsModel, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel

document_assembler = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentence_detector = SentenceDetector()\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
.setInputCols(["sentence", "token"])\
.setOutputCol("embeddings")
clinical_ner = MedicalNerModel.pretrained("ner_events_clinical", "en", "clinical/models") \
.setInputCols(["sentence", "token", "embeddings"]) \
.setOutputCol("ner")
ner_converter = NerConverter()\
.setInputCols(["sentence", "token", "ner"])\
.setOutputCol("ner_chunk")
nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, word_embeddings, clinical_ner, ner_converter])
model = nlp_pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))
results = model.transform(spark.createDataFrame([["The patient presented to the emergency room last evening"]], ["text"]))
// Imports assume the Spark NLP for Healthcare (spark-nlp-jsl) Scala library is on the classpath.
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val document_assembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val sentence_detector = new SentenceDetector()
.setInputCols("document")
.setOutputCol("sentence")
val tokenizer = new Tokenizer()
.setInputCols("sentence")
.setOutputCol("token")
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
.setInputCols(Array("sentence", "token"))
.setOutputCol("embeddings")
val ner = MedicalNerModel.pretrained("ner_events_clinical", "en", "clinical/models")
.setInputCols(Array("sentence", "token", "embeddings"))
.setOutputCol("ner")
val ner_converter = new NerConverter()
.setInputCols(Array("sentence", "token", "ner"))
.setOutputCol("ner_chunk")
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, word_embeddings, ner, ner_converter))
val data = Seq("""The patient presented to the emergency room last evening""").toDS().toDF("text")
val result = pipeline.fit(data).transform(data)
import nlu
nlu.load("en.med_ner.events_clinical").predict("""The patient presented to the emergency room last evening""")
Results
+----+-----------------------------+---------+---------+-----------------+
| | chunk | begin | end | entity |
+====+=============================+=========+=========+=================+
| 0 | presented | 12 | 20 | EVIDENTIAL |
+----+-----------------------------+---------+---------+-----------------+
| 1 | the emergency room | 25 | 42 | CLINICAL_DEPT |
+----+-----------------------------+---------+---------+-----------------+
| 2 | last evening | 44 | 55 | DATE |
+----+-----------------------------+---------+---------+-----------------+
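The begin/end values above are inclusive character offsets into the input sentence, so a Python slice needs end + 1. A quick sanity check using only the sentence and offsets from the table:

```python
# Character offsets reported by Spark NLP are inclusive on both ends,
# so a Python slice must use end + 1.
text = "The patient presented to the emergency room last evening"

chunks = [
    (12, 20, "presented"),
    (25, 42, "the emergency room"),
    (44, 55, "last evening"),
]

for begin, end, expected in chunks:
    assert text[begin:end + 1] == expected
```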
Model Information
Model Name: ner_events_clinical
Compatibility: Healthcare NLP 3.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Data Source
Trained on an augmented version of the i2b2 dataset with embeddings_clinical.
Benchmarking
label tp fp fn prec rec f1
I-TIME 82 12 45 0.87234 0.645669 0.742081
I-TREATMENT 2580 439 535 0.854588 0.82825 0.841213
B-OCCURRENCE 1548 680 945 0.694793 0.620939 0.655793
I-DURATION 366 183 99 0.666667 0.787097 0.721893
B-DATE 847 151 138 0.848697 0.859898 0.854261
I-DATE 921 191 196 0.828237 0.82453 0.82638
B-ADMISSION 105 102 15 0.507246 0.875 0.642202
I-PROBLEM 5238 902 823 0.853094 0.864214 0.858618
B-CLINICAL_DEPT 613 130 119 0.825034 0.837432 0.831187
B-TIME 36 8 24 0.818182 0.6 0.692308
I-CLINICAL_DEPT 1273 210 137 0.858395 0.902837 0.880055
B-PROBLEM 3717 608 591 0.859422 0.862813 0.861114
I-TEST 2304 384 361 0.857143 0.86454 0.860826
B-TEST 1870 372 300 0.834077 0.861751 0.847688
B-TREATMENT 2767 437 513 0.863608 0.843598 0.853485
B-EVIDENTIAL 394 109 201 0.7833 0.662185 0.717669
B-DURATION 236 119 105 0.664789 0.692082 0.678161
B-FREQUENCY 117 20 79 0.854015 0.596939 0.702703
Macro-average 25806 5821 6342 0.735285 0.677034 0.704959
Micro-average 25806 5821 6342 0.815948 0.802725 0.809283
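Each row's precision, recall, and F1 follow directly from its tp/fp/fn counts (prec = tp/(tp+fp), rec = tp/(tp+fn), F1 = their harmonic mean), and the micro-average applies the same formulas to the summed counts. A minimal check against the micro-average row, using figures from the table above:

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive, and false-negative counts."""
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    return prec, rec, f1

# Micro-average row: counts summed over all labels.
prec, rec, f1 = prf(25806, 5821, 6342)
assert abs(prec - 0.815948) < 1e-5
assert abs(rec - 0.802725) < 1e-5
assert abs(f1 - 0.809283) < 1e-5
```

The macro-average row, by contrast, is the unweighted mean of the per-label scores, which is why its figures cannot be reproduced from the summed counts alone.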