Description
This model considers 10 tokens on the left and 10 tokens on the right side of the clinical entities extracted by NER models and assigns their assertion status based on their context in this scope.
Predicted Entities
hypothetical
, associated_with_someone_else
, conditional
, possible
, absent
, present
How to use
document = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentenceDetector = SentenceDetector()\
.setInputCols(["document"])\
.setOutputCol("sentence")
token = Tokenizer()\
.setInputCols(['sentence'])\
.setOutputCol('token')
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
.setInputCols(["sentence", "token"])\
.setOutputCol("embeddings")
clinical_ner = MedicalNerModel.pretrained("ner_clinical", "en", "clinical/models") \
.setInputCols(["sentence", "token", "embeddings"]) \
.setOutputCol("ner")
ner_converter = NerConverter() \
.setInputCols(["sentence", "token", "ner"]) \
.setOutputCol("ner_chunk")
clinical_assertion = AssertionDLModel.pretrained("assertion_dl_scope_L10R10", "en", "clinical/models") \
.setInputCols(["sentence", "ner_chunk", "embeddings"]) \
.setOutputCol("assertion")
nlpPipeline = Pipeline(stages=[document,sentenceDetector, token, word_embeddings,clinical_ner,ner_converter, clinical_assertion])
text = "Patient with severe fever and sore throat. He shows no stomach pain and he maintained on an epidural and PCA for pain control. He also became short of breath with climbing a flight of stairs. After CT, lung tumor located at the right lower lobe. Father with Alzheimer."
data = spark.createDataFrame([[text]]).toDF("text")
result = nlpPipeline.fit(data).transform(data)
val documentAssembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val sentenceDetector = new SentenceDetector()
.setInputCols(Array("document"))
.setOutputCol("sentence")
val tokenizer = new Tokenizer()
.setInputCols("document")
.setOutputCol("token")
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
.setInputCols(Array("sentence", "token"))
.setOutputCol("embeddings")
val clinical_ner = MedicalNerModel.pretrained("ner_clinical", "en", "clinical/models")
.setInputCols(Array("sentence", "token", "embeddings"))
.setOutputCol("ner")
val ner_converter = NerConverter()
.setInputCols(Array("sentence", "token", "ner"))
.setOutputCol("ner_chunk")
val clinical_assertion = AssertionDLModel.pretrained("assertion_dl_scope_L10R10", "en", "clinical/models")
.setInputCols(Array("sentence", "ner_chunk", "embeddings"))
.setOutputCol("assertion")
val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, word_embeddings, clinical_ner, ner_converter, clinical_assertion))
val data = Seq("Patient with severe fever and sore throat. He shows no stomach pain and he maintained on an epidural and PCA for pain control. He also became short of breath with climbing a flight of stairs. After CT, lung tumor located at the right lower lobe. Father with Alzheimer.").toDF("text")
val result = pipeline.fit(data).transform(data)
import nlu
nlu.load("en.assert.l10r10").predict("""Patient with severe fever and sore throat. He shows no stomach pain and he maintained on an epidural and PCA for pain control. He also became short of breath with climbing a flight of stairs. After CT, lung tumor located at the right lower lobe. Father with Alzheimer.""")
Results
+---------------+---------+----------------------------+
|chunk |entity |assertion |
+---------------+---------+----------------------------+
|severe fever |PROBLEM |present |
|sore throat |PROBLEM |present |
|stomach pain |PROBLEM |absent |
|an epidural |TREATMENT|present |
|PCA |TREATMENT|present |
|pain control |PROBLEM |present |
|short of breath|PROBLEM |conditional |
|CT |TEST |present |
|lung tumor |PROBLEM |present |
|Alzheimer |PROBLEM |associated_with_someone_else|
+---------------+---------+----------------------------+
Model Information
Model Name: | assertion_dl_scope_L10R10 |
Compatibility: | Healthcare NLP 3.4.2+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [document, chunk, embeddings] |
Output Labels: | [assertion] |
Language: | en |
Size: | 1.4 MB |
References
Trained on augmented version of 2010 i2b2/VA dataset on concepts, assertions, and relations in clinical text with embeddings_clinical
.
https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/
Benchmarking
label tp fp fn prec rec f1
absent 812 48 71 0.94418603 0.9195923 0.93172693
present 2463 127 141 0.9509652 0.9458525 0.948402
conditional 25 19 28 0.5681818 0.4716981 0.5154639
associated_with_someone_else 36 7 9 0.8372093 0.8 0.8181818
hypothetical 147 31 28 0.8258427 0.84 0.8328612
possible 159 87 42 0.64634144 0.7910448 0.71140933
Macro-average - - - 0.79545444 0.7946979 0.795076
Micro-average - - - 0.91946477 0.9194648 0.91946477