Description
Assigns an assertion status to each clinical entity extracted by NER, based on the entity's context in the text.
Predicted Entities
absent, present, conditional, associated_with_someone_else, hypothetical, possible.
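To make "based on their context" concrete, the toy function below labels an entity by scanning the few words just before it for trigger words, in the spirit of rule-based negation detectors such as NegEx. It is only a sketch of the idea of context-based assertion. It is not how the pretrained deep learning model works. The trigger lists, the `window` parameter and `toy_assertion` itself are hypothetical, and the sketch covers only four of the six labels.

```python
# Toy illustration only: a NegEx-style keyword lookup, NOT the pretrained
# assertion_dl_healthcare model. All trigger lists below are hypothetical.
TRIGGERS = {
    "absent": ("no", "denies", "without"),
    "possible": ("possible", "probable", "suspected"),
    "hypothetical": ("if", "should"),
    "associated_with_someone_else": ("father", "mother", "sister", "brother"),
}


def toy_assertion(sentence: str, chunk: str, window: int = 4) -> str:
    """Return an assertion label for `chunk` from the words just before it."""
    start = sentence.lower().find(chunk.lower())
    if start < 0:
        raise ValueError(f"{chunk!r} not found in sentence")
    # Keep only the last `window` words preceding the entity.
    preceding = sentence[:start].lower().replace(",", " ").split()[-window:]
    for label, cues in TRIGGERS.items():
        # Whole-word match: "with" does not match the cue "without".
        if any(cue in preceding for cue in cues):
            return label
    return "present"  # default when no trigger word is found


print(toy_assertion("He shows no stomach pain.", "stomach pain"))  # absent
```

A trained model replaces the hand-written trigger lists with patterns learned from annotated data, which is what lets it handle labels such as conditional that simple keyword lists capture poorly.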
How to use
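# Assumes `spark` is an active session started with the licensed Spark NLP for Healthcare library.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer, WordEmbeddingsModel, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel, AssertionDLModel
documentAssembler = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")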
sentenceDetector = SentenceDetector()\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_healthcare_100d", "en", "clinical/models")\
.setInputCols(["sentence", "token"])\
.setOutputCol("embeddings")
clinical_ner = MedicalNerModel.pretrained("ner_healthcare", "en", "clinical/models") \
.setInputCols(["sentence", "token", "embeddings"]) \
.setOutputCol("ner")
ner_converter = NerConverter() \
.setInputCols(["sentence", "token", "ner"]) \
.setOutputCol("ner_chunk")
clinical_assertion = AssertionDLModel.pretrained("assertion_dl_healthcare", "en", "clinical/models") \
.setInputCols(["sentence", "ner_chunk", "embeddings"]) \
.setOutputCol("assertion")
nlpPipeline = Pipeline(stages=[
documentAssembler,
sentenceDetector,
tokenizer,
word_embeddings,
clinical_ner,
ner_converter,
clinical_assertion
])
data = spark.createDataFrame([["""Patient with severe fever and sore throat. He shows no stomach pain and he maintained on an epidural and PCA for pain control. He also became short of breath with climbing a flight of stairs. After CT, lung tumor located at the right lower lobe. Father with Alzheimer."""]]).toDF("text")
result = nlpPipeline.fit(data).transform(data)
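// Assumes `spark` is an active SparkSession with the licensed Spark NLP for Healthcare library.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.annotators.ner.MedicalNerModel
import com.johnsnowlabs.nlp.annotators.assertion.dl.AssertionDLModel
import org.apache.spark.ml.Pipeline
import spark.implicits._
val documentAssembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")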
val sentenceDetector = new SentenceDetector()
.setInputCols("document")
.setOutputCol("sentence")
val tokenizer = new Tokenizer()
.setInputCols("sentence")
.setOutputCol("token")
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_healthcare_100d", "en", "clinical/models")
.setInputCols(Array("sentence", "token"))
.setOutputCol("embeddings")
val clinical_ner = MedicalNerModel.pretrained("ner_healthcare", "en", "clinical/models")
.setInputCols(Array("sentence", "token", "embeddings"))
.setOutputCol("ner")
val ner_converter = new NerConverter()
.setInputCols(Array("sentence","token","ner"))
.setOutputCol("ner_chunk")
val clinical_assertion = AssertionDLModel.pretrained("assertion_dl_healthcare","en", "clinical/models")
.setInputCols(Array("sentence", "ner_chunk", "embeddings"))
.setOutputCol("assertion")
val pipeline = new Pipeline().setStages(Array(documentAssembler,
sentenceDetector,
tokenizer,
word_embeddings,
clinical_ner,
ner_converter,
clinical_assertion))
val data = Seq("Patient with severe fever and sore throat. He shows no stomach pain and he maintained on an epidural and PCA for pain control. He also became short of breath with climbing a flight of stairs. After CT, lung tumor located at the right lower lobe. Father with Alzheimer.").toDF("text")
val result = pipeline.fit(data).transform(data)
import nlu
nlu.load("en.assert.healthcare").predict("""Patient with severe fever and sore throat. He shows no stomach pain and he maintained on an epidural and PCA for pain control. He also became short of breath with climbing a flight of stairs. After CT, lung tumor located at the right lower lobe. Father with Alzheimer.""")
Results
+---------------+---------+----------------------------+
|chunk          |ner_label|assertion                   |
+---------------+---------+----------------------------+
|severe fever   |PROBLEM  |present                     |
|sore throat    |PROBLEM  |present                     |
|stomach pain   |PROBLEM  |absent                      |
|an epidural    |TREATMENT|present                     |
|PCA            |TREATMENT|present                     |
|pain control   |TREATMENT|present                     |
|short of breath|PROBLEM  |conditional                 |
|CT             |TEST     |present                     |
|lung tumor     |PROBLEM  |present                     |
|Alzheimer      |PROBLEM  |associated_with_someone_else|
+---------------+---------+----------------------------+
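One possible downstream use of these labels is to drop findings that are negated, conditional or about someone else before building a patient problem list. The sketch below works on the rows of the results table above, copied by hand into plain tuples rather than read from the Spark DataFrame; `ROWS` and `active_problems` are illustrative names, not part of Spark NLP.

```python
# Rows copied from the results table: (chunk, ner_label, assertion).
ROWS = [
    ("severe fever", "PROBLEM", "present"),
    ("sore throat", "PROBLEM", "present"),
    ("stomach pain", "PROBLEM", "absent"),
    ("an epidural", "TREATMENT", "present"),
    ("PCA", "TREATMENT", "present"),
    ("pain control", "TREATMENT", "present"),
    ("short of breath", "PROBLEM", "conditional"),
    ("CT", "TEST", "present"),
    ("lung tumor", "PROBLEM", "present"),
    ("Alzheimer", "PROBLEM", "associated_with_someone_else"),
]


def active_problems(rows):
    """Keep PROBLEM chunks whose assertion says they are present in the patient."""
    return [chunk for chunk, label, status in rows
            if label == "PROBLEM" and status == "present"]


print(active_problems(ROWS))  # ['severe fever', 'sore throat', 'lung tumor']
```

Without the assertion column, "stomach pain" and the father's "Alzheimer" would be indistinguishable from the patient's own active problems.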
Model Information
Model Name: assertion_dl_healthcare
Compatibility: Spark NLP 2.7.2+
License: Licensed
Edition: Official
Input Labels: [document, chunk, embeddings]
Output Labels: [assertion]
Language: en
Data Source
Trained on an augmented version of the i2b2 dataset.
Benchmarking
label                           tp   fp   fn  precision    recall        f1
absent                         726   86   98   0.894089  0.881068  0.887531
present                       2544  232  119   0.916427  0.955314  0.935466
conditional                     18   13   37   0.580645  0.327273  0.418605
associated_with_someone_else    40    5    9   0.888889  0.816327  0.851064
hypothetical                   132   13   26   0.910345  0.835443  0.871287
possible                        96   45  105   0.680851  0.477612  0.561404
Macro-average                 3556  394  394   0.811874  0.715506   0.76065
Micro-average                 3556  394  394   0.900253  0.900253  0.900253
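The averaged rows can be checked against the per-label counts. In both average rows, tp, fp and fn are totals over the six labels rather than averages. The plain-Python sketch below recomputes the scores; it assumes the macro-average F1 is the harmonic mean of the macro precision and macro recall, since that reproduces the reported 0.76065, while the mean of the per-label F1 scores (about 0.754) does not. `COUNTS`, `prf` and `averages` are illustrative names.

```python
# Per-label (tp, fp, fn) counts copied from the benchmarking table.
COUNTS = {
    "absent": (726, 86, 98),
    "present": (2544, 232, 119),
    "conditional": (18, 13, 37),
    "associated_with_someone_else": (40, 5, 9),
    "hypothetical": (132, 13, 26),
    "possible": (96, 45, 105),
}


def prf(tp, fp, fn):
    """Return (precision, recall, f1) computed from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def averages(counts):
    """Return per-label scores plus macro- and micro-averaged (p, r, f1)."""
    per_label = {label: prf(*c) for label, c in counts.items()}
    n = len(per_label)
    macro_p = sum(s[0] for s in per_label.values()) / n
    macro_r = sum(s[1] for s in per_label.values()) / n
    # Assumption: macro F1 = harmonic mean of macro precision and macro recall.
    macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r)
    # Micro average: pool tp, fp and fn over all labels, then score once.
    totals = [sum(c[i] for c in counts.values()) for i in range(3)]
    return per_label, (macro_p, macro_r, macro_f1), prf(*totals)


per_label, macro, micro = averages(COUNTS)
for label, (p, r, f1) in per_label.items():
    print(f"{label:<28} {p:.6f} {r:.6f} {f1:.6f}")
print("macro", [round(x, 6) for x in macro])
print("micro", [round(x, 6) for x in micro])
```

Because every chunk receives exactly one label, each error is a false positive for the predicted label and a false negative for the true one. Total fp therefore equals total fn (394), and micro precision, recall and F1 coincide at 0.900253.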