Detect Assertion Status (assertion_dl_healthcare) - supports confidence scores

Description

Assigns an assertion status to each clinical entity extracted by NER, based on the context in which the entity appears in the text.

Predicted Entities

absent, present, conditional, associated_with_someone_else, hypothetical, possible.


How to use

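# Python. Assumes an active Spark NLP for Healthcare session available as `spark`.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer, WordEmbeddingsModel, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel, AssertionDLModel

documentAssembler = DocumentAssembler()\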
    .setInputCol("text")\
    .setOutputCol("document")

sentenceDetector = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_healthcare_100d", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

clinical_ner = MedicalNerModel.pretrained("ner_healthcare", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_converter = NerConverter() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")

clinical_assertion = AssertionDLModel.pretrained("assertion_dl_healthcare", "en", "clinical/models") \
    .setInputCols(["sentence", "ner_chunk", "embeddings"]) \
    .setOutputCol("assertion")

nlpPipeline = Pipeline(stages=[
    documentAssembler, 
    sentenceDetector,
    tokenizer,
    word_embeddings,
    clinical_ner,
    ner_converter,
    clinical_assertion
    ])

data = spark.createDataFrame([["""Patient with severe fever and sore throat. He shows no stomach pain and he maintained on an epidural and PCA for pain control. He also became short of breath with climbing a flight of stairs. After CT, lung tumor located at the right lower lobe. Father with Alzheimer."""]]).toDF("text")

result = nlpPipeline.fit(data).transform(data)

// Scala: the same pipeline
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDetector = new SentenceDetector()
    .setInputCols("document") 
    .setOutputCol("sentence") 

val tokenizer = new Tokenizer()
    .setInputCols("sentence")
    .setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_healthcare_100d", "en", "clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")

val clinical_ner = MedicalNerModel.pretrained("ner_healthcare", "en", "clinical/models")
    .setInputCols(Array("sentence", "token", "embeddings")) 
    .setOutputCol("ner")

val ner_converter = new NerConverter()
    .setInputCols(Array("sentence","token","ner"))
    .setOutputCol("ner_chunk")

val clinical_assertion = AssertionDLModel.pretrained("assertion_dl_healthcare","en", "clinical/models") 
    .setInputCols(Array("sentence", "ner_chunk", "embeddings")) 
    .setOutputCol("assertion")

val pipeline = new Pipeline().setStages(Array(documentAssembler, 
                                               sentenceDetector, 
                                               tokenizer, 
                                               word_embeddings, 
                                               clinical_ner, 
                                               ner_converter, 
                                               clinical_assertion))

// Needed for .toDF when running outside spark-shell
import spark.implicits._

val data = Seq("Patient with severe fever and sore throat. He shows no stomach pain and he maintained on an epidural and PCA for pain control. He also became short of breath with climbing a flight of stairs. After CT, lung tumor located at the right lower lobe. Father with Alzheimer.").toDF("text")

val result = pipeline.fit(data).transform(data)

# NLU: one-line equivalent
import nlu
nlu.load("en.assert.healthcare").predict("""Patient with severe fever and sore throat. He shows no stomach pain and he maintained on an epidural and PCA for pain control. He also became short of breath with climbing a flight of stairs. After CT, lung tumor located at the right lower lobe. Father with Alzheimer.""")

Results

+---------------+---------+----------------------------+
|chunk          |ner_label|assertion                   |
+---------------+---------+----------------------------+
|severe fever   |PROBLEM  |present                     |
|sore throat    |PROBLEM  |present                     |
|stomach pain   |PROBLEM  |absent                      |
|an epidural    |TREATMENT|present                     |
|PCA            |TREATMENT|present                     |
|pain control   |TREATMENT|present                     |
|short of breath|PROBLEM  |conditional                 |
|CT             |TEST     |present                     |
|lung tumor     |PROBLEM  |present                     |
|Alzheimer      |PROBLEM  |associated_with_someone_else|
+---------------+---------+----------------------------+
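
A common downstream step is to keep only the findings that are asserted as present for the patient, dropping negated, conditional, and family-history mentions. The sketch below works on plain tuples copied from the table above rather than on the Spark DataFrame; the helper name `active_problems` is hypothetical and not part of any library.

```python
# Rows copied from the Results table: (chunk, ner_label, assertion)
rows = [
    ("severe fever", "PROBLEM", "present"),
    ("sore throat", "PROBLEM", "present"),
    ("stomach pain", "PROBLEM", "absent"),
    ("an epidural", "TREATMENT", "present"),
    ("PCA", "TREATMENT", "present"),
    ("pain control", "TREATMENT", "present"),
    ("short of breath", "PROBLEM", "conditional"),
    ("CT", "TEST", "present"),
    ("lung tumor", "PROBLEM", "present"),
    ("Alzheimer", "PROBLEM", "associated_with_someone_else"),
]


def active_problems(rows):
    """Return PROBLEM chunks whose assertion status is 'present'.

    Hypothetical helper for illustration only.
    """
    return [
        chunk
        for chunk, ner_label, assertion in rows
        if ner_label == "PROBLEM" and assertion == "present"
    ]


print(active_problems(rows))
# → ['severe fever', 'sore throat', 'lung tumor']
```

Here "stomach pain" (absent), "short of breath" (conditional), and "Alzheimer" (associated_with_someone_else) are filtered out, which is the distinction the assertion model adds on top of plain NER.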

Model Information

Model Name: assertion_dl_healthcare
Compatibility: Spark NLP 2.7.2+
License: Licensed
Edition: Official
Input Labels: [document, chunk, embeddings]
Output Labels: [assertion]
Language: en

Data Source

Trained on an augmented version of the i2b2 dataset.

Benchmarking

label                            tp    fp    fn      prec       rec        f1
absent                          726    86    98  0.894089  0.881068  0.887531
present                        2544   232   119  0.916427  0.955314  0.935466
conditional                      18    13    37  0.580645  0.327273  0.418605
associated_with_someone_else     40     5     9  0.888889  0.816327  0.851064
hypothetical                    132    13    26  0.910345  0.835443  0.871287
possible                         96    45   105  0.680851  0.477612  0.561404
Macro-average                  3556   394   394  0.811874  0.715506  0.76065 
Micro-average                  3556   394   394  0.900253  0.900253  0.900253
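
The precision, recall, and F1 columns follow from the tp/fp/fn counts. The sketch below recomputes them from the counts in the table; the names `counts` and `prf` are illustrative. It assumes, consistent with the reported numbers, that the macro-average F1 is the harmonic mean of the macro-averaged precision and recall rather than the mean of the per-label F1 scores.

```python
# Counts copied from the benchmark table: label -> (tp, fp, fn)
counts = {
    "absent": (726, 86, 98),
    "present": (2544, 232, 119),
    "conditional": (18, 13, 37),
    "associated_with_someone_else": (40, 5, 9),
    "hypothetical": (132, 13, 26),
    "possible": (96, 45, 105),
}


def prf(tp, fp, fn):
    """Return (precision, recall, f1) for one set of counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


per_label = {label: prf(*c) for label, c in counts.items()}

# Macro-average: unweighted mean of per-label precision and recall,
# then F1 as the harmonic mean of those two averages.
macro_p = sum(p for p, _, _ in per_label.values()) / len(per_label)
macro_r = sum(r for _, r, _ in per_label.values()) / len(per_label)
macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r)

# Micro-average: pool the counts over all labels, then compute once.
total_tp, total_fp, total_fn = (sum(col) for col in zip(*counts.values()))
micro_p, micro_r, micro_f1 = prf(total_tp, total_fp, total_fn)

print(f"macro: {macro_p:.6f} {macro_r:.6f} {macro_f1:.5f}")
print(f"micro: {micro_p:.6f} {micro_r:.6f} {micro_f1:.6f}")
```

Because the pooled fp and fn totals are equal (394 each), micro precision, recall, and F1 coincide. The macro scores are lower because the two weakest labels, conditional and possible, count as much as the much larger present class.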