Detect Assertion Status (assertion_dl_healthcare)

Description

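Assertion status detection for clinical entities, based on a deep learning model (AssertionDLModel). The model takes the entity chunks found by a clinical NER model and assigns each one an assertion label. For example, it decides whether a problem is present, absent, or only hypothetical.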

Predicted Entities

hypothetical, present, absent, possible, conditional, associated_with_someone_else.


How to use

Use it as part of an NLP pipeline with the following stages: DocumentAssembler, SentenceDetector, Tokenizer, WordEmbeddingsModel, MedicalNerModel, NerConverter, AssertionDLModel.

```python
# Assumes an active SparkSession `spark` started with Spark NLP for Healthcare
# (the licensed sparknlp_jsl package) and access to the clinical/models repository.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer, WordEmbeddingsModel, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel, AssertionDLModel

# Raw text -> document annotations
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentenceDetector = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Must be the same embeddings the assertion model depends on
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_healthcare_100d", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

clinical_ner = MedicalNerModel.pretrained("ner_healthcare", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

# Groups IOB tags into entity chunks
ner_converter = NerConverter() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")

# One assertion label per NER chunk
clinical_assertion = AssertionDLModel.pretrained("assertion_dl_healthcare", "en", "clinical/models") \
    .setInputCols(["sentence", "ner_chunk", "embeddings"]) \
    .setOutputCol("assertion")

nlpPipeline = Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    word_embeddings,
    clinical_ner,
    ner_converter,
    clinical_assertion
    ])

data = spark.createDataFrame([["Patient has a headache for the last 2 weeks and appears anxious when she walks fast. No alopecia noted. She denies pain"]]).toDF("text")
model = nlpPipeline.fit(data)
results = model.transform(data)
```
```scala
// Assumes a SparkSession `spark` with the licensed Spark NLP for Healthcare jar on the classpath;
// package paths may differ between library versions.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.annotators.ner.MedicalNerModel
import com.johnsnowlabs.nlp.annotators.assertion.dl.AssertionDLModel
import org.apache.spark.ml.Pipeline
import spark.implicits._

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDetector = new SentenceDetector()
    .setInputCols("document")
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
    .setInputCols("sentence")
    .setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_healthcare_100d", "en", "clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")

val clinical_ner = MedicalNerModel.pretrained("ner_healthcare", "en", "clinical/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")

val ner_converter = new NerConverter()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")

// Same inputs as the Python example
val clinical_assertion = AssertionDLModel.pretrained("assertion_dl_healthcare", "en", "clinical/models")
    .setInputCols(Array("sentence", "ner_chunk", "embeddings"))
    .setOutputCol("assertion")

// Stage references must match the val names declared above
val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, word_embeddings, clinical_ner, ner_converter, clinical_assertion))

val data = Seq("Patient has a headache for the last 2 weeks and appears anxious when she walks fast. No alopecia noted. She denies pain").toDF("text")
val result = pipeline.fit(data).transform(data)
```
```python
import nlu
nlu.load("en.assert.healthcare").predict("""Patient has a headache for the last 2 weeks and appears anxious when she walks fast. No alopecia noted. She denies pain""")
```
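Each annotator reads columns that earlier stages write, so the order of `stages` matters. For example, AssertionDLModel needs `ner_chunk`, and that column only exists after NerConverter has run. The plain-Python sketch below needs no Spark. It models the column wiring of the Python pipeline as (stage, inputs, output) tuples and reports every input column that no earlier stage has produced. `STAGES` and `check_wiring` are hypothetical helpers for illustration and are not part of Spark NLP.

```python
# Hypothetical, library-free model of the pipeline wiring:
# (stage name, input columns it reads, output column it writes).
STAGES = [
    ("DocumentAssembler", ["text"], "document"),
    ("SentenceDetector", ["document"], "sentence"),
    ("Tokenizer", ["sentence"], "token"),
    ("WordEmbeddingsModel", ["sentence", "token"], "embeddings"),
    ("MedicalNerModel", ["sentence", "token", "embeddings"], "ner"),
    ("NerConverter", ["sentence", "token", "ner"], "ner_chunk"),
    ("AssertionDLModel", ["sentence", "ner_chunk", "embeddings"], "assertion"),
]


def check_wiring(stages, source_columns=("text",)):
    """Return (stage, missing_column) pairs; an empty list means the order is valid."""
    available = set(source_columns)  # columns present in the input DataFrame
    problems = []
    for name, inputs, output in stages:
        for col in inputs:
            if col not in available:
                problems.append((name, col))
        available.add(output)  # later stages can read this stage's output
    return problems


print(check_wiring(STAGES))  # []

# Swapping NerConverter and AssertionDLModel breaks the chain.
reordered = STAGES[:5] + [STAGES[6], STAGES[5]]
print(check_wiring(reordered))  # [('AssertionDLModel', 'ner_chunk')]
```

The same reasoning applies to the Scala version, which uses identical column names.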

Result


|   | chunks     | entities| assertion   |
|--:|-----------:|--------:|------------:|
| 0 | a headache | PROBLEM | present     |
| 1 | anxious    | PROBLEM | conditional |
| 2 | alopecia   | PROBLEM | absent      |
| 3 | pain       | PROBLEM | absent      |
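Downstream code typically uses the assertion label to decide which entities to keep. A negated finding such as "No alopecia noted" should not be counted as a current problem. The sketch below shows one way to do this. It assumes the (chunk, entity, assertion) rows have already been collected from the transformed DataFrame into Python tuples. `ROWS` copies the Result table above, and `filter_by_assertion` is a hypothetical helper.

```python
# Rows copied from the Result table: (chunk, entity, assertion).
ROWS = [
    ("a headache", "PROBLEM", "present"),
    ("anxious", "PROBLEM", "conditional"),
    ("alopecia", "PROBLEM", "absent"),
    ("pain", "PROBLEM", "absent"),
]


def filter_by_assertion(rows, keep=("present",)):
    """Return the chunks whose assertion label is one of `keep`, in input order."""
    return [chunk for chunk, _entity, assertion in rows if assertion in keep]


print(filter_by_assertion(ROWS))                    # ['a headache']
print(filter_by_assertion(ROWS, keep=("absent",)))  # ['alopecia', 'pain']
```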

Model Information

Name: assertion_dl_healthcare  
Type: AssertionDLModel  
Compatibility: 2.6.0  
License: Licensed  
Edition: Official  
Input labels: [document, chunk, word_embeddings]  
Output labels: [assertion]  
Language: en  
Case sensitive: False  
Dependencies: embeddings_healthcare_100d  

Data Source

Trained on an augmented version of the 2010 i2b2/VA dataset on concepts, assertions, and relations in clinical text, with embeddings_clinical. https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/

Benchmarking

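| label                        | precision | recall | F1     |
|:-----------------------------|----------:|-------:|-------:|
| absent                       | 0.9289    | 0.9466 | 0.9377 |
| present                      | 0.9433    | 0.9559 | 0.9496 |
| conditional                  | 0.6888    | 0.5000 | 0.5794 |
| associated_with_someone_else | 0.9285    | 0.9122 | 0.9203 |
| hypothetical                 | 0.9079    | 0.8654 | 0.8862 |
| possible                     | 0.7000    | 0.6146 | 0.6545 |
| macro-avg                    | 0.8496    | 0.7991 | 0.8236 |
| micro-avg                    | 0.9245    | 0.9245 | 0.9245 |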
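The macro-avg F1 in the table (0.8236) is not the plain mean of the per-label F1 scores, which comes to about 0.8213. It does match the harmonic mean of the macro precision and macro recall. The card does not say how that row was computed, so the sketch below checks both readings against the per-label numbers in the table. The micro-averaged precision, recall, and F1 are all equal (0.9245), which is expected when every chunk gets exactly one label. In that setting, each wrong prediction counts once as a false positive and once as a false negative. `SCORES` and `macro_scores` are illustrative names.

```python
# Per-label (precision, recall, f1) copied from the Benchmarking table.
SCORES = {
    "absent": (0.9289, 0.9466, 0.9377),
    "present": (0.9433, 0.9559, 0.9496),
    "conditional": (0.6888, 0.5, 0.5794),
    "associated_with_someone_else": (0.9285, 0.9122, 0.9203),
    "hypothetical": (0.9079, 0.8654, 0.8862),
    "possible": (0.7, 0.6146, 0.6545),
}


def macro_scores(scores):
    """Return (macro precision, macro recall, mean per-label F1, F1 of the macro P/R)."""
    n = len(scores)
    macro_p = sum(p for p, _, _ in scores.values()) / n
    macro_r = sum(r for _, r, _ in scores.values()) / n
    mean_f1 = sum(f for _, _, f in scores.values()) / n
    # Harmonic mean of the macro-averaged precision and recall
    f1_of_means = 2 * macro_p * macro_r / (macro_p + macro_r)
    return macro_p, macro_r, mean_f1, f1_of_means


p, r, mean_f1, f1_of_means = macro_scores(SCORES)
print(f"macro P={p:.4f} R={r:.4f}")              # macro P=0.8496 R=0.7991
print(f"mean of per-label F1={mean_f1:.4f}")     # mean of per-label F1=0.8213
print(f"F1 of macro P and R={f1_of_means:.4f}")  # F1 of macro P and R=0.8236
```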