Detect Assertion Status (assertion_dl_biobert_scope_L10R10)

Description

This model is trained using biobert_pubmed_base_cased BERT token embeddings. It considers 10 tokens on the left and 10 tokens on the right side of the clinical entities extracted by NER models and assigns their assertion status based on their context in this scope.

Predicted Entities

present, absent, possible, conditional, associated_with_someone_else, hypothetical

Open in Colab Download Copy S3 URI

How to use

document = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")


sentenceDetector = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")


token = Tokenizer()\
    .setInputCols(['sentence'])\
    .setOutputCol('token')


embeddings = BertEmbeddings.pretrained("biobert_pubmed_base_cased")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")


clinical_ner = MedicalNerModel.pretrained("ner_clinical_biobert", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")


ner_converter = NerConverter() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")


clinical_assertion = AssertionDLModel.pretrained("assertion_dl_biobert_scope_L10R10","en", "clinical/models") \
    .setInputCols(["sentence", "ner_chunk", "embeddings"]) \
    .setOutputCol("assertion")
    
nlpPipeline = Pipeline(stages=[document,
                               sentenceDetector,
                               token, 
                               embeddings, 
                               clinical_ner,
                               ner_converter,  
                               clinical_assertion])


text = "Patient with severe fever and sore throat. He shows no stomach pain and he maintained on an epidural and PCA for pain control. He also became short of breath with climbing a flight of stairs. After CT, lung tumor located at the right lower lobe. Father with Alzheimer."


data = spark.createDataFrame([[text]]).toDF("text")


result = nlpPipeline.fit(data).transform(data)

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDetector = SentenceDetectorDLModel.pretrained()
    .setInputCols(Array("document"))
    .setOutputCol("sentence") 

val tokenizer = new Tokenizer()
    .setInputCols(Array("sentence"))
    .setOutputCol("token")

val embeddings = BertEmbeddings.pretrained("biobert_pubmed_base_cased")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")

clinical_ner = MedicalNerModel.pretrained("ner_clinical_biobert", "en", "clinical/models") 
    .setInputCols(Array("sentence", "token", "embeddings")) 
    .setOutputCol("ner")

val ner_converter = new NerConverter()
    .setInputCols(Array("sentence","token","ner"))
    .setOutputCol("ner_chunk")

val clinical_assertion = AssertionDLModel.pretrained("assertion_dl_biobert_scope_L10R10","en", "clinical/models") 
    .setInputCols(Array("sentence", "ner_chunk", "embeddings")) 
    .setOutputCol("assertion")

val pipeline =  new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, embeddings, clinical_ner, ner_converter, clinical_assertion))

val data = Seq("Patient with severe fever and sore throat. He shows no stomach pain and he maintained on an epidural and PCA for pain control. He also became short of breath with climbing a flight of stairs. After CT, lung tumor located at the right lower lobe. Father with Alzheimer.").toDF("text")

val result = pipeline.fit(data).transform(data)

import nlu
nlu.load("en.assert.biobert_l10210").predict("""Patient with severe fever and sore throat. He shows no stomach pain and he maintained on an epidural and PCA for pain control. He also became short of breath with climbing a flight of stairs. After CT, lung tumor located at the right lower lobe. Father with Alzheimer.""")

Results

+---------------+---------+----------------------------+
|chunk          |ner_label|assertion                   |
+---------------+---------+----------------------------+
|severe fever   |PROBLEM  |present                     |
|sore throat    |PROBLEM  |present                     |
|stomach pain   |PROBLEM  |absent                      |
|an epidural    |TREATMENT|present                     |
|PCA            |TREATMENT|present                     |
|pain control   |TREATMENT|present                     |
|short of breath|PROBLEM  |conditional                 |
|CT             |TEST     |present                     |
|lung tumor     |PROBLEM  |present                     |
|Alzheimer      |PROBLEM  |associated_with_someone_else|
+---------------+---------+----------------------------+

Model Information

Model Name:	assertion_dl_biobert_scope_L10R10
Compatibility:	Healthcare NLP 3.4.2+
License:	Licensed
Edition:	Official
Input Labels:	[document, chunk, embeddings]
Output Labels:	[assertion]
Language:	en
Size:	3.2 MB

References

Trained on augmented version of 2010 i2b2/VA dataset on concepts, assertions, and relations in clinical text with biobert_pubmed_base_cased. https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/

Benchmarking

label                         tp   fp   fn   prec       rec        f1       
absent                        839  89   44   0.9040948  0.9501699  0.9265599
present                       2436 127  168  0.9504487  0.9354839  0.9429069
conditional                   29   21   24   0.58       0.5471698  0.5631067
associated_with_someone_else  39   11   6    0.78       0.8666670  0.8210527
hypothetical                  164  44   11   0.7884616  0.9371429  0.8563969
possible                      126  36   75   0.7777778  0.6268657  0.6942149
Macro-average                 3633 328  328  0.7967971  0.8105832  0.8036310
Micro-average                 3633 328  328  0.9171926  0.9171926  0.9171926

PREVIOUSPipeline to Detect Clinical Entities (BertForTokenClassifier)

NEXTDetect PHI for Deidentification purposes (Italian, reduced entities)