Detect Clinical Relations

Description

Relation Extraction model based on syntactic features using deep learning. Models the set of clinical relations defined in the 2010 i2b2 relation challenge.

Included Relations

TrIP: A certain treatment has improved or cured a medical problem (eg, ‘infection resolved with antibiotic course’) TrWP: A patient’s medical problem has deteriorated or worsened because of or in spite of a treatment being administered (eg, ‘the tumor was growing despite the drain’) TrCP: A treatment caused a medical problem (eg, ‘penicillin causes a rash’) TrAP: A treatment administered for a medical problem (eg, ‘Dexamphetamine for narcolepsy’) TrNAP: The administration of a treatment was avoided because of a medical problem (eg, ‘Ralafen which is contra-indicated because of ulcers’) TeRP: A test has revealed some medical problem (eg, ‘an echocardiogram revealed a pericardial effusion’) TeCP: A test was performed to investigate a medical problem (eg, ‘chest x-ray done to rule out pneumonia’) PIP: Two problems are related to each other (eg, ‘Azotemia presumed secondary to sepsis’)

Live Demo Open in Colab Download

How to use

...

reModel = RelationExtractionModel.pretrained("re_clinical","en","clinical/models")\
    .setInputCols(["word_embeddings","chunk","pos","dependency"])\
    .setOutput("relations")

pipeline = Pipeline(stages=[documenter, sentencer, tokenizer, words_embedder, pos_tagger, ner_tagger, ner_chunker, dependency_parser, reModel])
model = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

results = LightPipeline(model).fullAnnotate("""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus ( T2DM ), one prior episode of HTG-induced pancreatitis three years prior to presentation,  associated with an acute hepatitis , and obesity with a body mass index ( BMI ) of 33.5 kg/m2 , presented with a one-week history of polyuria , polydipsia , poor appetite , and vomiting . Two weeks prior to presentation , she was treated with a five-day course of amoxicillin for a respiratory tract infection . She was on metformin , glipizide , and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG . She had been on dapagliflozin for six months at the time of presentation. Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness , guarding , or rigidity . Pertinent laboratory findings on admission were : serum glucose 111 mg/dl , bicarbonate 18 mmol/l , anion gap 20 , creatinine 0.4 mg/dL , triglycerides 508 mg/dL , total cholesterol 122 mg/dL , glycated hemoglobin ( HbA1c ) 10% , and venous pH 7.27 . Serum lipase was normal at 43 U/L . Serum acetone levels could not be assessed as blood samples kept hemolyzing due to significant lipemia . The patient was initially admitted for starvation ketosis , as she reported poor oral intake for three days prior to admission . However , serum chemistry obtained six hours after presentation revealed her glucose was 186 mg/dL , the anion gap was still elevated at 21 , serum bicarbonate was 16 mmol/L , triglyceride level peaked at 2050 mg/dL , and lipase was 52 U/L . The β-hydroxybutyrate level was obtained and found to be elevated at 5.29 mmol/L - the original sample was centrifuged and the chylomicron layer removed prior to analysis due to interference from turbidity caused by lipemia again . The patient was treated with an insulin drip for euDKA and HTG with a reduction in the anion gap to 13 and triglycerides to 1400 mg/dL , within 24 hours . Her euDKA was thought to be precipitated by her respiratory tract infection in the setting of SGLT2 inhibitor use . The patient was seen by the endocrinology service and she was discharged on 40 units of insulin glargine at night , 12 units of insulin lispro with meals , and metformin 1000 mg two times a day . It was determined that all SGLT2 inhibitors should be discontinued indefinitely . She had close follow-up with endocrinology post discharge .""")
...

val reModel = RelationExtractionModel.pretrained("re_clinical","en","clinical/models")
    .setInputCols("word_embeddings","chunk","pos","dependency")
    .setOutputCol("relations")
    
val pipeline = new Pipeline().setStages(Array(documenter, sentencer, tokenizer, words_embedder, pos_tagger, ner_tagger, ner_chunker, dependency_parser, reModel))

val result = pipeline.fit(Seq.empty["A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus ( T2DM ), one prior episode of HTG-induced pancreatitis three years prior to presentation,  associated with an acute hepatitis , and obesity with a body mass index ( BMI ) of 33.5 kg/m2 , presented with a one-week history of polyuria , polydipsia , poor appetite , and vomiting . Two weeks prior to presentation , she was treated with a five-day course of amoxicillin for a respiratory tract infection . She was on metformin , glipizide , and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG . She had been on dapagliflozin for six months at the time of presentation. Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness , guarding , or rigidity . Pertinent laboratory findings on admission were : serum glucose 111 mg/dl , bicarbonate 18 mmol/l , anion gap 20 , creatinine 0.4 mg/dL , triglycerides 508 mg/dL , total cholesterol 122 mg/dL , glycated hemoglobin ( HbA1c ) 10% , and venous pH 7.27 . Serum lipase was normal at 43 U/L . Serum acetone levels could not be assessed as blood samples kept hemolyzing due to significant lipemia . The patient was initially admitted for starvation ketosis , as she reported poor oral intake for three days prior to admission . However , serum chemistry obtained six hours after presentation revealed her glucose was 186 mg/dL , the anion gap was still elevated at 21 , serum bicarbonate was 16 mmol/L , triglyceride level peaked at 2050 mg/dL , and lipase was 52 U/L . The β-hydroxybutyrate level was obtained and found to be elevated at 5.29 mmol/L - the original sample was centrifuged and the chylomicron layer removed prior to analysis due to interference from turbidity caused by lipemia again . The patient was treated with an insulin drip for euDKA and HTG with a reduction in the anion gap to 13 and triglycerides to 1400 mg/dL , within 24 hours . Her euDKA was thought to be precipitated by her respiratory tract infection in the setting of SGLT2 inhibitor use . The patient was seen by the endocrinology service and she was discharged on 40 units of insulin glargine at night , 12 units of insulin lispro with meals , and metformin 1000 mg two times a day . It was determined that all SGLT2 inhibitors should be discontinued indefinitely . She had close follow-up with endocrinology post discharge ."].toDS.toDF("text")).transform(data)

Results

| relation | entity1   | chunk1                          | entity2    | chunk2                        | confidence |
|----------|-----------|---------------------------------|------------|-------------------------------|------------|
| TrAP     | PROBLEM   | T2DM                            | TREATMENT  | atorvastatin                  | 0.99955326 |
| TrWP     | TEST      | blood samples                   | PROBLEM    | significant lipemia           | 0.99998724 |
| TeRP     | TEST      | the anion gap                   | PROBLEM    | still elevated                | 0.9965193  |
| TrAP     | TEST      | analysis                        | PROBLEM    | interference from turbidity   | 0.9676019  |
| TrWP     | TREATMENT | an insulin drip                 | PROBLEM    | a reduction in the anion gap  | 0.94099987 |
| TeRP     | PROBLEM   | a reduction in the anion gap    | TEST       | triglycerides                 | 0.9956793  |
| TeRP     | PROBLEM   | her respiratory tract infection | TREATMENT  | SGLT2 inhibitor               | 0.997498   |

Model Information

Name: re_clinical  
Type: RelationExtractionModel  
Compatibility: Spark NLP 2.5.5+  
License: Licensed  
Edition: Official  
Input labels: [word_embeddings, chunk, pos, dependency]  
Output labels: [category]  
Language: en  
Case sensitive: False  
Dependencies: embeddings_clinical  

Data Source

Trained on data gathered and manually annotated by John Snow Labs https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/

Benchmarking

The model has been validated against the posology dataset described in “Magge, Scotch, & Gonzalez-Hernandez, 2018”.

       label  precision  recall  f1-score
 
           O       0.96    0.93      0.94
        TeRP       0.91    0.94      0.92
         PIP       0.86    0.92      0.89
        TrAP       0.81    0.92      0.86
        TrCP       0.56    0.55      0.55
        TeCP       0.57    0.49      0.53
        TrWP       0.29    0.02      0.03
       TrNAP       0.47    0.31      0.37
        TrIP       0.46    0.28      0.35

    accuracy                         0.88
   macro avg       0.65    0.59      0.60
weighted avg       0.87    0.88      0.87 

Magge, Scotch, Gonzalez-Hernandez (2018) collapsed DRUG-FORM and DRUG-ROUTE into a single relation.