RelationExtractionModel Clinical


Models the set of clinical relations defined in the 2010 i2b2 relation challenge.

Included Relations

TrIP: A certain treatment has improved or cured a medical problem (eg, ‘infection resolved with antibiotic course’)

TrWP: A patient’s medical problem has deteriorated or worsened because of or in spite of a treatment being administered (eg, ‘the tumor was growing despite the drain’)

TrCP: A treatment caused a medical problem (eg, ‘penicillin causes a rash’)

TrAP: A treatment is administered for a medical problem (eg, ‘Dexamphetamine for narcolepsy’)

TrNAP: The administration of a treatment was avoided because of a medical problem (eg, ‘Ralafen which is contra-indicated because of ulcers’)

TeRP: A test has revealed some medical problem (eg, ‘an echocardiogram revealed a pericardial effusion’)

TeCP: A test was performed to investigate a medical problem (eg, ‘chest x-ray done to rule out pneumonia’)

PIP: Two problems are related to each other (eg, ‘Azotemia presumed secondary to sepsis’)
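The label prefixes above encode the entity types involved (Tr for treatment, Te for test, P for problem). As a rough aid for reading the model output, the labels can be kept in a small lookup table. This is only an illustrative sketch: the names RELATION_LABELS and labels_for_pair are hypothetical and not part of Spark NLP, and the short descriptions are paraphrased from the definitions above.

```python
# Hypothetical lookup table built from the relation definitions in this section.
# Each entry: label -> (first entity type, second entity type, short description).
RELATION_LABELS = {
    "TrIP": ("treatment", "problem", "treatment improved or cured the problem"),
    "TrWP": ("treatment", "problem", "problem worsened despite or because of treatment"),
    "TrCP": ("treatment", "problem", "treatment caused the problem"),
    "TrAP": ("treatment", "problem", "treatment administered for the problem"),
    "TrNAP": ("treatment", "problem", "treatment avoided because of the problem"),
    "TeRP": ("test", "problem", "test revealed the problem"),
    "TeCP": ("test", "problem", "test performed to investigate the problem"),
    "PIP": ("problem", "problem", "two problems are related"),
}


def labels_for_pair(type_a, type_b):
    """Return the labels whose two entity types match, ignoring order."""
    wanted = sorted([type_a, type_b])
    return sorted(
        label
        for label, (first, second, _) in RELATION_LABELS.items()
        if sorted([first, second]) == wanted
    )


print(labels_for_pair("problem", "test"))
print(labels_for_pair("problem", "treatment"))
```

A table like this can help decide which entity-type pairs to request when only some relation labels are of interest.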

How to use

Use as part of an NLP pipeline with the following stages: DocumentAssembler, SentenceDetector, Tokenizer, WordEmbeddingsModel, PerceptronModel, NerDLModel, NerConverter, DependencyParserModel, RelationExtractionModel.

The precision of the RE model is controlled by the maximum syntactic distance allowed between named entities. For example, “setMaxSyntacticDistance(4)” sets this maximum to 4. A larger value improves recall at the expense of precision.
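To make the idea of syntactic distance concrete, the sketch below counts the dependency edges on the path between two tokens and keeps only entity pairs within a threshold. It is a plain-Python illustration of the concept, not the library's implementation: how the RE model measures distance internally is not described here, the helper names are hypothetical, and the dependency parse is written by hand for one of the example sentences above.

```python
from collections import deque


def syntactic_distance(heads, a, b):
    """Count dependency edges on the path between tokens a and b.

    heads[i] is the index of token i's head, or -1 for the root.
    """
    # Treat the dependency tree as an undirected graph.
    adjacency = {i: [] for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            adjacency[child].append(head)
            adjacency[head].append(child)

    # Breadth-first search from a until b is reached.
    distance = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return distance[node]
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return None  # tokens are not connected


def candidate_pairs(heads, entity_tokens, max_distance):
    """Keep entity pairs whose syntactic distance is at most max_distance."""
    pairs = []
    for i, a in enumerate(entity_tokens):
        for b in entity_tokens[i + 1:]:
            d = syntactic_distance(heads, a, b)
            if d is not None and d <= max_distance:
                pairs.append((a, b, d))
    return pairs


# Hand-written parse (an assumption) of:
# "an echocardiogram revealed a pericardial effusion"
tokens = ["an", "echocardiogram", "revealed", "a", "pericardial", "effusion"]
heads = [1, 2, -1, 5, 5, 2]

# Entity head tokens: "echocardiogram" (index 1) and "effusion" (index 5).
print(candidate_pairs(heads, [1, 5], max_distance=4))
print(candidate_pairs(heads, [1, 5], max_distance=1))
```

With a threshold of 4 the pair is kept, and with a threshold of 1 it is dropped. This mirrors the trade-off described above: a larger threshold admits more candidate pairs, including more spurious ones.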

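from pyspark.ml import Pipeline
from sparknlp.base import LightPipeline
from sparknlp_jsl.annotator import RelationExtractionModel

# Load the pretrained model and restrict it to the relation pairs of interest.
clinical_re_Model = RelationExtractionModel.pretrained("re_clinical", "en", "clinical/models")\
    .setInputCols(["embeddings", "pos_tags", "ner_chunks", "dependencies"])\
    .setOutputCol("relations")\
    .setMaxSyntacticDistance(4)\
    .setRelationPairs(["problem-test", "problem-treatment"]) # Possible relation pairs. Default is all relations.

# Only the RE stage is shown here; the upstream stages listed above must precede it.
loaded_pipeline = Pipeline(stages=[clinical_re_Model])

# Fit on an empty DataFrame to obtain a PipelineModel (spark is an active SparkSession).
empty_data = spark.createDataFrame([[""]]).toDF("text")

loaded_model = loaded_pipeline.fit(empty_data)

loaded_lmodel = LightPipeline(loaded_model)

# text is the clinical note to annotate, as a Python string.
annotations = loaded_lmodel.fullAnnotate(text)

# get_relations_df is a helper that is not defined in this section.
rel_df = get_relations_df(annotations)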
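The get_relations_df helper used above is not defined in this section. A minimal sketch of what such a helper might do is given below, under these assumptions: fullAnnotate returns a list with one dictionary per input text, each relation annotation exposes a result (the label) and a metadata dictionary, and the metadata keys chunk1, chunk2 and confidence are present. The function name relations_to_rows is hypothetical, and the sketch returns plain dictionaries rather than a DataFrame so that it runs without Spark. The stand-in annotation and its confidence value are made up for illustration.

```python
from types import SimpleNamespace


def relations_to_rows(annotations, col="relations"):
    """Flatten the relation annotations of the first document into rows."""
    rows = []
    for rel in annotations[0][col]:
        rows.append({
            "relation": rel.result,
            # Metadata keys below are assumptions; missing keys fall back to defaults.
            "chunk1": rel.metadata.get("chunk1"),
            "chunk2": rel.metadata.get("chunk2"),
            "confidence": float(rel.metadata.get("confidence", "nan")),
        })
    return rows


# Stand-in for LightPipeline.fullAnnotate output, with made-up values.
fake_annotations = [{
    "relations": [
        SimpleNamespace(
            result="TeRP",
            metadata={
                "chunk1": "an echocardiogram",
                "chunk2": "a pericardial effusion",
                "confidence": "0.98",  # made-up value
            },
        )
    ]
}]

rows = relations_to_rows(fake_annotations)
print(rows)
```

If pandas is available, the rows can be passed to pandas.DataFrame(rows) to obtain a table with relation and confidence columns.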

Model Parameters

Model Name: re_clinical_en_2.5.5_2.4
Type: re
Compatibility: Spark NLP 2.5.5
Edition: Healthcare
License: Licensed
Input Labels: [embeddings, pos_tags, ner_chunks, dependencies]
Output Labels: [relations]
Language: [en]
Case sensitive: false

Dataset used for training

Trained on augmented 2010 i2b2 challenge data with ‘clinical_embeddings’.


The output is a dataframe with a Relation column and a Confidence column.