RelationExtractionModel Clinical


Models the set of clinical relations defined in the 2010 i2b2 relation challenge.

Included Relations

TrIP: A certain treatment has improved or cured a medical problem (eg, ‘infection resolved with antibiotic course’)

TrWP: A patient’s medical problem has deteriorated or worsened because of or in spite of a treatment being administered (eg, ‘the tumor was growing despite the drain’)

TrCP: A treatment caused a medical problem (eg, ‘penicillin causes a rash’)

TrAP: A treatment was administered for a medical problem (eg, ‘Dexamphetamine for narcolepsy’)

TrNAP: The administration of a treatment was avoided because of a medical problem (eg, ‘Relafen which is contra-indicated because of ulcers’)

TeRP: A test has revealed some medical problem (eg, ‘an echocardiogram revealed a pericardial effusion’)

TeCP: A test was performed to investigate a medical problem (eg, ‘chest x-ray done to rule out pneumonia’)

PIP: Two problems are related to each other (eg, ‘Azotemia presumed secondary to sepsis’)
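The labels above can be grouped by the entity types they connect, which helps when choosing relation pairs such as "problem-test" or "problem-treatment". The following is a minimal sketch in plain Python, not part of the library: the grouping is inferred from the label prefixes (Tr for treatment, Te for test, P for problem), and the names RELATIONS and labels_for_pair are hypothetical.

```python
# Relation labels from the list above, keyed to the two entity types they link.
# The type assignments are inferred from the label prefixes and are an
# illustrative assumption, not output of the model.
RELATIONS = {
    "TrIP": ("treatment", "problem", "treatment improved or cured the problem"),
    "TrWP": ("treatment", "problem", "problem worsened despite or because of the treatment"),
    "TrCP": ("treatment", "problem", "treatment caused the problem"),
    "TrAP": ("treatment", "problem", "treatment administered for the problem"),
    "TrNAP": ("treatment", "problem", "treatment avoided because of the problem"),
    "TeRP": ("test", "problem", "test revealed the problem"),
    "TeCP": ("test", "problem", "test performed to investigate the problem"),
    "PIP": ("problem", "problem", "two problems are related"),
}


def labels_for_pair(type_a, type_b):
    """Return the relation labels that link the two entity types, in either order."""
    wanted = {type_a, type_b}
    return sorted(
        label for label, (a, b, _) in RELATIONS.items() if {a, b} == wanted
    )
```

For example, labels_for_pair("problem", "test") returns the two test-related labels, TeCP and TeRP.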


How to use

Use as part of an NLP pipeline with the following stages, in this order: DocumentAssembler, SentenceDetector, Tokenizer, WordEmbeddingsModel, PerceptronModel, NerDLModel, NerConverter, DependencyParserModel, RelationExtractionModel.
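The order of the stages matters because each stage reads columns written by earlier ones. The sketch below models that wiring in plain Python, without Spark, to show why the relation extraction stage cannot run on its own. Only the RelationExtractionModel input and output columns are taken from this page; every other column name, and the helper missing_inputs, is an assumption made for illustration.

```python
# (stage name, input columns, output column).
# Only the last row's column names come from this page; the rest are assumed.
STAGES = [
    ("DocumentAssembler", ["text"], "document"),
    ("SentenceDetector", ["document"], "sentences"),
    ("Tokenizer", ["sentences"], "tokens"),
    ("WordEmbeddingsModel", ["sentences", "tokens"], "embeddings"),
    ("PerceptronModel", ["sentences", "tokens"], "pos_tags"),
    ("NerDLModel", ["sentences", "tokens", "embeddings"], "ner_tags"),
    ("NerConverter", ["sentences", "tokens", "ner_tags"], "ner_chunks"),
    ("DependencyParserModel", ["sentences", "pos_tags", "tokens"], "dependencies"),
    ("RelationExtractionModel",
     ["embeddings", "pos_tags", "ner_chunks", "dependencies"], "relations"),
]


def missing_inputs(stages, initial=("text",)):
    """Return (stage, column) pairs where a stage reads a column nobody produced yet."""
    available = set(initial)
    problems = []
    for name, inputs, output in stages:
        for col in inputs:
            if col not in available:
                problems.append((name, col))
        available.add(output)
    return problems
```

With the full list, missing_inputs(STAGES) is empty; with only the last stage, all four of its input columns are reported as missing.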

The precision of the RE model is controlled by “setMaxSyntacticDistance(4)”, which sets the maximum syntactic distance between named entities to 4. A larger value will improve recall at the expense of lower precision.
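To make the idea of syntactic distance concrete, here is a minimal sketch in plain Python. It assumes that the distance is the number of dependency edges on the path between two tokens; how the library measures distance between multi-token chunks is not stated on this page. The hand-written parse and the function names are hypothetical.

```python
def ancestors(heads, node):
    """Return the chain [node, head(node), ..., root]."""
    chain = [node]
    while heads[chain[-1]] is not None:
        chain.append(heads[chain[-1]])
    return chain


def syntactic_distance(heads, a, b):
    """Number of dependency edges on the path between tokens a and b."""
    steps_from_a = {n: i for i, n in enumerate(ancestors(heads, a))}
    for steps_from_b, n in enumerate(ancestors(heads, b)):
        if n in steps_from_a:  # first shared ancestor
            return steps_from_a[n] + steps_from_b
    raise ValueError("tokens are not in the same dependency tree")


def within_distance(heads, pairs, max_distance):
    """Keep only the candidate pairs whose distance does not exceed the limit."""
    return [(a, b) for a, b in pairs
            if syntactic_distance(heads, a, b) <= max_distance]


# Hand-written parse of "an echocardiogram revealed a pericardial effusion".
# Token index -> index of its head; the root ("revealed") has no head.
TOKENS = ["an", "echocardiogram", "revealed", "a", "pericardial", "effusion"]
HEADS = {0: 1, 1: 2, 2: None, 3: 5, 4: 5, 5: 2}
```

In this parse, "echocardiogram" and "effusion" are two edges apart, so a limit of 2 keeps that pair, while a pair such as "an" and "pericardial" (four edges apart) is only kept once the limit reaches 4. This is the sense in which a larger limit admits more candidate pairs.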

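from pyspark.ml import Pipeline
from sparknlp.base import LightPipeline
from sparknlp_jsl.annotator import RelationExtractionModel  # Spark NLP for Healthcare (licensed)

# Load the pretrained model; its input columns must be produced by upstream stages.
clinical_re_Model = RelationExtractionModel()\
    .pretrained("re_clinical", "en", 'clinical/models')\
    .setInputCols(["embeddings", "pos_tags", "ner_chunks", "dependencies"])\
    .setOutputCol("relations")\
    .setMaxSyntacticDistance(4)\
    .setRelationPairs(["problem-test", "problem-treatment"]) # Possible relation pairs. Default is all relations.

# In a working pipeline, the stages listed above come before clinical_re_Model.
loaded_pipeline = Pipeline(stages=[clinical_re_Model])

# Fitting on an empty DataFrame only materializes the pipeline; nothing is trained here.
empty_data = spark.createDataFrame([[""]]).toDF("text")
loaded_model = loaded_pipeline.fit(empty_data)
loaded_lmodel = LightPipeline(loaded_model)

# text is the clinical note to annotate, as a plain string.
annotations = loaded_lmodel.fullAnnotate(text)

# get_relations_df is a helper that is not defined in this snippet.
rel_df = get_relations_df(annotations)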

Model Parameters

Model Name: re_clinical_en_2.5.5_2.4
Type: re
Compatibility: Spark NLP 2.5.5
Edition: Healthcare
License: Licensed
Input Labels: [embeddings, pos_tags, ner_chunks, dependencies]
Output Labels: [relations]
Language: [en]
Case sensitive: false

Dataset used for training

Trained on augmented 2010 i2b2 challenge data with ‘clinical_embeddings’.


The output is a dataframe with a Relation column and a Confidence column.
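As a rough illustration of how such a table could be built from the annotations, the sketch below flattens relation annotations into rows with Relation and Confidence fields. It runs without Spark by using a stand-in Annotation tuple. The metadata keys "chunk1", "chunk2" and "confidence", the function name relations_to_rows, and the min_confidence parameter are all assumptions, not the library's documented interface.

```python
from collections import namedtuple

# Stand-in for a relation annotation: a label plus a metadata dictionary.
# The real annotation class and its metadata keys are assumptions here.
Annotation = namedtuple("Annotation", ["result", "metadata"])


def relations_to_rows(annotations, column="relations", min_confidence=0.0):
    """Flatten per-document relation annotations into a list of row dictionaries."""
    rows = []
    for doc in annotations:
        for rel in doc.get(column, []):
            confidence = float(rel.metadata["confidence"])
            if confidence >= min_confidence:
                rows.append({
                    "Relation": rel.result,
                    "Chunk1": rel.metadata.get("chunk1"),
                    "Chunk2": rel.metadata.get("chunk2"),
                    "Confidence": confidence,
                })
    return rows
```

The resulting list of dictionaries can be passed to pandas.DataFrame to obtain a table, and raising min_confidence keeps only the relations the model is more certain about.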