Relation extraction between dates and clinical entities

Description

Extracts relations between date entities and other clinical entities. Label 1 indicates that the date entity is related to the other clinical entity; label 0 indicates that it is not.

Predicted Entities

0, 1


How to use

The table below lists the re_date_clinical RE model, its labels, the optimal NER model to pair it with, and the meaningful relation pairs.

RE MODEL RE MODEL LABELS NER MODEL RE PAIRS
re_date_clinical 0,1 ner_jsl [“date-admission_discharge”,
“admission_discharge-date”,
“date-alcohol”,
“alcohol-date”,
“date-allergen”,
“allergen-date”,
“date-bmi”,
“bmi-date”,
“date-birth_entity”,
“birth_entity-date”,
“date-blood_pressure”,
“blood_pressure-date”,
“date-cerebrovascular_disease”,
“cerebrovascular_disease-date”,
“date-clinical_dept”,
“clinical_dept-date”,
“date-communicable_disease”,
“communicable_disease-date”,
“date-death_entity”,
“death_entity-date”,
“date-diabetes”,
“diabetes-date”,
“date-diet”,
“diet-date”,
“date-disease_syndrome_disorder”,
“disease_syndrome_disorder-date”,
“date-drug_brandname”,
“drug_brandname-date”,
“date-drug_ingredient”,
“drug_ingredient-date”,
“date-ekg_findings”,
“ekg_findings-date”,
“date-external_body_part_or_region”,
“external_body_part_or_region-date”,
“date-fetus_newborn”,
“fetus_newborn-date”,
“date-hdl”,
“hdl-date”,
“date-heart_disease”,
“heart_disease-date”,
“date-height”,
“height-date”,
“date-hyperlipidemia”,
“hyperlipidemia-date”,
“date-hypertension”,
“hypertension-date”,
“date-imagingfindings”,
“imagingfindings-date”,
“date-imaging_technique”,
“imaging_technique-date”,
“date-injury_or_poisoning”,
“injury_or_poisoning-date”,
“date-internal_organ_or_component”,
“internal_organ_or_component-date”,
“date-kidney_disease”,
“kidney_disease-date”,
“date-ldl”,
“ldl-date”,
“date-modifier”,
“modifier-date”,
“date-o2_saturation”,
“o2_saturation-date”,
“date-obesity”,
“obesity-date”,
“date-oncological”,
“oncological-date”,
“date-overweight”,
“overweight-date”,
“date-oxygen_therapy”,
“oxygen_therapy-date”,
“date-pregnancy”,
“pregnancy-date”,
“date-procedure”,
“procedure-date”,
“date-psychological_condition”,
“psychological_condition-date”,
“date-pulse”,
“pulse-date”,
“date-respiration”,
“respiration-date”,
“date-smoking”,
“smoking-date”,
“date-substance”,
“substance-date”,
“date-substance_quantity”,
“substance_quantity-date”,
“date-symptom”,
“symptom-date”,
“date-temperature”,
“temperature-date”,
“date-test”,
“test-date”,
“date-test_result”,
“test_result-date”,
“date-total_cholesterol”,
“total_cholesterol-date”,
“date-treatment”,
“treatment-date”,
“date-triglycerides”,
“triglycerides-date”,
“date-vs_finding”,
“vs_finding-date”,
“date-vaccine”,
“vaccine-date”,
“date-vital_signs_header”,
“vital_signs_header-date”,
“date-weight”,
“weight-date”,
“time-admission_discharge”,
“admission_discharge-time”,
“time-alcohol”,
“alcohol-time”,
“time-allergen”,
“allergen-time”,
“time-bmi”,
“bmi-time”,
“time-birth_entity”,
“birth_entity-time”,
“time-blood_pressure”,
“blood_pressure-time”,
“time-cerebrovascular_disease”,
“cerebrovascular_disease-time”,
“time-clinical_dept”,
“clinical_dept-time”,
“time-communicable_disease”,
“communicable_disease-time”,
“time-death_entity”,
“death_entity-time”,
“time-diabetes”,
“diabetes-time”,
“time-diet”,
“diet-time”,
“time-disease_syndrome_disorder”,
“disease_syndrome_disorder-time”,
“time-drug_brandname”,
“drug_brandname-time”,
“time-drug_ingredient”,
“drug_ingredient-time”,
“time-ekg_findings”,
“ekg_findings-time”,
“time-external_body_part_or_region”,
“external_body_part_or_region-time”,
“time-fetus_newborn”,
“fetus_newborn-time”,
“time-hdl”,
“hdl-time”,
“time-heart_disease”,
“heart_disease-time”,
“time-height”,
“height-time”,
“time-hyperlipidemia”,
“hyperlipidemia-time”,
“time-hypertension”,
“hypertension-time”,
“time-imagingfindings”,
“imagingfindings-time”,
“time-imaging_technique”,
“imaging_technique-time”,
“time-injury_or_poisoning”,
“injury_or_poisoning-time”,
“time-internal_organ_or_component”,
“internal_organ_or_component-time”,
“time-kidney_disease”,
“kidney_disease-time”,
“time-ldl”,
“ldl-time”,
“time-modifier”,
“modifier-time”,
“time-o2_saturation”,
“o2_saturation-time”,
“time-obesity”,
“obesity-time”,
“time-oncological”,
“oncological-time”,
“time-overweight”,
“overweight-time”,
“time-oxygen_therapy”,
“oxygen_therapy-time”,
“time-pregnancy”,
“pregnancy-time”,
“time-procedure”,
“procedure-time”,
“time-psychological_condition”,
“psychological_condition-time”,
“time-pulse”,
“pulse-time”,
“time-respiration”,
“respiration-time”,
“time-smoking”,
“smoking-time”,
“time-substance”,
“substance-time”,
“time-substance_quantity”,
“substance_quantity-time”,
“time-symptom”,
“symptom-time”,
“time-temperature”,
“temperature-time”,
“time-test”,
“test-time”,
“time-test_result”,
“test_result-time”,
“time-total_cholesterol”,
“total_cholesterol-time”,
“time-treatment”,
“treatment-time”,
“time-triglycerides”,
“triglycerides-time”,
“time-vs_finding”,
“vs_finding-time”,
“time-vaccine”,
“vaccine-time”,
“time-vital_signs_header”,
“vital_signs_header-time”,
“time-weight”,
“weight-time”,
“relativedate-admission_discharge”,
“admission_discharge-relativedate”,
“relativedate-alcohol”,
“alcohol-relativedate”,
“relativedate-allergen”,
“allergen-relativedate”,
“relativedate-bmi”,
“bmi-relativedate”,
“relativedate-birth_entity”,
“birth_entity-relativedate”,
“relativedate-blood_pressure”,
“blood_pressure-relativedate”,
“relativedate-cerebrovascular_disease”,
“cerebrovascular_disease-relativedate”,
“relativedate-clinical_dept”,
“clinical_dept-relativedate”,
“relativedate-communicable_disease”,
“communicable_disease-relativedate”,
“relativedate-death_entity”,
“death_entity-relativedate”,
“relativedate-diabetes”,
“diabetes-relativedate”,
“relativedate-diet”,
“diet-relativedate”,
“relativedate-disease_syndrome_disorder”,
“disease_syndrome_disorder-relativedate”,
“relativedate-drug_brandname”,
“drug_brandname-relativedate”,
“relativedate-drug_ingredient”,
“drug_ingredient-relativedate”,
“relativedate-ekg_findings”,
“ekg_findings-relativedate”,
“relativedate-external_body_part_or_region”,
“external_body_part_or_region-relativedate”,
“relativedate-fetus_newborn”,
“fetus_newborn-relativedate”,
“relativedate-hdl”,
“hdl-relativedate”,
“relativedate-heart_disease”,
“heart_disease-relativedate”,
“relativedate-height”,
“height-relativedate”,
“relativedate-hyperlipidemia”,
“hyperlipidemia-relativedate”,
“relativedate-hypertension”,
“hypertension-relativedate”,
“relativedate-imagingfindings”,
“imagingfindings-relativedate”,
“relativedate-imaging_technique”,
“imaging_technique-relativedate”,
“relativedate-injury_or_poisoning”,
“injury_or_poisoning-relativedate”,
“relativedate-internal_organ_or_component”,
“internal_organ_or_component-relativedate”,
“relativedate-kidney_disease”,
“kidney_disease-relativedate”,
“relativedate-ldl”,
“ldl-relativedate”,
“relativedate-modifier”,
“modifier-relativedate”,
“relativedate-o2_saturation”,
“o2_saturation-relativedate”,
“relativedate-obesity”,
“obesity-relativedate”,
“relativedate-oncological”,
“oncological-relativedate”,
“relativedate-overweight”,
“overweight-relativedate”,
“relativedate-oxygen_therapy”,
“oxygen_therapy-relativedate”,
“relativedate-pregnancy”,
“pregnancy-relativedate”,
“relativedate-procedure”,
“procedure-relativedate”,
“relativedate-psychological_condition”,
“psychological_condition-relativedate”,
“relativedate-pulse”,
“pulse-relativedate”,
“relativedate-respiration”,
“respiration-relativedate”,
“relativedate-smoking”,
“smoking-relativedate”,
“relativedate-substance”,
“substance-relativedate”,
“relativedate-substance_quantity”,
“substance_quantity-relativedate”,
“relativedate-symptom”,
“symptom-relativedate”,
“relativedate-temperature”,
“temperature-relativedate”,
“relativedate-test”,
“test-relativedate”,
“relativedate-test_result”,
“test_result-relativedate”,
“relativedate-total_cholesterol”,
“total_cholesterol-relativedate”,
“relativedate-treatment”,
“treatment-relativedate”,
“relativedate-triglycerides”,
“triglycerides-relativedate”,
“relativedate-vs_finding”,
“vs_finding-relativedate”,
“relativedate-vaccine”,
“vaccine-relativedate”,
“relativedate-vital_signs_header”,
“vital_signs_header-relativedate”,
“relativedate-weight”,
“weight-relativedate”,
“relativetime-admission_discharge”,
“admission_discharge-relativetime”,
“relativetime-alcohol”,
“alcohol-relativetime”,
“relativetime-allergen”,
“allergen-relativetime”,
“relativetime-bmi”,
“bmi-relativetime”,
“relativetime-birth_entity”,
“birth_entity-relativetime”,
“relativetime-blood_pressure”,
“blood_pressure-relativetime”,
“relativetime-cerebrovascular_disease”,
“cerebrovascular_disease-relativetime”,
“relativetime-clinical_dept”,
“clinical_dept-relativetime”,
“relativetime-communicable_disease”,
“communicable_disease-relativetime”,
“relativetime-death_entity”,
“death_entity-relativetime”,
“relativetime-diabetes”,
“diabetes-relativetime”,
“relativetime-diet”,
“diet-relativetime”,
“relativetime-disease_syndrome_disorder”,
“disease_syndrome_disorder-relativetime”,
“relativetime-drug_brandname”,
“drug_brandname-relativetime”,
“relativetime-drug_ingredient”,
“drug_ingredient-relativetime”,
“relativetime-ekg_findings”,
“ekg_findings-relativetime”,
“relativetime-external_body_part_or_region”,
“external_body_part_or_region-relativetime”,
“relativetime-fetus_newborn”,
“fetus_newborn-relativetime”,
“relativetime-hdl”,
“hdl-relativetime”,
“relativetime-heart_disease”,
“heart_disease-relativetime”,
“relativetime-height”,
“height-relativetime”,
“relativetime-hyperlipidemia”,
“hyperlipidemia-relativetime”,
“relativetime-hypertension”,
“hypertension-relativetime”,
“relativetime-imagingfindings”,
“imagingfindings-relativetime”,
“relativetime-imaging_technique”,
“imaging_technique-relativetime”,
“relativetime-injury_or_poisoning”,
“injury_or_poisoning-relativetime”,
“relativetime-internal_organ_or_component”,
“internal_organ_or_component-relativetime”,
“relativetime-kidney_disease”,
“kidney_disease-relativetime”,
“relativetime-ldl”,
“ldl-relativetime”,
“relativetime-modifier”,
“modifier-relativetime”,
“relativetime-o2_saturation”,
“o2_saturation-relativetime”,
“relativetime-obesity”,
“obesity-relativetime”,
“relativetime-oncological”,
“oncological-relativetime”,
“relativetime-overweight”,
“overweight-relativetime”,
“relativetime-oxygen_therapy”,
“oxygen_therapy-relativetime”,
“relativetime-pregnancy”,
“pregnancy-relativetime”,
“relativetime-procedure”,
“procedure-relativetime”,
“relativetime-psychological_condition”,
“psychological_condition-relativetime”,
“relativetime-pulse”,
“pulse-relativetime”,
“relativetime-respiration”,
“respiration-relativetime”,
“relativetime-smoking”,
“smoking-relativetime”,
“relativetime-substance”,
“substance-relativetime”,
“relativetime-substance_quantity”,
“substance_quantity-relativetime”,
“relativetime-symptom”,
“symptom-relativetime”,
“relativetime-temperature”,
“temperature-relativetime”,
“relativetime-test”,
“test-relativetime”,
“relativetime-test_result”,
“test_result-relativetime”,
“relativetime-total_cholesterol”,
“total_cholesterol-relativetime”,
“relativetime-treatment”,
“treatment-relativetime”,
“relativetime-triglycerides”,
“triglycerides-relativetime”,
“relativetime-vs_finding”,
“vs_finding-relativetime”,
“relativetime-vaccine”,
“vaccine-relativetime”,
“relativetime-vital_signs_header”,
“vital_signs_header-relativetime”,
“relativetime-weight”,
“weight-relativetime”]
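
The pair list above follows a regular pattern: each temporal label (date, time, relativedate, relativetime) is combined with each clinical entity label, in both orders. A minimal Python sketch of that pattern follows. The helper name `build_relation_pairs` is hypothetical, and only a small subset of the entity labels is used for brevity.

```python
from itertools import product


def build_relation_pairs(temporal_labels, clinical_labels):
    """Return 'a-b' strings for every temporal/clinical combination, in both orders."""
    pairs = []
    for temporal, clinical in product(temporal_labels, clinical_labels):
        pairs.append(f"{temporal}-{clinical}")
        pairs.append(f"{clinical}-{temporal}")
    return pairs


# Temporal labels from the list above, plus a small illustrative subset of entity labels.
temporal = ["date", "time", "relativedate", "relativetime"]
clinical = ["test", "symptom", "procedure"]

relation_pairs = build_relation_pairs(temporal, clinical)
print(len(relation_pairs))
print(relation_pairs[:4])
```

A list built this way could be passed to setRelationPairs to restrict the model to the pairs of interest.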

Use as part of an NLP pipeline with the following stages: DocumentAssembler, SentenceDetector, Tokenizer, WordEmbeddingsModel, PerceptronModel, MedicalNerModel, NerConverterInternal, DependencyParserModel, RelationExtractionModel.

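# Assumes Spark NLP for Healthcare is installed and a Spark session named `spark` is active.
from pyspark.ml import Pipeline
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *

documenter = DocumentAssembler()\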
    .setInputCol("text")\
    .setOutputCol("document")

sentencer = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentences")

tokenizer = Tokenizer()\
    .setInputCols(["sentences"])\
    .setOutputCol("tokens")
  
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("embeddings")

pos_tagger = PerceptronModel.pretrained("pos_clinical", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("pos_tags")

ner_tagger = MedicalNerModel.pretrained("jsl_ner_wip_greedy_clinical", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens", "embeddings"])\
    .setOutputCol("ner_tags")

ner_chunker = NerConverterInternal()\
    .setInputCols(["sentences", "tokens", "ner_tags"])\
    .setOutputCol("ner_chunks")

dependency_parser = DependencyParserModel.pretrained("dependency_conllu", "en")\
    .setInputCols(["sentences", "pos_tags", "tokens"])\
    .setOutputCol("dependencies")

re_model = RelationExtractionModel.pretrained("re_date_clinical", "en", "clinical/models")\
    .setInputCols(["embeddings", "pos_tags", "ner_chunks", "dependencies"])\
    .setOutputCol("relations")\
    .setMaxSyntacticDistance(3)\
    .setPredictionThreshold(0.9)\
    .setRelationPairs(["test-date", "symptom-date"]) # Possible relation pairs. Default: All Relations.

nlp_pipeline = Pipeline(stages=[documenter, sentencer,tokenizer, word_embeddings, pos_tagger, ner_tagger, ner_chunker, dependency_parser, re_model])

light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))

annotations = light_pipeline.fullAnnotate('''This 73 y/o patient had CT on 1/12/95, with progressive memory and cognitive decline since 8/11/94.''')

val documenter = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentencer = new SentenceDetector()
    .setInputCols("document")
    .setOutputCol("sentences")

val tokenizer = new Tokenizer()
    .setInputCols("sentences")
    .setOutputCol("tokens")
  
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
    .setInputCols(Array("sentences", "tokens"))
    .setOutputCol("embeddings")

val pos_tagger = PerceptronModel.pretrained("pos_clinical", "en", "clinical/models")
    .setInputCols(Array("sentences", "tokens"))
    .setOutputCol("pos_tags")

val ner_tagger = MedicalNerModel.pretrained("jsl_ner_wip_greedy_clinical", "en", "clinical/models")
    .setInputCols(Array("sentences", "tokens", "embeddings"))
    .setOutputCol("ner_tags")

val ner_chunker = new NerConverterInternal()
    .setInputCols(Array("sentences", "tokens", "ner_tags"))
    .setOutputCol("ner_chunks")

val dependency_parser = DependencyParserModel.pretrained("dependency_conllu", "en")
    .setInputCols(Array("sentences", "pos_tags", "tokens"))
    .setOutputCol("dependencies")

val re_model = RelationExtractionModel.pretrained("re_date_clinical", "en", "clinical/models")
    .setInputCols(Array("embeddings", "pos_tags", "ner_chunks", "dependencies"))
    .setOutputCol("relations")
    .setMaxSyntacticDistance(3) // default: 0
    .setPredictionThreshold(0.9) // default: 0.5
    .setRelationPairs(Array("test-date", "symptom-date")) // Possible relation pairs. Default: All Relations.

val nlpPipeline = new Pipeline().setStages(Array(documenter, sentencer, tokenizer, word_embeddings, pos_tagger, ner_tagger, ner_chunker, dependency_parser, re_model))

// Assumes an active SparkSession named `spark`; the import enables .toDF on a Seq.
import spark.implicits._
val data = Seq("""This 73 y/o patient had CT on 1/12/95, with progressive memory and cognitive decline since 8/11/94.""").toDF("text")

val result = nlpPipeline.fit(data).transform(data)

Results

|   | relations | entity1 | entity1_begin | entity1_end | chunk1                                   | entity2 | entity2_begin | entity2_end | chunk2  | confidence |
|---|-----------|---------|---------------|-------------|------------------------------------------|---------|-------------|-------------|---------|------------|
| 0 | 1         | Test    | 24            | 25          | CT                                       | Date    | 31          | 37          | 1/12/95 | 1.0        |
| 1 | 1         | Symptom | 45            | 84          | progressive memory and cognitive decline | Date    | 92          | 98          | 8/11/94 | 1.0        |
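
A table like the one above can be assembled from the relation annotations returned by fullAnnotate. The sketch below is illustrative only. It assumes each relation annotation exposes a `result` label and a `metadata` dictionary whose keys match the table's column names (with `entity2_begin` as the start offset of the second entity). `FakeAnnotation` and `relations_to_rows` are hypothetical names, used so the snippet runs without Spark.

```python
from collections import namedtuple

# Hypothetical stand-in for a relation annotation; real objects come from fullAnnotate.
FakeAnnotation = namedtuple("FakeAnnotation", ["result", "metadata"])

# Assumed metadata keys, taken from the column names of the Results table.
COLUMNS = [
    "entity1", "entity1_begin", "entity1_end", "chunk1",
    "entity2", "entity2_begin", "entity2_end", "chunk2",
    "confidence",
]


def relations_to_rows(relations):
    """Flatten relation annotations into one dictionary per relation."""
    rows = []
    for rel in relations:
        row = {"relations": rel.result}
        for col in COLUMNS:
            # Missing keys become None instead of raising an error.
            row[col] = rel.metadata.get(col)
        rows.append(row)
    return rows


# Sample values copied from the first row of the Results table.
sample = [
    FakeAnnotation("1", {
        "entity1": "Test", "entity1_begin": "24", "entity1_end": "25", "chunk1": "CT",
        "entity2": "Date", "entity2_begin": "31", "entity2_end": "37", "chunk2": "1/12/95",
        "confidence": "1.0",
    })
]

rows = relations_to_rows(sample)
print(rows[0]["chunk1"], rows[0]["chunk2"], rows[0]["relations"])
```

With a real pipeline, the analogous call would be `relations_to_rows(annotations[0]["relations"])`, provided the output column is named `relations` as in the pipeline above and the metadata keys match the assumption.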

Model Information

Model Name: re_date_clinical
Type: re
Compatibility: Spark NLP 2.7.1+
License: Licensed
Edition: Official
Input Labels: [embeddings, pos_tags, train_ner_chunks, dependencies]
Output Labels: [relations]
Language: en
Dependencies: embeddings_clinical

Data Source

Trained on data gathered and manually annotated by John Snow Labs.

Benchmarking

| label | recall | precision | f1   |
|-------|--------|-----------|------|
| 0     | 0.74   | 0.71      | 0.72 |
| 1     | 0.94   | 0.95      | 0.94 |
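
The f1 values are consistent with the harmonic mean of precision and recall, F1 = 2PR / (P + R). The source does not state which definition was used, so the quick Python check below assumes this standard one. The function name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per label, copied from the benchmark figures above.
benchmark = {"0": (0.71, 0.74), "1": (0.95, 0.94)}

for label, (precision, recall) in benchmark.items():
    print(label, round(f1_score(precision, recall), 2))
```

Rounded to two decimals, this gives 0.72 for label 0 and 0.94 for label 1, matching the reported column.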