Relation Extraction between Tests, Results, and Dates

Description

Extracts relations between lab test names and their findings, measurements, results, and dates.

Predicted Entities

is_finding_of, is_result_of, is_date_of, O.


How to use

Use as part of an NLP pipeline with the following stages: DocumentAssembler, SentenceDetector, Tokenizer, WordEmbeddingsModel, PerceptronModel, MedicalNerModel, NerConverterInternal, DependencyParserModel, RelationExtractionModel.

The table below lists the re_test_result_date RE model, its labels, and the optimal NER model, followed by the meaningful relation pairs.

| RE MODEL            | RE MODEL LABELS                            | NER MODEL |
|---------------------|--------------------------------------------|-----------|
| re_test_result_date | is_finding_of, is_result_of, is_date_of, O | ner_jsl   |

RE PAIRS:

[‘test-test_result’, ‘test_result-test’,
‘date-admission_discharge’, ‘admission_discharge-date’,
‘date-alcohol’, ‘alcohol-date’,
‘date-bmi’, ‘bmi-date’,
‘date-birth_entity’, ‘birth_entity-date’,
‘date-blood_pressure’, ‘blood_pressure-date’,
‘date-cerebrovascular_disease’, ‘cerebrovascular_disease-date’,
‘date-communicable_disease’, ‘communicable_disease-date’,
‘date-death_entity’, ‘death_entity-date’,
‘date-diabetes’, ‘diabetes-date’,
‘date-diet’, ‘diet-date’,
‘date-disease_syndrome_disorder’, ‘disease_syndrome_disorder-date’,
‘date-drug_brandname’, ‘drug_brandname-date’,
‘date-drug_ingredient’, ‘drug_ingredient-date’,
‘date-ekg_findings’, ‘ekg_findings-date’,
‘date-employment’, ‘employment-date’,
‘date-fetus_newborn’, ‘fetus_newborn-date’,
‘date-hdl’, ‘hdl-date’,
‘date-heart_disease’, ‘heart_disease-date’,
‘date-height’, ‘height-date’,
‘date-hyperlipidemia’, ‘hyperlipidemia-date’,
‘date-hypertension’, ‘hypertension-date’,
‘date-imagingfindings’, ‘imagingfindings-date’,
‘date-imaging_technique’, ‘imaging_technique-date’,
‘date-injury_or_poisoning’, ‘injury_or_poisoning-date’,
‘date-internal_organ_or_component’, ‘internal_organ_or_component-date’,
‘date-kidney_disease’, ‘kidney_disease-date’,
‘date-ldl’, ‘ldl-date’,
‘date-labour_delivery’, ‘labour_delivery-date’,
‘date-o2_saturation’, ‘o2_saturation-date’,
‘date-obesity’, ‘obesity-date’,
‘date-oncological’, ‘oncological-date’,
‘date-overweight’, ‘overweight-date’,
‘date-oxygen_therapy’, ‘oxygen_therapy-date’,
‘date-pregnancy’, ‘pregnancy-date’,
‘date-procedure’, ‘procedure-date’,
‘date-psychological_condition’, ‘psychological_condition-date’,
‘date-pulse’, ‘pulse-date’,
‘date-relationship_status’, ‘relationship_status-date’,
‘date-relativedate’, ‘relativedate-date’,
‘date-relativetime’, ‘relativetime-date’,
‘date-respiration’, ‘respiration-date’,
‘date-route’, ‘route-date’,
‘date-sexually_active_or_sexual_orientation’, ‘sexually_active_or_sexual_orientation-date’,
‘date-smoking’, ‘smoking-date’,
‘date-substance’, ‘substance-date’,
‘date-substance_quantity’, ‘substance_quantity-date’,
‘date-symptom’, ‘symptom-date’,
‘date-temperature’, ‘temperature-date’,
‘date-test’, ‘test-date’,
‘date-test_result’, ‘test_result-date’,
‘date-total_cholesterol’, ‘total_cholesterol-date’,
‘date-treatment’, ‘treatment-date’,
‘date-triglycerides’, ‘triglycerides-date’,
‘date-vs_finding’, ‘vs_finding-date’,
‘date-vaccine’, ‘vaccine-date’,
‘date-weight’, ‘weight-date’,
‘relativedate-admission_discharge’, ‘admission_discharge-relativedate’,
‘relativedate-alcohol’, ‘alcohol-relativedate’,
‘relativedate-bmi’, ‘bmi-relativedate’,
‘relativedate-birth_entity’, ‘birth_entity-relativedate’,
‘relativedate-blood_pressure’, ‘blood_pressure-relativedate’,
‘relativedate-cerebrovascular_disease’, ‘cerebrovascular_disease-relativedate’,
‘relativedate-communicable_disease’, ‘communicable_disease-relativedate’,
‘relativedate-death_entity’, ‘death_entity-relativedate’,
‘relativedate-diabetes’, ‘diabetes-relativedate’,
‘relativedate-diet’, ‘diet-relativedate’,
‘relativedate-disease_syndrome_disorder’, ‘disease_syndrome_disorder-relativedate’,
‘relativedate-drug_brandname’, ‘drug_brandname-relativedate’,
‘relativedate-drug_ingredient’, ‘drug_ingredient-relativedate’,
‘relativedate-ekg_findings’, ‘ekg_findings-relativedate’,
‘relativedate-employment’, ‘employment-relativedate’,
‘relativedate-fetus_newborn’, ‘fetus_newborn-relativedate’,
‘relativedate-hdl’, ‘hdl-relativedate’,
‘relativedate-heart_disease’, ‘heart_disease-relativedate’,
‘relativedate-height’, ‘height-relativedate’,
‘relativedate-hyperlipidemia’, ‘hyperlipidemia-relativedate’,
‘relativedate-hypertension’, ‘hypertension-relativedate’,
‘relativedate-imagingfindings’, ‘imagingfindings-relativedate’,
‘relativedate-imaging_technique’, ‘imaging_technique-relativedate’,
‘relativedate-injury_or_poisoning’, ‘injury_or_poisoning-relativedate’,
‘relativedate-internal_organ_or_component’, ‘internal_organ_or_component-relativedate’,
‘relativedate-kidney_disease’, ‘kidney_disease-relativedate’,
‘relativedate-ldl’, ‘ldl-relativedate’,
‘relativedate-labour_delivery’, ‘labour_delivery-relativedate’,
‘relativedate-o2_saturation’, ‘o2_saturation-relativedate’,
‘relativedate-obesity’, ‘obesity-relativedate’,
‘relativedate-oncological’, ‘oncological-relativedate’,
‘relativedate-overweight’, ‘overweight-relativedate’,
‘relativedate-oxygen_therapy’, ‘oxygen_therapy-relativedate’,
‘relativedate-pregnancy’, ‘pregnancy-relativedate’,
‘relativedate-procedure’, ‘procedure-relativedate’,
‘relativedate-psychological_condition’, ‘psychological_condition-relativedate’,
‘relativedate-pulse’, ‘pulse-relativedate’,
‘relativedate-relationship_status’, ‘relationship_status-relativedate’,
‘relativedate-relativedate’, ‘relativedate-relativetime’,
‘relativedate-relativetime’, ‘relativetime-relativedate’,
‘relativedate-respiration’, ‘respiration-relativedate’,
‘relativedate-route’, ‘route-relativedate’,
‘relativedate-sexually_active_or_sexual_orientation’, ‘sexually_active_or_sexual_orientation-relativedate’,
‘relativedate-smoking’, ‘smoking-relativedate’,
‘relativedate-substance’, ‘substance-relativedate’,
‘relativedate-substance_quantity’, ‘substance_quantity-relativedate’,
‘relativedate-symptom’, ‘symptom-relativedate’,
‘relativedate-temperature’, ‘temperature-relativedate’,
‘relativedate-test’, ‘test-relativedate’,
‘relativedate-test_result’, ‘test_result-relativedate’,
‘relativedate-total_cholesterol’, ‘total_cholesterol-relativedate’,
‘relativedate-treatment’, ‘treatment-relativedate’,
‘relativedate-triglycerides’, ‘triglycerides-relativedate’,
‘relativedate-vs_finding’, ‘vs_finding-relativedate’,
‘relativedate-vaccine’, ‘vaccine-relativedate’,
‘relativedate-weight’, ‘weight-relativedate’,
‘test-cerebrovascular_disease’, ‘cerebrovascular_disease-test’,
‘test-communicable_disease’, ‘communicable_disease-test’,
‘test-diabetes’, ‘diabetes-test’,
‘test-disease_syndrome_disorder’, ‘disease_syndrome_disorder-test’,
‘test-ekg_findings’, ‘ekg_findings-test’,
‘test-heart_disease’, ‘heart_disease-test’,
‘test-hyperlipidemia’, ‘hyperlipidemia-test’,
‘test-hypertension’, ‘hypertension-test’,
‘test-imagingfindings’, ‘imagingfindings-test’,
‘test-injury_or_poisoning’, ‘injury_or_poisoning-test’,
‘test-kidney_disease’, ‘kidney_disease-test’,
‘test-oncological’, ‘oncological-test’,
‘test-vs_finding’, ‘vs_finding-test’,
‘weight-overweight’, ‘overweight-weight’,
‘weight-obesity’, ‘obesity-weight’,
‘bmi-obesity’, ‘obesity-bmi’,
‘bmi-overweight’, ‘overweight-bmi’,
‘ekg_findings-heart_disease’, ‘heart_disease-ekg_findings’,
‘imaging_technique-imagingfindings’, ‘imagingfindings-imaging_technique’]
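Every entry in the pair list appears in both directions (for example, test-test_result and test_result-test). A minimal sketch of building a smaller, symmetric subset of pairs in plain Python follows. The helper name `symmetric_pairs` is hypothetical, and the commented-out `setRelationPairs` call assumes the `re_model` object from the pipeline code and that the installed Healthcare NLP version exposes this setter.

```python
def symmetric_pairs(pairs):
    """Return 'a-b' and 'b-a' strings for each (a, b) tuple.

    Order is preserved and duplicates are dropped.
    """
    seen = []
    for first, second in pairs:
        for pair in (f"{first}-{second}", f"{second}-{first}"):
            if pair not in seen:
                seen.append(pair)
    return seen


# Entity labels are lowercase ner_jsl labels taken from the list above.
relation_pairs = symmetric_pairs([
    ("test", "test_result"),
    ("date", "test"),
    ("date", "test_result"),
])
print(relation_pairs)

# Assumed usage (not executed here): restrict the RE model to these pairs.
# re_model.setRelationPairs(relation_pairs)
```

Restricting the pairs this way is a design choice: it trades recall for fewer spurious relations between entity types that are not of interest.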
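# Imports assumed for a licensed Healthcare NLP install; `spark` is an active session
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import (SentenceDetector, Tokenizer, WordEmbeddingsModel,
                                PerceptronModel, DependencyParserModel)
from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal, RelationExtractionModel

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")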

sentencer = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentences")

tokenizer = Tokenizer()\
    .setInputCols(["sentences"])\
    .setOutputCol("tokens")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("embeddings")

pos_tagger = PerceptronModel.pretrained("pos_clinical", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("pos_tags")

ner_tagger = MedicalNerModel.pretrained("ner_jsl", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens", "embeddings"])\
    .setOutputCol("ner_tags")

ner_chunker = NerConverterInternal()\
    .setInputCols(["sentences", "tokens", "ner_tags"])\
    .setOutputCol("ner_chunks")

dependency_parser = DependencyParserModel.pretrained("dependency_conllu", "en")\
    .setInputCols(["sentences", "pos_tags", "tokens"])\
    .setOutputCol("dependencies")

re_model = RelationExtractionModel.pretrained("re_test_result_date", "en", "clinical/models")\
    .setInputCols(["embeddings", "pos_tags", "ner_chunks", "dependencies"])\
    .setOutputCol("relations")\
    .setMaxSyntacticDistance(4)\
    .setPredictionThreshold(0.9)


nlp_pipeline = Pipeline(stages=[document_assembler,
                                sentencer,
                                tokenizer,
                                word_embeddings,
                                pos_tagger,
                                ner_tagger,
                                ner_chunker,
                                dependency_parser,
                                re_model])

light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))

results = light_pipeline.fullAnnotate("""He was advised chest X-ray or CT scan after checking his SpO2 which was <= 93%""")
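
// Scala (Spark NLP and Healthcare NLP annotator imports are assumed to be in scope)
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for Seq("").toDF below

val document_assembler = new DocumentAssembler()
	.setInputCol("text")
	.setOutputCol("document")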
	
val sentencer = new SentenceDetector()
	.setInputCols(Array("document"))
	.setOutputCol("sentences")
	
val tokenizer = new Tokenizer()
	.setInputCols(Array("sentences"))
	.setOutputCol("tokens")
	
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical","en","clinical/models")
	.setInputCols(Array("sentences","tokens"))
	.setOutputCol("embeddings")
	
val pos_tagger = PerceptronModel
	.pretrained("pos_clinical","en","clinical/models")
	.setInputCols(Array("sentences","tokens"))
	.setOutputCol("pos_tags")
	
val ner_tagger = MedicalNerModel
	.pretrained("ner_jsl","en","clinical/models")
	.setInputCols(Array("sentences","tokens","embeddings"))
	.setOutputCol("ner_tags")
	
val ner_chunker = new NerConverterInternal()
	.setInputCols(Array("sentences","tokens","ner_tags"))
	.setOutputCol("ner_chunks")
	
val dependency_parser = DependencyParserModel
	.pretrained("dependency_conllu","en")
	.setInputCols(Array("sentences","pos_tags","tokens"))
	.setOutputCol("dependencies")
	
val re_model = RelationExtractionModel.pretrained("re_test_result_date","en","clinical/models")
	.setInputCols(Array("embeddings","pos_tags","ner_chunks","dependencies"))
	.setOutputCol("relations")
	.setMaxSyntacticDistance(4)
	.setPredictionThreshold(0.9)
	
val nlp_pipeline = new Pipeline().setStages(Array(
    document_assembler, 
    sentencer, 
    tokenizer, 
    word_embeddings, 
    pos_tagger, 
    ner_tagger, 
    ner_chunker, 
    dependency_parser, 
    re_model))
	
val light_pipeline = new LightPipeline(nlp_pipeline.fit(Seq("").toDF("text")))
	
val results = light_pipeline.fullAnnotate("""He was advised chest X-ray or CT scan after checking his SpO2 which was <= 93%""")

# NLU (Python)
import nlu
nlu.load("en.relation.test_result_date").predict("""He was advised chest X-ray or CT scan after checking his SpO2 which was <= 93%""")

Results

|   |      relation | entity1 | entity1_begin | entity1_end |      chunk1 | entity2 | entity2_begin | entity2_end |      chunk2 | confidence |
|--:|--------------:|--------:|--------------:|------------:|------------:|--------:|--------------:|------------:|------------:|-----------:|
| 0 | is_finding_of |  Gender |             0 |           1 |          He |    Test |            15 |          25 | chest X-ray |    0.99916 |
| 1 | is_finding_of |  Gender |             0 |           1 |          He |    Test |            30 |          36 |     CT scan |    1.00000 |
| 2 | is_finding_of |    Test |            15 |          25 | chest X-ray |    Test |            30 |          36 |     CT scan |    1.00000 |
| 3 | is_finding_of |    Test |            30 |          36 |     CT scan |  Gender |            53 |          55 |         his |    1.00000 |
| 4 | is_finding_of |    Test |            30 |          36 |     CT scan |    Test |            57 |          60 |        SpO2 |    1.00000 |
| 5 |    is_date_of |  Gender |            53 |          55 |         his |    Test |            57 |          60 |        SpO2 |    0.98956 |

Model Information

Model Name: re_test_result_date
Type: re
Compatibility: Healthcare NLP 2.7.4+
License: Licensed
Edition: Official
Input Labels: [embeddings, pos_tags, train_ner_chunks, dependencies]
Output Labels: [relations]
Language: en

Data Source

Trained on internal data.

Benchmarking

| relation        | precision |
|-----------------|-----------|
| O               | 0.77 |
| is_finding_of   | 0.80 |
| is_result_of    | 0.96 |
| is_date_of      | 0.94 |