Description
Extracts relations between lab test names and their findings, measurements, results, and dates.
Predicted Entities
is_finding_of, is_result_of, is_date_of, O.
How to use
Use as part of an NLP pipeline with the following stages: DocumentAssembler, SentenceDetector, Tokenizer, WordEmbeddingsModel, PerceptronModel, MedicalNerModel, NerConverterInternal, DependencyParserModel, RelationExtractionModel.
The table below lists the re_test_result_date RE model, its labels, the optimal NER model, and the meaningful relation pairs.
RE MODEL | RE MODEL LABELS | NER MODEL | RE PAIRS |
---|---|---|---|
re_test_result_date | is_finding_of, is_result_of, is_date_of, O | ner_jsl | ['test-test_result', 'test_result-test', 'date-admission_discharge', 'admission_discharge-date', 'date-alcohol', 'alcohol-date', 'date-bmi', 'bmi-date', 'date-birth_entity', 'birth_entity-date', 'date-blood_pressure', 'blood_pressure-date', 'date-cerebrovascular_disease', 'cerebrovascular_disease-date', 'date-communicable_disease', 'communicable_disease-date', 'date-death_entity', 'death_entity-date', 'date-diabetes', 'diabetes-date', 'date-diet', 'diet-date', 'date-disease_syndrome_disorder', 'disease_syndrome_disorder-date', 'date-drug_brandname', 'drug_brandname-date', 'date-drug_ingredient', 'drug_ingredient-date', 'date-ekg_findings', 'ekg_findings-date', 'date-employment', 'employment-date', 'date-fetus_newborn', 'fetus_newborn-date', 'date-hdl', 'hdl-date', 'date-heart_disease', 'heart_disease-date', 'date-height', 'height-date', 'date-hyperlipidemia', 'hyperlipidemia-date', 'date-hypertension', 'hypertension-date', 'date-imagingfindings', 'imagingfindings-date', 'date-imaging_technique', 'imaging_technique-date', 'date-injury_or_poisoning', 'injury_or_poisoning-date', 'date-internal_organ_or_component', 'internal_organ_or_component-date', 'date-kidney_disease', 'kidney_disease-date', 'date-ldl', 'ldl-date', 'date-labour_delivery', 'labour_delivery-date', 'date-o2_saturation', 'o2_saturation-date', 'date-obesity', 'obesity-date', 'date-oncological', 'oncological-date', 'date-overweight', 'overweight-date', 'date-oxygen_therapy', 'oxygen_therapy-date', 'date-pregnancy', 'pregnancy-date', 'date-procedure', 'procedure-date', 'date-psychological_condition', 'psychological_condition-date', 'date-pulse', 'pulse-date', 'date-relationship_status', 'relationship_status-date', 'date-relativedate', 'relativedate-date', 'date-relativetime', 'relativetime-date', 'date-respiration', 'respiration-date', 'date-route', 'route-date', 'date-sexually_active_or_sexual_orientation', 'sexually_active_or_sexual_orientation-date', 'date-smoking', 'smoking-date', 'date-substance', 'substance-date', 'date-substance_quantity', 'substance_quantity-date', 'date-symptom', 'symptom-date', 'date-temperature', 'temperature-date', 'date-test', 'test-date', 'date-test_result', 'test_result-date', 'date-total_cholesterol', 'total_cholesterol-date', 'date-treatment', 'treatment-date', 'date-triglycerides', 'triglycerides-date', 'date-vs_finding', 'vs_finding-date', 'date-vaccine', 'vaccine-date', 'date-weight', 'weight-date', 'relativedate-admission_discharge', 'admission_discharge-relativedate', 'relativedate-alcohol', 'alcohol-relativedate', 'relativedate-bmi', 'bmi-relativedate', 'relativedate-birth_entity', 'birth_entity-relativedate', 'relativedate-blood_pressure', 'blood_pressure-relativedate', 'relativedate-cerebrovascular_disease', 'cerebrovascular_disease-relativedate', 'relativedate-communicable_disease', 'communicable_disease-relativedate', 'relativedate-death_entity', 'death_entity-relativedate', 'relativedate-diabetes', 'diabetes-relativedate', 'relativedate-diet', 'diet-relativedate', 'relativedate-disease_syndrome_disorder', 'disease_syndrome_disorder-relativedate', 'relativedate-drug_brandname', 'drug_brandname-relativedate', 'relativedate-drug_ingredient', 'drug_ingredient-relativedate', 'relativedate-ekg_findings', 'ekg_findings-relativedate', 'relativedate-employment', 'employment-relativedate', 'relativedate-fetus_newborn', 'fetus_newborn-relativedate', 'relativedate-hdl', 'hdl-relativedate', 'relativedate-heart_disease', 'heart_disease-relativedate', 'relativedate-height', 'height-relativedate', 'relativedate-hyperlipidemia', 'hyperlipidemia-relativedate', 'relativedate-hypertension', 'hypertension-relativedate', 'relativedate-imagingfindings', 'imagingfindings-relativedate', 'relativedate-imaging_technique', 'imaging_technique-relativedate', 'relativedate-injury_or_poisoning', 'injury_or_poisoning-relativedate', 'relativedate-internal_organ_or_component', 'internal_organ_or_component-relativedate', 'relativedate-kidney_disease', 'kidney_disease-relativedate', 'relativedate-ldl', 'ldl-relativedate', 'relativedate-labour_delivery', 'labour_delivery-relativedate', 'relativedate-o2_saturation', 'o2_saturation-relativedate', 'relativedate-obesity', 'obesity-relativedate', 'relativedate-oncological', 'oncological-relativedate', 'relativedate-overweight', 'overweight-relativedate', 'relativedate-oxygen_therapy', 'oxygen_therapy-relativedate', 'relativedate-pregnancy', 'pregnancy-relativedate', 'relativedate-procedure', 'procedure-relativedate', 'relativedate-psychological_condition', 'psychological_condition-relativedate', 'relativedate-pulse', 'pulse-relativedate', 'relativedate-relationship_status', 'relationship_status-relativedate', 'relativedate-relativedate', 'relativedate-relativetime', 'relativedate-relativetime', 'relativetime-relativedate', 'relativedate-respiration', 'respiration-relativedate', 'relativedate-route', 'route-relativedate', 'relativedate-sexually_active_or_sexual_orientation', 'sexually_active_or_sexual_orientation-relativedate', 'relativedate-smoking', 'smoking-relativedate', 'relativedate-substance', 'substance-relativedate', 'relativedate-substance_quantity', 'substance_quantity-relativedate', 'relativedate-symptom', 'symptom-relativedate', 'relativedate-temperature', 'temperature-relativedate', 'relativedate-test', 'test-relativedate', 'relativedate-test_result', 'test_result-relativedate', 'relativedate-total_cholesterol', 'total_cholesterol-relativedate', 'relativedate-treatment', 'treatment-relativedate', 'relativedate-triglycerides', 'triglycerides-relativedate', 'relativedate-vs_finding', 'vs_finding-relativedate', 'relativedate-vaccine', 'vaccine-relativedate', 'relativedate-weight', 'weight-relativedate', 'test-cerebrovascular_disease', 'cerebrovascular_disease-test', 'test-communicable_disease', 'communicable_disease-test', 'test-diabetes', 'diabetes-test', 'test-disease_syndrome_disorder', 'disease_syndrome_disorder-test', 'test-ekg_findings', 'ekg_findings-test', 'test-heart_disease', 'heart_disease-test', 'test-hyperlipidemia', 'hyperlipidemia-test', 'test-hypertension', 'hypertension-test', 'test-imagingfindings', 'imagingfindings-test', 'test-injury_or_poisoning', 'injury_or_poisoning-test', 'test-kidney_disease', 'kidney_disease-test', 'test-oncological', 'oncological-test', 'test-vs_finding', 'vs_finding-test', 'weight-overweight', 'overweight-weight', 'weight-obesity', 'obesity-weight', 'bmi-obesity', 'obesity-bmi', 'bmi-overweight', 'overweight-bmi', 'ekg_findings-heart_disease', 'heart_disease-ekg_findings', 'imaging_technique-imagingfindings', 'imagingfindings-imaging_technique'] |
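Most entries in the RE PAIRS column appear in both orders, for example 'test-test_result' and 'test_result-test'. As a rough sketch of how such a list could be generated rather than typed by hand, the snippet below builds both orderings for a small set of labels. `both_orders` is a hypothetical helper, not part of the library, and the three labels are picked only for illustration. Healthcare NLP's RelationExtractionModel has a `setRelationPairs` setter that should accept a list like this, but treat that as an assumption to verify against your installed version.

```python
from itertools import permutations


def both_orders(labels):
    """Return 'a-b' strings for every ordered pair of distinct labels."""
    # Hypothetical helper: lower-case the NER labels so they match the
    # lower-case style used in the RE PAIRS column.
    lowered = [label.lower() for label in labels]
    return [f"{first}-{second}" for first, second in permutations(lowered, 2)]


# Three ner_jsl labels chosen for illustration only.
pairs = both_orders(["Test", "Test_Result", "Date"])
print(pairs)
```

All six strings produced for these three labels also appear in the table above.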
# Python (assumes a licensed Healthcare NLP session is already running as `spark`)
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import (SentenceDetector, Tokenizer, WordEmbeddingsModel,
                                PerceptronModel, DependencyParserModel)
from sparknlp_jsl.annotator import (MedicalNerModel, NerConverterInternal,
                                    RelationExtractionModel)

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentencer = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentences")

tokenizer = Tokenizer()\
    .setInputCols(["sentences"])\
    .setOutputCol("tokens")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("embeddings")

pos_tagger = PerceptronModel.pretrained("pos_clinical", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("pos_tags")

ner_tagger = MedicalNerModel.pretrained("ner_jsl", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens", "embeddings"])\
    .setOutputCol("ner_tags")

# Group token-level NER tags into entity chunks for the RE model
ner_chunker = NerConverterInternal()\
    .setInputCols(["sentences", "tokens", "ner_tags"])\
    .setOutputCol("ner_chunks")

dependency_parser = DependencyParserModel.pretrained("dependency_conllu", "en")\
    .setInputCols(["sentences", "pos_tags", "tokens"])\
    .setOutputCol("dependencies")

re_model = RelationExtractionModel.pretrained("re_test_result_date", "en", "clinical/models")\
    .setInputCols(["embeddings", "pos_tags", "ner_chunks", "dependencies"])\
    .setOutputCol("relations")\
    .setMaxSyntacticDistance(4)\
    .setPredictionThreshold(0.9)

nlp_pipeline = Pipeline(stages=[
    document_assembler,
    sentencer,
    tokenizer,
    word_embeddings,
    pos_tagger,
    ner_tagger,
    ner_chunker,
    dependency_parser,
    re_model])

# Fit on an empty DataFrame: every stage is pretrained, so no training data is needed
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))

results = light_pipeline.fullAnnotate("""He was advised chest X-ray or CT scan after checking his SpO2 which was <= 93%""")
// Scala (assumes the Spark NLP and Healthcare NLP annotator imports are in scope
// and a SparkSession named `spark` exists)
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for Seq("").toDF

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentencer = new SentenceDetector()
  .setInputCols(Array("document"))
  .setOutputCol("sentences")

val tokenizer = new Tokenizer()
  .setInputCols(Array("sentences"))
  .setOutputCol("tokens")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentences", "tokens"))
  .setOutputCol("embeddings")

val pos_tagger = PerceptronModel.pretrained("pos_clinical", "en", "clinical/models")
  .setInputCols(Array("sentences", "tokens"))
  .setOutputCol("pos_tags")

val ner_tagger = MedicalNerModel.pretrained("ner_jsl", "en", "clinical/models")
  .setInputCols(Array("sentences", "tokens", "embeddings"))
  .setOutputCol("ner_tags")

val ner_chunker = new NerConverterInternal()
  .setInputCols(Array("sentences", "tokens", "ner_tags"))
  .setOutputCol("ner_chunks")

val dependency_parser = DependencyParserModel.pretrained("dependency_conllu", "en")
  .setInputCols(Array("sentences", "pos_tags", "tokens"))
  .setOutputCol("dependencies")

val re_model = RelationExtractionModel.pretrained("re_test_result_date", "en", "clinical/models")
  .setInputCols(Array("embeddings", "pos_tags", "ner_chunks", "dependencies"))
  .setOutputCol("relations")
  .setMaxSyntacticDistance(4)
  .setPredictionThreshold(0.9)

val nlp_pipeline = new Pipeline().setStages(Array(
  document_assembler,
  sentencer,
  tokenizer,
  word_embeddings,
  pos_tagger,
  ner_tagger,
  ner_chunker,
  dependency_parser,
  re_model))

// Fit on an empty DataFrame: every stage is pretrained
val light_pipeline = new LightPipeline(nlp_pipeline.fit(Seq("").toDF("text")))

val results = light_pipeline.fullAnnotate("""He was advised chest X-ray or CT scan after checking his SpO2 which was <= 93%""")
# NLU one-liner (Python)
import nlu

nlu.load("en.relation.test_result_date").predict("""He was advised chest X-ray or CT scan after checking his SpO2 which was <= 93%""")
Results
| | relation | entity1 | entity1_begin | entity1_end | chunk1 | entity2 | entity2_begin | entity2_end | chunk2 | confidence |
|--:|--------------:|--------:|--------------:|------------:|------------:|--------:|--------------:|------------:|------------:|-----------:|
| 0 | is_finding_of | Gender | 0 | 1 | He | Test | 15 | 25 | chest X-ray | 0.99916 |
| 1 | is_finding_of | Gender | 0 | 1 | He | Test | 30 | 36 | CT scan | 1.00000 |
| 2 | is_finding_of | Test | 15 | 25 | chest X-ray | Test | 30 | 36 | CT scan | 1.00000 |
| 3 | is_finding_of | Test | 30 | 36 | CT scan | Gender | 53 | 55 | his | 1.00000 |
| 4 | is_finding_of | Test | 30 | 36 | CT scan | Test | 57 | 60 | SpO2 | 1.00000 |
| 5 | is_date_of | Gender | 53 | 55 | his | Test | 57 | 60 | SpO2 | 0.98956 |
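A table like the one above can be assembled from the relation annotations returned by `fullAnnotate`. The following is a minimal sketch, not the library's own post-processing: `relations_to_rows` is a hypothetical helper, the metadata key names (`entity1`, `chunk1`, `entity2`, `chunk2`, `confidence`) are assumed to mirror the table's column names, and `SimpleNamespace` objects stand in for real annotations so the snippet runs without a Spark session. The two sample entries are rows 0 and 5 of the table.

```python
from types import SimpleNamespace


def relations_to_rows(annotations, allowed_pairs=None, min_confidence=0.0):
    """Flatten relation annotations into dicts, optionally filtering them."""
    rows = []
    for rel in annotations:
        meta = rel.metadata
        # Pair string in the lower-case 'entity1-entity2' style of RE PAIRS.
        pair = f"{meta['entity1'].lower()}-{meta['entity2'].lower()}"
        confidence = float(meta["confidence"])  # assumed to arrive as a string
        if allowed_pairs is not None and pair not in allowed_pairs:
            continue
        if confidence < min_confidence:
            continue
        rows.append({
            "relation": rel.result,
            "entity1": meta["entity1"],
            "chunk1": meta["chunk1"],
            "entity2": meta["entity2"],
            "chunk2": meta["chunk2"],
            "confidence": confidence,
        })
    return rows


# Stand-ins for rows 0 and 5 of the Results table (not real Annotation objects).
sample = [
    SimpleNamespace(result="is_finding_of",
                    metadata={"entity1": "Gender", "chunk1": "He",
                              "entity2": "Test", "chunk2": "chest X-ray",
                              "confidence": "0.99916"}),
    SimpleNamespace(result="is_date_of",
                    metadata={"entity1": "Gender", "chunk1": "his",
                              "entity2": "Test", "chunk2": "SpO2",
                              "confidence": "0.98956"}),
]

high_confidence = relations_to_rows(sample, min_confidence=0.99)
listed_only = relations_to_rows(sample, allowed_pairs={"test-test_result", "test_result-test"})
print(high_confidence)
print(listed_only)
```

With a real pipeline, the input would presumably be `results[0]["relations"]`. Filtering on entity pairs drops both sample rows, because 'gender-test' is not among the pairs listed for this model.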
Model Information
Model Name: | re_test_result_date |
Type: | re |
Compatibility: | Healthcare NLP 2.7.4+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [embeddings, pos_tags, train_ner_chunks, dependencies] |
Output Labels: | [relations] |
Language: | en |
Data Source
Trained on internal data.
Benchmarking
| relation        | precision |
|-----------------|-----------|
| O | 0.77 |
| is_finding_of | 0.80 |
| is_result_of | 0.96 |
| is_date_of | 0.94 |