Description
Relation extraction between date entities and other related clinical entities. The model predicts one of two labels: 1 shows there is a relation between the date entity and the other clinical entity; 0 shows there is no relation between them.
Predicted Entities
0, 1
How to use
The table below lists the re_date_clinical RE model, its labels, the optimal NER model to pair with it, and the meaningful relation pairs.
RE MODEL | RE MODEL LABELS | NER MODEL | RE PAIRS |
---|---|---|---|
re_date_clinical | 0,1 | ner_jsl | [“date-admission_discharge”, “admission_discharge-date”, “date-alcohol”, “alcohol-date”, “date-allergen”, “allergen-date”, “date-bmi”, “bmi-date”, “date-birth_entity”, “birth_entity-date”, “date-blood_pressure”, “blood_pressure-date”, “date-cerebrovascular_disease”, “cerebrovascular_disease-date”, “date-clinical_dept”, “clinical_dept-date”, “date-communicable_disease”, “communicable_disease-date”, “date-death_entity”, “death_entity-date”, “date-diabetes”, “diabetes-date”, “date-diet”, “diet-date”, “date-disease_syndrome_disorder”, “disease_syndrome_disorder-date”, “date-drug_brandname”, “drug_brandname-date”, “date-drug_ingredient”, “drug_ingredient-date”, “date-ekg_findings”, “ekg_findings-date”, “date-external_body_part_or_region”, “external_body_part_or_region-date”, “date-fetus_newborn”, “fetus_newborn-date”, “date-hdl”, “hdl-date”, “date-heart_disease”, “heart_disease-date”, “date-height”, “height-date”, “date-hyperlipidemia”, “hyperlipidemia-date”, “date-hypertension”, “hypertension-date”, “date-imagingfindings”, “imagingfindings-date”, “date-imaging_technique”, “imaging_technique-date”, “date-injury_or_poisoning”, “injury_or_poisoning-date”, “date-internal_organ_or_component”, “internal_organ_or_component-date”, “date-kidney_disease”, “kidney_disease-date”, “date-ldl”, “ldl-date”, “date-modifier”, “modifier-date”, “date-o2_saturation”, “o2_saturation-date”, “date-obesity”, “obesity-date”, “date-oncological”, “oncological-date”, “date-overweight”, “overweight-date”, “date-oxygen_therapy”, “oxygen_therapy-date”, “date-pregnancy”, “pregnancy-date”, “date-procedure”, “procedure-date”, “date-psychological_condition”, “psychological_condition-date”, “date-pulse”, “pulse-date”, “date-respiration”, “respiration-date”, “date-smoking”, “smoking-date”, “date-substance”, “substance-date”, “date-substance_quantity”, “substance_quantity-date”, “date-symptom”, “symptom-date”, “date-temperature”, “temperature-date”, “date-test”, 
“test-date”, “date-test_result”, “test_result-date”, “date-total_cholesterol”, “total_cholesterol-date”, “date-treatment”, “treatment-date”, “date-triglycerides”, “triglycerides-date”, “date-vs_finding”, “vs_finding-date”, “date-vaccine”, “vaccine-date”, “date-vital_signs_header”, “vital_signs_header-date”, “date-weight”, “weight-date”, “time-admission_discharge”, “admission_discharge-time”, “time-alcohol”, “alcohol-time”, “time-allergen”, “allergen-time”, “time-bmi”, “bmi-time”, “time-birth_entity”, “birth_entity-time”, “time-blood_pressure”, “blood_pressure-time”, “time-cerebrovascular_disease”, “cerebrovascular_disease-time”, “time-clinical_dept”, “clinical_dept-time”, “time-communicable_disease”, “communicable_disease-time”, “time-death_entity”, “death_entity-time”, “time-diabetes”, “diabetes-time”, “time-diet”, “diet-time”, “time-disease_syndrome_disorder”, “disease_syndrome_disorder-time”, “time-drug_brandname”, “drug_brandname-time”, “time-drug_ingredient”, “drug_ingredient-time”, “time-ekg_findings”, “ekg_findings-time”, “time-external_body_part_or_region”, “external_body_part_or_region-time”, “time-fetus_newborn”, “fetus_newborn-time”, “time-hdl”, “hdl-time”, “time-heart_disease”, “heart_disease-time”, “time-height”, “height-time”, “time-hyperlipidemia”, “hyperlipidemia-time”, “time-hypertension”, “hypertension-time”, “time-imagingfindings”, “imagingfindings-time”, “time-imaging_technique”, “imaging_technique-time”, “time-injury_or_poisoning”, “injury_or_poisoning-time”, “time-internal_organ_or_component”, “internal_organ_or_component-time”, “time-kidney_disease”, “kidney_disease-time”, “time-ldl”, “ldl-time”, “time-modifier”, “modifier-time”, “time-o2_saturation”, “o2_saturation-time”, “time-obesity”, “obesity-time”, “time-oncological”, “oncological-time”, “time-overweight”, “overweight-time”, “time-oxygen_therapy”, “oxygen_therapy-time”, “time-pregnancy”, “pregnancy-time”, “time-procedure”, “procedure-time”, “time-psychological_condition”, 
“psychological_condition-time”, “time-pulse”, “pulse-time”, “time-respiration”, “respiration-time”, “time-smoking”, “smoking-time”, “time-substance”, “substance-time”, “time-substance_quantity”, “substance_quantity-time”, “time-symptom”, “symptom-time”, “time-temperature”, “temperature-time”, “time-test”, “test-time”, “time-test_result”, “test_result-time”, “time-total_cholesterol”, “total_cholesterol-time”, “time-treatment”, “treatment-time”, “time-triglycerides”, “triglycerides-time”, “time-vs_finding”, “vs_finding-time”, “time-vaccine”, “vaccine-time”, “time-vital_signs_header”, “vital_signs_header-time”, “time-weight”, “weight-time”, “relativedate-admission_discharge”, “admission_discharge-relativedate”, “relativedate-alcohol”, “alcohol-relativedate”, “relativedate-allergen”, “allergen-relativedate”, “relativedate-bmi”, “bmi-relativedate”, “relativedate-birth_entity”, “birth_entity-relativedate”, “relativedate-blood_pressure”, “blood_pressure-relativedate”, “relativedate-cerebrovascular_disease”, “cerebrovascular_disease-relativedate”, “relativedate-clinical_dept”, “clinical_dept-relativedate”, “relativedate-communicable_disease”, “communicable_disease-relativedate”, “relativedate-death_entity”, “death_entity-relativedate”, “relativedate-diabetes”, “diabetes-relativedate”, “relativedate-diet”, “diet-relativedate”, “relativedate-disease_syndrome_disorder”, “disease_syndrome_disorder-relativedate”, “relativedate-drug_brandname”, “drug_brandname-relativedate”, “relativedate-drug_ingredient”, “drug_ingredient-relativedate”, “relativedate-ekg_findings”, “ekg_findings-relativedate”, “relativedate-external_body_part_or_region”, “external_body_part_or_region-relativedate”, “relativedate-fetus_newborn”, “fetus_newborn-relativedate”, “relativedate-hdl”, “hdl-relativedate”, “relativedate-heart_disease”, “heart_disease-relativedate”, “relativedate-height”, “height-relativedate”, “relativedate-hyperlipidemia”, “hyperlipidemia-relativedate”, “relativedate-hypertension”, 
“hypertension-relativedate”, “relativedate-imagingfindings”, “imagingfindings-relativedate”, “relativedate-imaging_technique”, “imaging_technique-relativedate”, “relativedate-injury_or_poisoning”, “injury_or_poisoning-relativedate”, “relativedate-internal_organ_or_component”, “internal_organ_or_component-relativedate”, “relativedate-kidney_disease”, “kidney_disease-relativedate”, “relativedate-ldl”, “ldl-relativedate”, “relativedate-modifier”, “modifier-relativedate”, “relativedate-o2_saturation”, “o2_saturation-relativedate”, “relativedate-obesity”, “obesity-relativedate”, “relativedate-oncological”, “oncological-relativedate”, “relativedate-overweight”, “overweight-relativedate”, “relativedate-oxygen_therapy”, “oxygen_therapy-relativedate”, “relativedate-pregnancy”, “pregnancy-relativedate”, “relativedate-procedure”, “procedure-relativedate”, “relativedate-psychological_condition”, “psychological_condition-relativedate”, “relativedate-pulse”, “pulse-relativedate”, “relativedate-respiration”, “respiration-relativedate”, “relativedate-smoking”, “smoking-relativedate”, “relativedate-substance”, “substance-relativedate”, “relativedate-substance_quantity”, “substance_quantity-relativedate”, “relativedate-symptom”, “symptom-relativedate”, “relativedate-temperature”, “temperature-relativedate”, “relativedate-test”, “test-relativedate”, “relativedate-test_result”, “test_result-relativedate”, “relativedate-total_cholesterol”, “total_cholesterol-relativedate”, “relativedate-treatment”, “treatment-relativedate”, “relativedate-triglycerides”, “triglycerides-relativedate”, “relativedate-vs_finding”, “vs_finding-relativedate”, “relativedate-vaccine”, “vaccine-relativedate”, “relativedate-vital_signs_header”, “vital_signs_header-relativedate”, “relativedate-weight”, “weight-relativedate”, “relativetime-admission_discharge”, “admission_discharge-relativetime”, “relativetime-alcohol”, “alcohol-relativetime”, “relativetime-allergen”, “allergen-relativetime”, “relativetime-bmi”, 
“bmi-relativetime”, “relativetime-birth_entity”, “birth_entity-relativetime”, “relativetime-blood_pressure”, “blood_pressure-relativetime”, “relativetime-cerebrovascular_disease”, “cerebrovascular_disease-relativetime”, “relativetime-clinical_dept”, “clinical_dept-relativetime”, “relativetime-communicable_disease”, “communicable_disease-relativetime”, “relativetime-death_entity”, “death_entity-relativetime”, “relativetime-diabetes”, “diabetes-relativetime”, “relativetime-diet”, “diet-relativetime”, “relativetime-disease_syndrome_disorder”, “disease_syndrome_disorder-relativetime”, “relativetime-drug_brandname”, “drug_brandname-relativetime”, “relativetime-drug_ingredient”, “drug_ingredient-relativetime”, “relativetime-ekg_findings”, “ekg_findings-relativetime”, “relativetime-external_body_part_or_region”, “external_body_part_or_region-relativetime”, “relativetime-fetus_newborn”, “fetus_newborn-relativetime”, “relativetime-hdl”, “hdl-relativetime”, “relativetime-heart_disease”, “heart_disease-relativetime”, “relativetime-height”, “height-relativetime”, “relativetime-hyperlipidemia”, “hyperlipidemia-relativetime”, “relativetime-hypertension”, “hypertension-relativetime”, “relativetime-imagingfindings”, “imagingfindings-relativetime”, “relativetime-imaging_technique”, “imaging_technique-relativetime”, “relativetime-injury_or_poisoning”, “injury_or_poisoning-relativetime”, “relativetime-internal_organ_or_component”, “internal_organ_or_component-relativetime”, “relativetime-kidney_disease”, “kidney_disease-relativetime”, “relativetime-ldl”, “ldl-relativetime”, “relativetime-modifier”, “modifier-relativetime”, “relativetime-o2_saturation”, “o2_saturation-relativetime”, “relativetime-obesity”, “obesity-relativetime”, “relativetime-oncological”, “oncological-relativetime”, “relativetime-overweight”, “overweight-relativetime”, “relativetime-oxygen_therapy”, “oxygen_therapy-relativetime”, “relativetime-pregnancy”, “pregnancy-relativetime”, “relativetime-procedure”, 
“procedure-relativetime”, “relativetime-psychological_condition”, “psychological_condition-relativetime”, “relativetime-pulse”, “pulse-relativetime”, “relativetime-respiration”, “respiration-relativetime”, “relativetime-smoking”, “smoking-relativetime”, “relativetime-substance”, “substance-relativetime”, “relativetime-substance_quantity”, “substance_quantity-relativetime”, “relativetime-symptom”, “symptom-relativetime”, “relativetime-temperature”, “temperature-relativetime”, “relativetime-test”, “test-relativetime”, “relativetime-test_result”, “test_result-relativetime”, “relativetime-total_cholesterol”, “total_cholesterol-relativetime”, “relativetime-treatment”, “treatment-relativetime”, “relativetime-triglycerides”, “triglycerides-relativetime”, “relativetime-vs_finding”, “vs_finding-relativetime”, “relativetime-vaccine”, “vaccine-relativetime”, “relativetime-vital_signs_header”, “vital_signs_header-relativetime”, “relativetime-weight”, “weight-relativetime”] |
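The pair names in the table follow a regular pattern: each temporal entity (date, time, relativedate, relativetime) is combined with each clinical entity in both orders. The sketch below reproduces that pattern for a small subset, which may help when building a custom list for setRelationPairs. The helper name `build_relation_pairs` is hypothetical and is not part of Spark NLP.

```python
def build_relation_pairs(temporal_entities, clinical_entities):
    """Return 'a-b' and 'b-a' pair names for every temporal/clinical combination."""
    pairs = []
    for temporal in temporal_entities:
        for clinical in clinical_entities:
            # Both directions are listed, mirroring the order used in the table.
            pairs.append(f"{temporal}-{clinical}")
            pairs.append(f"{clinical}-{temporal}")
    return pairs


# Small illustrative subset of the entity names that appear in the table.
pairs = build_relation_pairs(["date", "relativedate"], ["test", "symptom"])
print(pairs)
```

Entity names are written in lowercase here, as they are in the table.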
Use as part of an NLP pipeline with the following stages: DocumentAssembler, SentenceDetector, Tokenizer, WordEmbeddingsModel, PerceptronModel, MedicalNerModel, NerConverterInternal, DependencyParserModel, RelationExtractionModel.
# Python
# Assumes an active Spark session `spark` started with Spark NLP for Healthcare.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import (SentenceDetector, Tokenizer, WordEmbeddingsModel,
                                PerceptronModel, DependencyParserModel)
from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal, RelationExtractionModel

documenter = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentencer = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentences")

tokenizer = Tokenizer()\
    .setInputCols(["sentences"])\
    .setOutputCol("tokens")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("embeddings")

pos_tagger = PerceptronModel.pretrained("pos_clinical", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("pos_tags")

ner_tagger = MedicalNerModel.pretrained("jsl_ner_wip_greedy_clinical", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens", "embeddings"])\
    .setOutputCol("ner_tags")

ner_chunker = NerConverterInternal()\
    .setInputCols(["sentences", "tokens", "ner_tags"])\
    .setOutputCol("ner_chunks")

dependency_parser = DependencyParserModel.pretrained("dependency_conllu", "en")\
    .setInputCols(["sentences", "pos_tags", "tokens"])\
    .setOutputCol("dependencies")

re_model = RelationExtractionModel.pretrained("re_date_clinical", "en", "clinical/models")\
    .setInputCols(["embeddings", "pos_tags", "ner_chunks", "dependencies"])\
    .setOutputCol("relations")\
    .setMaxSyntacticDistance(3)\
    .setPredictionThreshold(0.9)\
    .setRelationPairs(["test-date", "symptom-date"])  # Possible relation pairs. Default: all relations.

nlp_pipeline = Pipeline(stages=[documenter, sentencer, tokenizer, word_embeddings, pos_tagger, ner_tagger, ner_chunker, dependency_parser, re_model])

# Fit on an empty DataFrame; all stages are pretrained, so no training data is needed.
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))

annotations = light_pipeline.fullAnnotate('''This 73 y/o patient had CT on 1/12/95, with progressive memory and cognitive decline since 8/11/94.''')
// Scala
// Assumes an active Spark session `spark` and the Spark NLP / Spark NLP for Healthcare classes already imported.
import spark.implicits._

val documenter = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentencer = new SentenceDetector()
    .setInputCols("document")
    .setOutputCol("sentences")

val tokenizer = new Tokenizer()
    .setInputCols("sentences")
    .setOutputCol("tokens")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
    .setInputCols(Array("sentences", "tokens"))
    .setOutputCol("embeddings")

val pos_tagger = PerceptronModel.pretrained("pos_clinical", "en", "clinical/models")
    .setInputCols(Array("sentences", "tokens"))
    .setOutputCol("pos_tags")

val ner_tagger = MedicalNerModel.pretrained("jsl_ner_wip_greedy_clinical", "en", "clinical/models")
    .setInputCols(Array("sentences", "tokens", "embeddings"))
    .setOutputCol("ner_tags")

val ner_chunker = new NerConverterInternal()
    .setInputCols(Array("sentences", "tokens", "ner_tags"))
    .setOutputCol("ner_chunks")

val dependency_parser = DependencyParserModel.pretrained("dependency_conllu", "en")
    .setInputCols(Array("sentences", "pos_tags", "tokens"))
    .setOutputCol("dependencies")

val re_model = RelationExtractionModel.pretrained("re_date_clinical", "en", "clinical/models")
    .setInputCols(Array("embeddings", "pos_tags", "ner_chunks", "dependencies"))
    .setOutputCol("relations")
    .setMaxSyntacticDistance(3) // default: 0
    .setPredictionThreshold(0.9) // default: 0.5
    .setRelationPairs(Array("test-date", "symptom-date")) // Possible relation pairs. Default: all relations.

val nlpPipeline = new Pipeline().setStages(Array(documenter, sentencer, tokenizer, word_embeddings, pos_tagger, ner_tagger, ner_chunker, dependency_parser, re_model))

val data = Seq("""This 73 y/o patient had CT on 1/12/95, with progressive memory and cognitive decline since 8/11/94.""").toDF("text")

val result = nlpPipeline.fit(data).transform(data)
Results
|   | relations | entity1 | entity1_begin | entity1_end | chunk1                                   | entity2 | entity2_begin | entity2_end | chunk2  | confidence |
|---|-----------|---------|---------------|-------------|------------------------------------------|---------|-------------|-------------|---------|------------|
| 0 | 1 | Test | 24 | 25 | CT | Date | 31 | 37 | 1/12/95 | 1.0 |
| 1 | 1 | Symptom | 45 | 84 | progressive memory and cognitive decline | Date | 92 | 98 | 8/11/94 | 1.0 |
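To work with output like the table above in plain Python, each relation annotation can be flattened into a row. This is a sketch under assumptions: it assumes each annotation exposes a `result` label and a `metadata` mapping whose keys match the column names shown (`entity1`, `chunk1`, `confidence`, and so on). `Relation` below is a hypothetical stand-in for the real annotation object so that the example runs without Spark, and the helper name `relations_to_rows` is likewise made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a relation annotation (label plus metadata mapping).
Relation = namedtuple("Relation", ["result", "metadata"])


def relations_to_rows(relations, min_confidence=0.0):
    """Flatten relation annotations into dicts, keeping only positive relations."""
    rows = []
    for rel in relations:
        confidence = float(rel.metadata["confidence"])
        # Label "1" means the date and the clinical entity are related.
        if rel.result == "1" and confidence >= min_confidence:
            rows.append({
                "entity1": rel.metadata["entity1"],
                "chunk1": rel.metadata["chunk1"],
                "entity2": rel.metadata["entity2"],
                "chunk2": rel.metadata["chunk2"],
                "confidence": confidence,
            })
    return rows


# Illustrative input: the first entry mirrors row 0 of the table,
# the second is made up purely to show the filtering.
sample = [
    Relation("1", {"entity1": "Test", "chunk1": "CT",
                   "entity2": "Date", "chunk2": "1/12/95", "confidence": "1.0"}),
    Relation("0", {"entity1": "Test", "chunk1": "CT",
                   "entity2": "Date", "chunk2": "8/11/94", "confidence": "0.6"}),
]
rows = relations_to_rows(sample, min_confidence=0.9)
print(rows)
```

With a real pipeline, the list passed in would presumably come from the "relations" output column; check the actual metadata keys in your installed version before relying on them.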
Model Information
Model Name: | re_date_clinical |
Type: | re |
Compatibility: | Spark NLP 2.7.1+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [embeddings, pos_tags, train_ner_chunks, dependencies] |
Output Labels: | [relations] |
Language: | en |
Dependencies: | embeddings_clinical |
Data Source
Trained on data gathered and manually annotated by John Snow Labs.
Benchmarking
| label | recall | precision | f1   |
|-------|--------|-----------|------|
| 0     | 0.74   | 0.71      | 0.72 |
| 1     | 0.94   | 0.95      | 0.94 |
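The f1 column is the harmonic mean of precision and recall. As a quick consistency check, the short sketch below recomputes it from the benchmark figures above; the function and variable names are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision) per label, copied from the benchmark figures.
benchmark = {"0": (0.74, 0.71), "1": (0.94, 0.95)}

f1_by_label = {
    label: round(f1_score(precision, recall), 2)
    for label, (recall, precision) in benchmark.items()
}
print(f1_by_label)  # {'0': 0.72, '1': 0.94}
```

Both recomputed values agree with the reported f1 scores after rounding to two decimals.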