Relation extraction between body parts and problem entities

Description

Extracts relations between body part entities and problem entities (diagnosis, symptom, etc.) in clinical texts. Label 1 indicates that the body part entity and the problem entity are related; label 0 indicates that there is no relation between them.

Predicted Entities

0, 1

How to use

The following lists the re_bodypart_problem RE model, its labels, the optimal NER model to use with it, and the meaningful relation pairs.

RE MODEL: re_bodypart_problem
RE MODEL LABELS: 0, 1
NER MODEL: ner_jsl
RE PAIRS:
[“internal_organ_or_component-cerebrovascular_disease”,
“cerebrovascular_disease-internal_organ_or_component”,
“internal_organ_or_component-communicable_disease”,
“communicable_disease-internal_organ_or_component”,
“internal_organ_or_component-diabetes”,
“diabetes-internal_organ_or_component”,
“internal_organ_or_component-disease_syndrome_disorder”,
“disease_syndrome_disorder-internal_organ_or_component”,
“internal_organ_or_component-ekg_findings”,
“ekg_findings-internal_organ_or_component”,
“internal_organ_or_component-heart_disease”,
“heart_disease-internal_organ_or_component”,
“internal_organ_or_component-hyperlipidemia”,
“hyperlipidemia-internal_organ_or_component”,
“internal_organ_or_component-hypertension”,
“hypertension-internal_organ_or_component”,
“internal_organ_or_component-imagingfindings”,
“imagingfindings-internal_organ_or_component”,
“internal_organ_or_component-injury_or_poisoning”,
“injury_or_poisoning-internal_organ_or_component”,
“internal_organ_or_component-kidney_disease”,
“kidney_disease-internal_organ_or_component”,
“internal_organ_or_component-oncological”,
“oncological-internal_organ_or_component”,
“internal_organ_or_component-psychological_condition”,
“psychological_condition-internal_organ_or_component”,
“internal_organ_or_component-symptom”,
“symptom-internal_organ_or_component”,
“internal_organ_or_component-vs_finding”,
“vs_finding-internal_organ_or_component”,
“external_body_part_or_region-communicable_disease”,
“communicable_disease-external_body_part_or_region”,
“external_body_part_or_region-diabetes”,
“diabetes-external_body_part_or_region”,
“external_body_part_or_region-disease_syndrome_disorder”,
“disease_syndrome_disorder-external_body_part_or_region”,
“external_body_part_or_region-hypertension”,
“hypertension-external_body_part_or_region”,
“external_body_part_or_region-imagingfindings”,
“imagingfindings-external_body_part_or_region”,
“external_body_part_or_region-injury_or_poisoning”,
“injury_or_poisoning-external_body_part_or_region”,
“external_body_part_or_region-obesity”,
“obesity-external_body_part_or_region”,
“external_body_part_or_region-oncological”,
“oncological-external_body_part_or_region”,
“external_body_part_or_region-overweight”,
“overweight-external_body_part_or_region”,
“external_body_part_or_region-symptom”,
“symptom-external_body_part_or_region”,
“external_body_part_or_region-vs_finding”,
“vs_finding-external_body_part_or_region”]
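
Every pair in the list above appears in both orientations (body part first, then problem first). As a minimal sketch, assuming you want to build a shorter list of this shape to pass to setRelationPairs, the helper below generates both orientations for one body part label. The name `both_directions` is hypothetical and not part of Spark NLP.

```python
def both_directions(body_part, problems):
    """Return 'body-problem' and 'problem-body' strings for each problem label."""
    pairs = []
    for problem in problems:
        pairs.append(f"{body_part}-{problem}")
        pairs.append(f"{problem}-{body_part}")
    return pairs


# Labels copied from the pair list above.
pairs = both_directions(
    "external_body_part_or_region",
    ["symptom", "injury_or_poisoning"],
)
print(pairs)
```

The resulting list of strings has the same form as the argument passed to setRelationPairs in the pipeline that follows.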

# Python
# Assumes an active Spark session named `spark` with Spark NLP for Healthcare loaded.
from pyspark.ml import Pipeline
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp_jsl.annotator import *

# Convert raw text into document annotations
documenter = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into sentences
sentencer = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentences")

# Split sentences into tokens
tokenizer = Tokenizer()\
    .setInputCols(["sentences"])\
    .setOutputCol("tokens")

# Clinical word embeddings, required by the NER and RE models
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("embeddings")

# Part-of-speech tags, required by the dependency parser and the RE model
pos_tagger = PerceptronModel.pretrained("pos_clinical", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("pos_tags")

# Clinical NER that produces the body part and problem entities
ner_tagger = MedicalNerModel.pretrained("jsl_ner_wip_greedy_clinical", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens", "embeddings"])\
    .setOutputCol("ner_tags")

# Merge token-level NER tags into entity chunks
ner_chunker = NerConverterInternal()\
    .setInputCols(["sentences", "tokens", "ner_tags"])\
    .setOutputCol("ner_chunks")

# Dependency parse, required by the RE model
dependency_parser = DependencyParserModel.pretrained("dependency_conllu", "en")\
    .setInputCols(["sentences", "pos_tags", "tokens"])\
    .setOutputCol("dependencies")

# Relation extraction, restricted to symptom / external body part pairs
reModel = RelationExtractionModel.pretrained("re_bodypart_problem", "en", "clinical/models")\
    .setInputCols(["embeddings", "ner_chunks", "pos_tags", "dependencies"])\
    .setOutputCol("relations")\
    .setRelationPairs(["symptom-external_body_part_or_region"])

pipeline = Pipeline(stages=[documenter, sentencer, tokenizer, word_embeddings, pos_tagger, ner_tagger, ner_chunker, dependency_parser, reModel])

# All stages are pretrained, so fitting on an empty DataFrame only assembles the pipeline
model = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

results = LightPipeline(model).fullAnnotate('''No neurologic deficits other than some numbness in his left hand.''')

// Scala
// Assumes an active Spark session named `spark` and that the Spark NLP and
// Spark NLP for Healthcare annotators are already imported.
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for toDF

val documenter = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentencer = new SentenceDetector()
    .setInputCols("document")
    .setOutputCol("sentences")

val tokenizer = new Tokenizer()
    .setInputCols("sentences")
    .setOutputCol("tokens")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
    .setInputCols(Array("sentences", "tokens"))
    .setOutputCol("embeddings")

val pos_tagger = PerceptronModel.pretrained("pos_clinical", "en", "clinical/models")
    .setInputCols(Array("sentences", "tokens"))
    .setOutputCol("pos_tags")

val ner_tagger = MedicalNerModel.pretrained("jsl_ner_wip_greedy_clinical", "en", "clinical/models")
    .setInputCols(Array("sentences", "tokens", "embeddings"))
    .setOutputCol("ner_tags")

val ner_chunker = new NerConverterInternal()
    .setInputCols(Array("sentences", "tokens", "ner_tags"))
    .setOutputCol("ner_chunks")

val dependency_parser = DependencyParserModel.pretrained("dependency_conllu", "en")
    .setInputCols(Array("sentences", "pos_tags", "tokens"))
    .setOutputCol("dependencies")

// Relation extraction, restricted to symptom / external body part pairs
val reModel = RelationExtractionModel.pretrained("re_bodypart_problem", "en", "clinical/models")
    .setInputCols(Array("embeddings", "ner_chunks", "pos_tags", "dependencies"))
    .setOutputCol("relations")
    .setRelationPairs(Array("symptom-external_body_part_or_region"))

val nlpPipeline = new Pipeline().setStages(Array(documenter, sentencer, tokenizer, word_embeddings, pos_tagger, ner_tagger, ner_chunker, dependency_parser, reModel))

val data = Seq("No neurologic deficits other than some numbness in his left hand.").toDF("text")

val model = nlpPipeline.fit(data)

val result = model.transform(data)

val results = new LightPipeline(model).fullAnnotate("""No neurologic deficits other than some numbness in his left hand.""")

Results

| index | relations | entity1 | entity1_begin | entity1_end | chunk1              | entity2                      | entity2_begin | entity2_end | chunk2 | confidence |
|-------|-----------|---------|---------------|-------------|---------------------|------------------------------|-------------|-------------|--------|------------|
| 0     | 0         | Symptom | 3             | 21          | neurologic deficits | external_body_part_or_region | 60          | 63          | hand   | 0.999998   |
| 1     | 1         | Symptom | 39            | 46          | numbness            | external_body_part_or_region | 60          | 63          | hand   | 1          |
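
A table like the one above can be obtained by flattening the relation annotations returned by fullAnnotate. The sketch below assumes each relation annotation exposes a `result` string and a `metadata` mapping whose keys mirror the table columns (`entity1`, `entity1_begin`, `chunk1`, `confidence`, and so on). The helper name `relations_to_rows` is hypothetical, and a stand-in object replaces a real annotation so the snippet runs without a Spark session.

```python
from types import SimpleNamespace

# Metadata keys assumed to be present on each relation annotation.
COLUMNS = [
    "entity1", "entity1_begin", "entity1_end", "chunk1",
    "entity2", "entity2_begin", "entity2_end", "chunk2",
    "confidence",
]


def relations_to_rows(annotations):
    """Flatten relation annotations into dicts with one key per table column."""
    rows = []
    for ann in annotations:
        row = {"relations": ann.result}
        for key in COLUMNS:
            # .get() leaves a missing key as None instead of raising
            row[key] = ann.metadata.get(key)
        rows.append(row)
    return rows


# Stand-in for one relation annotation; values copied from the second table row.
fake_relation = SimpleNamespace(
    result="1",
    metadata={
        "entity1": "Symptom", "entity1_begin": "39", "entity1_end": "46",
        "chunk1": "numbness",
        "entity2": "external_body_part_or_region",
        "entity2_begin": "60", "entity2_end": "63", "chunk2": "hand",
        "confidence": "1",
    },
)

rows = relations_to_rows([fake_relation])
# Keep only the pairs the model labeled as related.
related = [r for r in rows if r["relations"] == "1"]
print(related)
```

With a real pipeline, the list passed to the helper would be `results[0]["relations"]`, where `"relations"` is the output column set on the RE model.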

Model Information

Model Name: re_bodypart_problem
Type: re
Compatibility: Spark NLP 2.7.1+
License: Licensed
Edition: Official
Input Labels: [embeddings, pos_tags, train_ner_chunks, dependencies]
Output Labels: [relations]
Language: en
Dependencies: embeddings_clinical

Data Source

Trained on custom datasets annotated internally.

Benchmarking

| label | recall | precision |
|-------|--------|-----------|
| 0     | 0.72   | 0.82      |
| 1     | 0.94   | 0.91      |
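
The benchmark reports recall and precision only. As a small worked example, the F1 score for each label can be derived from those two numbers with the standard harmonic-mean formula. The values printed here are computed from the figures above and are not separately reported by the model authors.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision) per label, copied from the benchmark above.
benchmark = {"0": (0.72, 0.82), "1": (0.94, 0.91)}

f1_scores = {
    label: f1_score(precision, recall)
    for label, (recall, precision) in benchmark.items()
}

for label, f1 in f1_scores.items():
    print(f"label {label}: F1 = {f1:.3f}")
```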