Description
Relation extraction between body part entities ('Internal_organ_or_component', 'External_body_part_or_region') and procedure and test entities.
Predicted Entities
0, 1
How to use
The table below lists the re_bodypart_proceduretest RE model, its labels, the optimal NER model, and the meaningful relation pairs.
| RE MODEL | RE MODEL LABELS | NER MODEL | RE PAIRS |
|---|---|---|---|
| re_bodypart_proceduretest | 0,1 | ner_jsl | ["external_body_part_or_region-test", "test-external_body_part_or_region", "internal_organ_or_component-test", "test-internal_organ_or_component", "external_body_part_or_region-procedure", "procedure-external_body_part_or_region", "procedure-internal_organ_or_component", "internal_organ_or_component-procedure"] |
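The RE PAIRS column is a whitelist of ordered `entity1-entity2` label combinations. The sketch below is a plain-Python illustration, not the RelationExtractionModel implementation, of how such a list narrows the candidate chunk pairs. The `candidate_pairs` helper and the example chunk labels are hypothetical, and the sketch assumes labels are compared in lowercase, as they appear in the table.

```python
from itertools import permutations

# Relation pairs copied from the RE PAIRS column above.
ALLOWED_PAIRS = {
    "external_body_part_or_region-test",
    "test-external_body_part_or_region",
    "internal_organ_or_component-test",
    "test-internal_organ_or_component",
    "external_body_part_or_region-procedure",
    "procedure-external_body_part_or_region",
    "procedure-internal_organ_or_component",
    "internal_organ_or_component-procedure",
}


def candidate_pairs(chunks, allowed=ALLOWED_PAIRS):
    """Hypothetical helper: keep ordered (chunk1, chunk2) pairs whose labels form an allowed pair.

    `chunks` is a list of (text, label) tuples, e.g. ("chest", "External_body_part_or_region").
    """
    pairs = []
    for (text1, label1), (text2, label2) in permutations(chunks, 2):
        key = f"{label1.lower()}-{label2.lower()}"  # e.g. "external_body_part_or_region-test"
        if key in allowed:
            pairs.append((text1, text2))
    return pairs


chunks = [
    ("informed consent", "Other"),  # hypothetical label that is in no allowed pair
    ("chest", "External_body_part_or_region"),
    ("portable ultrasound", "Test"),
]
print(candidate_pairs(chunks))  # [('chest', 'portable ultrasound'), ('portable ultrasound', 'chest')]
```

Because the list contains both directions of each combination, a body part and a test are paired in both orders.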
Use as part of an NLP pipeline with the following stages: DocumentAssembler, SentenceDetector, Tokenizer, WordEmbeddingsModel, PerceptronModel, MedicalNerModel, NerConverterInternal, DependencyParserModel, RelationExtractionModel.
```python
# Assumes `spark` is an active SparkSession started with the licensed
# Spark NLP for Healthcare library (e.g. via sparknlp_jsl.start(...)).
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import (SentenceDetector, Tokenizer, WordEmbeddingsModel,
                                PerceptronModel, DependencyParserModel)
from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal, RelationExtractionModel

documenter = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentencer = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentences")

tokenizer = Tokenizer()\
    .setInputCols(["sentences"])\
    .setOutputCol("tokens")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("embeddings")

pos_tagger = PerceptronModel.pretrained("pos_clinical", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("pos_tags")

ner_tagger = MedicalNerModel.pretrained("jsl_ner_wip_greedy_clinical", "en", "clinical/models")\
    .setInputCols(["sentences", "tokens", "embeddings"])\
    .setOutputCol("ner_tags")

ner_chunker = NerConverterInternal()\
    .setInputCols(["sentences", "tokens", "ner_tags"])\
    .setOutputCol("ner_chunks")

dependency_parser = DependencyParserModel.pretrained("dependency_conllu", "en")\
    .setInputCols(["sentences", "pos_tags", "tokens"])\
    .setOutputCol("dependencies")

# maxSyntacticDistance default: 0; predictionThreshold default: 0.5
re_model = RelationExtractionModel.pretrained("re_bodypart_proceduretest", "en", "clinical/models")\
    .setInputCols(["embeddings", "pos_tags", "ner_chunks", "dependencies"])\
    .setOutputCol("relations")\
    .setMaxSyntacticDistance(4)\
    .setPredictionThreshold(0.9)\
    .setRelationPairs(["external_body_part_or_region-test"])  # Possible relation pairs. Default: all relations.

nlp_pipeline = Pipeline(stages=[documenter, sentencer, tokenizer, word_embeddings, pos_tagger,
                                ner_tagger, ner_chunker, dependency_parser, re_model])

# All stages are pretrained, so fitting on an empty DataFrame only assembles the PipelineModel.
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))

annotations = light_pipeline.fullAnnotate('''TECHNIQUE IN DETAIL: After informed consent was obtained from the patient and his mother, the chest was scanned with portable ultrasound.''')
```
```scala
import com.johnsnowlabs.nlp.{DocumentAssembler, LightPipeline}
import com.johnsnowlabs.nlp.annotator._
// Licensed Spark NLP for Healthcare annotators
import com.johnsnowlabs.nlp.annotators.ner.{MedicalNerModel, NerConverterInternal}
import com.johnsnowlabs.nlp.annotators.re.RelationExtractionModel
import org.apache.spark.ml.Pipeline
import spark.implicits._ // assumes an active SparkSession named `spark`

val documenter = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
val sentencer = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentences")
val tokenizer = new Tokenizer()
  .setInputCols("sentences")
  .setOutputCol("tokens")
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentences", "tokens"))
  .setOutputCol("embeddings")
val pos_tagger = PerceptronModel.pretrained("pos_clinical", "en", "clinical/models")
  .setInputCols(Array("sentences", "tokens"))
  .setOutputCol("pos_tags")
val ner_tagger = MedicalNerModel.pretrained("jsl_ner_wip_greedy_clinical", "en", "clinical/models")
  .setInputCols(Array("sentences", "tokens", "embeddings"))
  .setOutputCol("ner_tags")
val ner_chunker = new NerConverterInternal()
  .setInputCols(Array("sentences", "tokens", "ner_tags"))
  .setOutputCol("ner_chunks")
val dependency_parser = DependencyParserModel.pretrained("dependency_conllu", "en")
  .setInputCols(Array("sentences", "pos_tags", "tokens"))
  .setOutputCol("dependencies")
val re_model = RelationExtractionModel.pretrained("re_bodypart_proceduretest", "en", "clinical/models")
  .setInputCols(Array("embeddings", "pos_tags", "ner_chunks", "dependencies"))
  .setOutputCol("relations")
  .setMaxSyntacticDistance(4) // default: 0
  .setPredictionThreshold(0.9f) // default: 0.5
  .setRelationPairs(Array("external_body_part_or_region-test")) // Possible relation pairs. Default: all relations.

val nlpPipeline = new Pipeline().setStages(Array(documenter, sentencer, tokenizer, word_embeddings,
  pos_tagger, ner_tagger, ner_chunker, dependency_parser, re_model))

val text = """TECHNIQUE IN DETAIL: After informed consent was obtained from the patient and his mother, the chest was scanned with portable ultrasound."""
val data = Seq(text).toDF("text")
// All stages are pretrained, so fit() only assembles them into a PipelineModel
val model = nlpPipeline.fit(data)
val result = model.transform(data)
val annotations = new LightPipeline(model).fullAnnotate(text)
```
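The pipeline caps candidate pairs with `setMaxSyntacticDistance(4)`. This card does not define the distance precisely, so the sketch below rests on an assumption: it reads "syntactic distance" as the number of dependency edges on the shortest path between any token of one chunk and any token of the other, and computes it with a multi-source breadth-first search over the parse treated as an undirected graph. The `syntactic_distance` helper and the head indices are hypothetical and hand-written for illustration. They are not output from `dependency_conllu`, and RelationExtractionModel may measure distance differently.

```python
from collections import deque


def syntactic_distance(heads, span1, span2):
    """Hypothetical helper: fewest dependency edges between any token of span1 and any of span2.

    heads[i] is the 1-based head index of token i + 1 (0 marks the root).
    span1 and span2 are iterables of 1-based token indices.
    Returns None if the two spans are not connected.
    """
    # Build an undirected adjacency list from the head -> dependent arcs.
    adj = {i: set() for i in range(1, len(heads) + 1)}
    for dep, head in enumerate(heads, start=1):
        if head != 0:
            adj[dep].add(head)
            adj[head].add(dep)

    # Multi-source BFS: every token of span1 starts at distance 0.
    sources = list(span1)
    targets = set(span2)
    dist = {tok: 0 for tok in sources}
    queue = deque(sources)
    while queue:
        tok = queue.popleft()
        if tok in targets:
            return dist[tok]  # first target reached is the closest one
        for nxt in adj[tok]:
            if nxt not in dist:
                dist[nxt] = dist[tok] + 1
                queue.append(nxt)
    return None


# Tokens: the(1) chest(2) was(3) scanned(4) with(5) portable(6) ultrasound(7)
heads = [2, 4, 4, 0, 7, 7, 4]  # hand-written, hypothetical parse
d = syntactic_distance(heads, [2], [6, 7])  # "chest" vs. "portable ultrasound"
print(d, d <= 4)  # 2 True
```

Under this reading, the pair from the example sentence would be two edges apart (chest → scanned → ultrasound) and would pass a limit of 4.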
Results
| index | relations | entity1 | entity1_begin | entity1_end | chunk1 | entity2 | entity2_begin | entity2_end | chunk2 | confidence |
|-------|-----------|------------------------------|---------------|-------------|--------|---------|-------------|-------------|---------------------|------------|
| 0 | 1 | External_body_part_or_region | 94 | 98 | chest | Test | 117 | 135 | portable ultrasound | 1.0 |
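To turn the `annotations` returned by `fullAnnotate` into rows like the table above, the relation annotations can be flattened through their `result` and `metadata` fields. The sketch below assumes that the metadata keys match the column names in the Results table. The `relations_to_rows` helper is hypothetical. The stand-in object replaces a real `annotations[0]["relations"]` entry and is filled with the values shown above.

```python
from types import SimpleNamespace  # only used for the stand-in below

# Column order mirrors the Results table; the metadata key names are assumed
# to match these column headers.
METADATA_KEYS = ["entity1", "entity1_begin", "entity1_end", "chunk1",
                 "entity2", "entity2_begin", "entity2_end", "chunk2", "confidence"]


def relations_to_rows(relation_annotations):
    """Hypothetical helper: flatten relation annotations (objects with .result and .metadata) into dicts."""
    rows = []
    for index, rel in enumerate(relation_annotations):
        row = {"index": index, "relations": rel.result}
        for key in METADATA_KEYS:
            row[key] = rel.metadata.get(key)  # None if the key is absent
        rows.append(row)
    return rows


# Stand-in for annotations[0]["relations"], populated with the Results table values.
example = [SimpleNamespace(result="1", metadata={
    "entity1": "External_body_part_or_region", "entity1_begin": "94",
    "entity1_end": "98", "chunk1": "chest", "entity2": "Test",
    "entity2_begin": "117", "entity2_end": "135",
    "chunk2": "portable ultrasound", "confidence": "1.0"})]
rows = relations_to_rows(example)
print(rows[0]["chunk1"], rows[0]["chunk2"], rows[0]["confidence"])  # chest portable ultrasound 1.0
```

With a real pipeline, the call would be `relations_to_rows(annotations[0]["relations"])`. The resulting list of dicts can then be passed to a DataFrame constructor if tabular output is needed.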
Model Information
Model Name: | re_bodypart_proceduretest |
Type: | re |
Compatibility: | Spark NLP 2.7.1+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [embeddings, pos_tags, train_ner_chunks, dependencies] |
Output Labels: | [relations] |
Language: | en |
Dependencies: | embeddings_clinical |
Data Source
Trained on data gathered and manually annotated by John Snow Labs
Benchmarking
| label | recall | precision | f1 |
|-------|--------|-----------|------|
| 0 | 0.55 | 0.35 | 0.43 |
| 1 | 0.73 | 0.86 | 0.79 |
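The f1 column is the harmonic mean of precision and recall, 2PR / (P + R). The short check below recomputes it from the rounded precision and recall values in the table. It also prints a macro-averaged F1, which is derived here and is not reported by this card.

```python
# Recall and precision per label, copied from the Benchmarking table.
scores = {"0": (0.55, 0.35), "1": (0.73, 0.86)}  # label: (recall, precision)


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


raw = {label: f1(p, r) for label, (r, p) in scores.items()}
per_label = {label: round(value, 2) for label, value in raw.items()}
macro_f1 = round(sum(raw.values()) / len(raw), 2)  # unweighted mean over both labels
print(per_label, macro_f1)  # {'0': 0.43, '1': 0.79} 0.61
```

The recomputed values match the reported f1 column, which indicates the column is a per-label F1 rather than a weighted score.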