Relation extraction between Drugs and ADE - Conversational Text

Description

This model is capable of Relating Drugs and adverse reactions caused by them in conversational text.

Predicted Entities

is_related, not_related

Live Demo Open in Colab Copy S3 URI

How to use

documenter = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("sentences")

tokenizer = Tokenizer()\
    .setInputCols(["sentences"])\
    .setOutputCol("tokens")

words_embedder = WordEmbeddingsModel() \
    .pretrained("embeddings_clinical", "en", "clinical/models") \
    .setInputCols(["sentences", "tokens"]) \
    .setOutputCol("embeddings")

ner_tagger = MedicalNerModel() \
    .pretrained("ner_ade_clinical", "en", "clinical/models") \
    .setInputCols(["sentences", "tokens", "embeddings"]) \
    .setOutputCol("ner_tags")

ner_converter = NerConverter() \
    .setInputCols(["sentences", "tokens", "ner_tags"]) \
    .setOutputCol("ner_chunks")

pos_tagger = PerceptronModel()\
    .pretrained("pos_clinical", "en", "clinical/models") \
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("pos_tags")

dependency_parser = sparknlp.annotators.DependencyParserModel()\
    .pretrained("dependency_conllu", "en")\
    .setInputCols(["sentences", "pos_tags", "tokens"])\
    .setOutputCol("dependencies")

re_model = RelationExtractionModel()\
    .pretrained("re_ade_conversational", "en", "clinical/models")\
    .setInputCols(["embeddings", "pos_tags", "ner_chunks", "dependencies"])\
    .setOutputCol("relations")\
    .setRelationPairs(["ade-drug", "drug-ade"]) # Possible relation pairs. Default: All Relations.

nlp_pipeline = Pipeline(stages=[documenter, tokenizer, words_embedder, pos_tagger, ner_tagger, ner_converter, dependency_parser, re_model])

light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([[""]]).toDF("text")))

text ="""19.32 day 20 rivaroxaban diary. still residual aches and pains; only had 4 paracetamol today."""

annotations = light_pipeline.fullAnnotate(text)
val documenter = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("sentences")

val tokenizer = new Tokenizer()
    .setInputCols("sentences")
    .setOutputCol("tokens")

val words_embedder = WordEmbeddingsModel()
    .pretrained("embeddings_clinical", "en", "clinical/models")
    .setInputCols(Array("sentences", "tokens"))
    .setOutputCol("embeddings")

val ner_tagger = NerDLModel()
    .pretrained("ner_ade_clinical", "en", "clinical/models")
    .setInputCols(Array("sentences", "tokens", "embeddings"))
    .setOutputCol("ner_tags")

val ner_converter = new NerConverter()
    .setInputCols(Array("sentences", "tokens", "ner_tags"))
    .setOutputCol("ner_chunks")

val pos_tagger = PerceptronModel()
    .pretrained("pos_clinical", "en", "clinical/models")
    .setInputCols(Array("sentences", "tokens"))
    .setOutputCol("pos_tags")

val dependency_parser = DependencyParserModel()
    .pretrained("dependency_conllu", "en")
    .setInputCols(Array("sentences", "pos_tags", "tokens"))
    .setOutputCol("dependencies")

val re_model = RelationExtractionModel()
    .pretrained("re_ade_conversational", "en", "clinical/models")
    .setInputCols(Array("embeddings", "pos_tags", "ner_chunks", "dependencies"))
    .setOutputCol("relations")
    .setMaxSyntacticDistance(3) #default: 0 
    .setPredictionThreshold(0.5) #default: 0.5 
    .setRelationPairs(Array("drug-ade", "ade-drug")) # Possible relation pairs. Default: All Relations.

val nlpPipeline = new Pipeline().setStages(Array(documenter, tokenizer, words_embedder, pos_tagger, ner_tagger, ner_chunker, dependency_parser, re_model))

val data = Seq("""19.32 day 20 rivaroxaban diary. still residual aches and pains; only had 4 paracetamol today.""").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
import nlu
nlu.load("en.relation.adverse_drug_events.conversational").predict("""19.32 day 20 rivaroxaban diary. still residual aches and pains; only had 4 paracetamol today.""")

Results

|    | chunk1                        | entitiy1   | chunk2      | entity2 | relation    |
|----|-------------------------------|------------|-------------|---------|-------------|
| 0  | residual aches and pains      | ADE        | rivaroxaban | DRUG    | is_related  |
| 1  | residual aches and pains      | ADE        | paracetamol | DRUG    | not_related |

Model Information

Model Name: re_ade_conversational
Type: re
Compatibility: Healthcare NLP 3.5.0+
License: Licensed
Edition: Official
Input Labels: [embeddings, pos_tags, train_ner_chunks, dependencies]
Output Labels: [relations]
Language: en
Size: 11.3 MB

References

Trained on SMM4H dataset - annotated manually. https://healthlanguageprocessing.org/smm4h-2022/

Benchmarking

       label  precision    recall  f1-score   support
 not_related       0.81      0.88      0.85       528
  is_related       0.94      0.89      0.91      1019
    accuracy       -         -         0.89      1547
   macro-avg       0.87      0.89      0.88      1547
weighted-avg       0.89      0.89      0.89      1547