Detect Adverse Drug Events (BertForTokenClassification)

Description

Detect adverse drug reactions in reviews, tweets, and medical text using a pretrained NER model. The model was trained with BertForTokenClassification from the transformers library and imported into Spark NLP.

Predicted Entities

DRUG, ADE


How to use

# Python: assumes an active Spark NLP for Healthcare session available as `spark`
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, NerConverter
from sparknlp_jsl.annotator import MedicalBertForTokenClassifier

# Wrap the raw text column in a document annotation
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")

# Token-level classifier that emits IOB tags for DRUG and ADE
tokenClassifier = MedicalBertForTokenClassifier.pretrained("bert_token_classifier_ner_ade", "en", "clinical/models")\
    .setInputCols(["document", "token"])\
    .setOutputCol("ner")\
    .setCaseSensitive(True)\
    .setMaxSentenceLength(512)

# Merge token-level tags into entity chunks
ner_converter = NerConverter()\
    .setInputCols(["document", "token", "ner"])\
    .setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier, ner_converter])

data = spark.createDataFrame([["""I have an allergic reaction to vancomycin so I have itchy skin, sore throat/burning/itching, numbness of tongue and gums. I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication."""]]).toDF("text")

result = pipeline.fit(data).transform(data)

// Scala: assumes an active Spark NLP for Healthcare session available as `spark`
val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = MedicalBertForTokenClassifier.pretrained("bert_token_classifier_ner_ade", "en", "clinical/models")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")
  .setCaseSensitive(true)
  .setMaxSentenceLength(512)

val ner_converter = new NerConverter()
  .setInputCols(Array("document", "token", "ner"))
  .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, tokenClassifier, ner_converter))

val data = Seq("""I have an allergic reaction to vancomycin so I have itchy skin, sore throat/burning/itching, numbness of tongue and gums. I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication.""").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)

import nlu
nlu.load("en.classify.token_bert.ner_ade").predict("""I have an allergic reaction to vancomycin so I have itchy skin, sore throat/burning/itching, numbness of tongue and gums. I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication.""")

Results

+-----------+---------------------------+-----+---+---------+
|sentence_id|chunk                      |begin|end|ner_label|
+-----------+---------------------------+-----+---+---------+
|0          |allergic reaction          |10   |26 |ADE      |
|0          |vancomycin                 |31   |40 |DRUG     |
|0          |itchy skin                 |52   |61 |ADE      |
|0          |sore throat/burning/itching|64   |90 |ADE      |
|0          |numbness of tongue and gums|93   |119|ADE      |
+-----------+---------------------------+-----+---+---------+
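
The chunk rows in the table above come from merging token-level IOB tags (B-ADE, I-ADE, B-DRUG, I-DRUG, O) into contiguous spans, with `begin` and `end` as inclusive character offsets. The following is a minimal pure-Python sketch of that merge, not Spark NLP's NerConverter implementation. It assumes whitespace tokenization and hand-assigned tags for the first clause of the example sentence; the helper names `whitespace_tokens` and `merge_iob` are hypothetical.

```python
from typing import List, Tuple


def whitespace_tokens(text: str) -> List[Tuple[str, int, int]]:
    """Split on whitespace and record inclusive (begin, end) character offsets."""
    tokens, pos = [], 0
    for word in text.split():
        begin = text.index(word, pos)
        end = begin + len(word) - 1
        tokens.append((word, begin, end))
        pos = end + 1
    return tokens


def merge_iob(text: str, tokens, tags) -> List[Tuple[str, int, int, str]]:
    """Merge B-/I- tagged tokens into (chunk, begin, end, label) tuples."""
    spans = []
    current = None  # [begin, end, label] of the span being built
    for (_, begin, end), tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        starts_new = prefix == "B" or (
            prefix == "I" and (current is None or current[2] != label)
        )
        if starts_new:
            if current:
                spans.append(current)
            current = [begin, end, label]
        elif prefix == "I":
            current[1] = end  # extend the open span
        else:  # "O" closes any open span
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(text[b:e + 1], b, e, label) for b, e, label in spans]


text = "I have an allergic reaction to vancomycin so I have itchy skin"
# Hand-assigned tags for illustration (an assumption, not model output)
tags = ["O", "O", "O", "B-ADE", "I-ADE", "O", "B-DRUG", "O", "O", "O", "B-ADE", "I-ADE"]

chunks = merge_iob(text, whitespace_tokens(text), tags)
for chunk in chunks:
    print(chunk)
```

Under these assumptions the offsets for `allergic reaction`, `vancomycin`, and `itchy skin` match the corresponding rows of the results table.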

Model Information

Model Name:	bert_token_classifier_ner_ade
Compatibility:	Healthcare NLP 3.4.0+
License:	Licensed
Edition:	Official
Input Labels:	[sentence, token]
Output Labels:	[ner]
Language:	en
Size:	404.3 MB
Case sensitive:	true
Max sentence length:	512

Data Source

This model is trained on a custom dataset by John Snow Labs.

Benchmarking

label         precision  recall  f1-score  support
B-ADE              0.93    0.79      0.85     2694
B-DRUG             0.97    0.87      0.92     9539
I-ADE              0.93    0.73      0.82     3236
I-DRUG             0.95    0.82      0.88     6115
accuracy              -       -      0.83    21584
macro-avg          0.84    0.84      0.84    21584
weighted-avg       0.95    0.83      0.89    21584
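
The per-label f1-score column is the harmonic mean of precision and recall. As a quick sanity check, the sketch below recomputes it from the rounded precision and recall values in the table; the function name `f1_score` and the `reported` dictionary are illustrative only, and the averaged rows are not recomputed here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1) copied from the benchmarking table
reported = {
    "B-ADE": (0.93, 0.79, 0.85),
    "B-DRUG": (0.97, 0.87, 0.92),
    "I-ADE": (0.93, 0.73, 0.82),
    "I-DRUG": (0.95, 0.82, 0.88),
}

recomputed = {
    label: round(f1_score(p, r), 2) for label, (p, r, _) in reported.items()
}

for label, (_, _, f1) in reported.items():
    print(f"{label}: reported={f1:.2f} recomputed={recomputed[label]:.2f}")
```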
