Description
This pretrained NER model detects adverse drug events (ADEs) and drug mentions in reviews, tweets, and medical text. It was trained with the BertForTokenClassification class from the Hugging Face transformers library and imported into Spark NLP.
Predicted Entities
DRUG, ADE
How to use
# Python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, NerConverter
from sparknlp_jsl.annotator import MedicalBertForTokenClassifier

# Assumes `spark` is an active Spark session with Healthcare NLP loaded.
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")

# Pretrained BERT token classifier that tags each token as DRUG, ADE, or O.
tokenClassifier = MedicalBertForTokenClassifier.pretrained("bert_token_classifier_ner_ade", "en", "clinical/models")\
    .setInputCols("token", "document")\
    .setOutputCol("ner")\
    .setCaseSensitive(True)\
    .setMaxSentenceLength(512)

# Merges token-level tags into entity chunks.
ner_converter = NerConverter()\
    .setInputCols(["document", "token", "ner"])\
    .setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[documentAssembler, tokenizer, tokenClassifier, ner_converter])

data = spark.createDataFrame([["""I have an allergic reaction to vancomycin so I have itchy skin, sore throat/burning/itching, numbness of tongue and gums. I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication."""]]).toDF("text")

result = pipeline.fit(data).transform(data)
// Scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for .toDS; assumes an active `spark` session
// MedicalBertForTokenClassifier comes from the licensed Healthcare NLP library.

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val tokenClassifier = MedicalBertForTokenClassifier.pretrained("bert_token_classifier_ner_ade", "en", "clinical/models")
  .setInputCols(Array("document", "token"))
  .setOutputCol("ner")
  .setCaseSensitive(true)
  .setMaxSentenceLength(512)

val ner_converter = new NerConverter()
  .setInputCols(Array("document", "token", "ner"))
  .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, tokenClassifier, ner_converter))

val data = Seq("""I have an allergic reaction to vancomycin so I have itchy skin, sore throat/burning/itching, numbness of tongue and gums. I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication.""").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
# Python (NLU)
import nlu
nlu.load("en.classify.token_bert.ner_ade").predict("""I have an allergic reaction to vancomycin so I have itchy skin, sore throat/burning/itching, numbness of tongue and gums. I would not recommend this drug to anyone, especially since I have never had such an adverse reaction to any other medication.""")
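The benchmark for this model is reported on token-level B- and I- tags, while the pipeline output contains whole DRUG and ADE chunks; the NerConverter stage is what merges tagged tokens into chunks. The following is a minimal plain-Python sketch of that merging idea, not Spark NLP's implementation. The function name merge_bio and the example tags are hypothetical, and joining tokens with spaces is a simplification (the real stage works from character offsets).

```python
# Illustrative only: plain-Python BIO merging, not Spark NLP's NerConverter code.

def merge_bio(tokens, tags):
    """Group tokens tagged B-X / I-X into (chunk_text, label) tuples."""
    chunks, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        prefix, _, entity = tag.partition("-")
        # A chunk ends at an O tag, at a B- tag, or at an I- tag of another label.
        starts_new = prefix == "B" or (prefix == "I" and entity != label)
        if prefix == "O" or starts_new:
            if current:
                chunks.append((" ".join(current), label))
            current, label = [], None
        if prefix in ("B", "I"):
            current.append(token)
            label = entity
    if current:
        chunks.append((" ".join(current), label))
    return chunks


# Hypothetical tags for the start of the example sentence.
tokens = ["I", "have", "an", "allergic", "reaction", "to", "vancomycin"]
tags = ["O", "O", "O", "B-ADE", "I-ADE", "O", "B-DRUG"]

print(merge_bio(tokens, tags))
```

Under these assumed tags the sketch yields one ADE chunk ("allergic reaction") and one DRUG chunk ("vancomycin").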
Results
+-----------+---------------------------+-----+---+---------+
|sentence_id|chunk                      |begin|end|ner_label|
+-----------+---------------------------+-----+---+---------+
|0          |allergic reaction          |10   |26 |ADE      |
|0          |vancomycin                 |31   |40 |DRUG     |
|0          |itchy skin                 |52   |61 |ADE      |
|0          |sore throat/burning/itching|64   |90 |ADE      |
|0          |numbness of tongue and gums|93   |119|ADE      |
+-----------+---------------------------+-----+---+---------+
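The begin and end columns in the table above read as zero-based character offsets into the input text, with end inclusive. The short check below, in plain Python with no Spark dependency, slices the first sentence of the example text with those offsets. The helper name chunk_at is made up for illustration.

```python
# Sketch: recover each chunk from the input text using the begin/end columns.
# Assumes zero-based offsets with an inclusive end, as the table suggests.

text = ("I have an allergic reaction to vancomycin so I have itchy skin, "
        "sore throat/burning/itching, numbness of tongue and gums.")

# (chunk, begin, end, ner_label) rows copied from the results table.
rows = [
    ("allergic reaction", 10, 26, "ADE"),
    ("vancomycin", 31, 40, "DRUG"),
    ("itchy skin", 52, 61, "ADE"),
    ("sore throat/burning/itching", 64, 90, "ADE"),
    ("numbness of tongue and gums", 93, 119, "ADE"),
]


def chunk_at(text, begin, end):
    """Return the substring covered by an inclusive [begin, end] span."""
    return text[begin:end + 1]


recovered = [chunk_at(text, begin, end) for _, begin, end, _ in rows]
for (chunk, begin, end, label), got in zip(rows, recovered):
    print(f"{label:<5} {begin:>3}-{end:<3} {got!r} matches={got == chunk}")
```

Note the `end + 1` in the slice: Python slices exclude the upper bound, whereas the table's end column includes the last character.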
Model Information
Model Name: | bert_token_classifier_ner_ade |
Compatibility: | Healthcare NLP 3.4.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [sentence, token] |
Output Labels: | [ner] |
Language: | en |
Size: | 404.3 MB |
Case sensitive: | true |
Max sentence length: | 512 |
Data Source
This model is trained on a custom dataset by John Snow Labs.
Benchmarking
label         precision  recall  f1-score  support
B-ADE         0.93       0.79    0.85      2694
B-DRUG        0.97       0.87    0.92      9539
I-ADE         0.93       0.73    0.82      3236
I-DRUG        0.95       0.82    0.88      6115
accuracy      -          -       0.83      21584
macro-avg     0.84       0.84    0.84      21584
weighted-avg  0.95       0.83    0.89      21584
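For readers who want to sanity-check the per-label rows, the f1-score is conventionally the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the rounded precision and recall values in the table and sums the support column. It assumes the table uses this standard definition, and the variable names are illustrative.

```python
# Sketch: recompute per-label F1 from the rounded precision/recall in the table.
# Assumes the standard definition F1 = 2 * P * R / (P + R).

# label -> (precision, recall, reported f1-score, support), copied from the table.
rows = {
    "B-ADE": (0.93, 0.79, 0.85, 2694),
    "B-DRUG": (0.97, 0.87, 0.92, 9539),
    "I-ADE": (0.93, 0.73, 0.82, 3236),
    "I-DRUG": (0.95, 0.82, 0.88, 6115),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


recomputed = {label: round(f1_score(p, r), 2) for label, (p, r, _, _) in rows.items()}
total_support = sum(support for *_, support in rows.values())

for label, (_, _, reported, _) in rows.items():
    print(f"{label:<7} reported={reported:.2f} recomputed={recomputed[label]:.2f}")
print("total support:", total_support)
```

The four recomputed values agree with the reported per-label f1-scores to two decimals, and the supports add up to the 21584 shown in the summary rows.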