Detect Adverse Drug Events (MedicalBertForTokenClassification)

Description

Detect adverse reactions of drugs in texts excahnged over twitter. This model is trained with the BertForTokenClassification method from the transformers library and imported into Spark NLP.

Predicted Entities

ADE

Live Demo Open in Colab Copy S3 URI

How to use

documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentenceDetector = SentenceDetectorDLModel.pretrained()\
    .setInputCols("document")\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols("sentence")\
    .setOutputCol("token")

tokenClassifier = MedicalBertForTokenClassifier.pretrained("bert_token_classifier_ner_ade_binary", "en", "clinical/models")\
    .setInputCols("token", "sentence")\
    .setOutputCol("label")\
    .setCaseSensitive(True)

ner_converter = NerConverter()\
    .setInputCols(["sentence","token","label"])\
    .setOutputCol("ner_chunk")

pipeline =  Pipeline(stages=[
                      documentAssembler,
                      sentenceDetector,
                      tokenizer,
                      tokenClassifier,
                      ner_converter])

model = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

data = spark.createDataFrame(["I used to be on paxil but that made me more depressed and prozac made me angry",
                              "Maybe cos of the insulin blocking effect of seroquel but i do feel sugar crashes when eat fast carbs."], StringType()).toDF("text")

result = model.transform(data)
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDetector = SentenceDetectorDLModel.pretrained()
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val tokenClassifier = MedicalBertForTokenClassifier.pretrained("bert_token_classifier_ner_ade_binary", "en", "clinical/models")
  .setInputCols(Array("token", "sentence"))
  .setOutputCol("label")
  .setCaseSensitive(True)

val ner_converter = new NerConverter()
  .setInputCols(Array("sentence", "token", "label"))
  .setOutputCol("ner_chunk")


val pipeline =  new Pipeline().setStages(Array(
                      documentAssembler,
                      sentenceDetector,
                      tokenizer,
                      tokenClassifier,
                      ner_converter))

val data = Seq(Array("I used to be on paxil but that made me more depressed and prozac made me angry",
                     "Maybe cos of the insulin blocking effect of seroquel but i do feel sugar crashes when eat fast carbs.")).toDS().toDF("text")

val result = model.fit(data).transform(data)
import nlu
nlu.load("en.classify.bert_token.ner_ade_bert").predict("""Maybe cos of the insulin blocking effect of seroquel but i do feel sugar crashes when eat fast carbs.""")

Results

+-------------+---------+
|chunk        |ner_label|
+-------------+---------+
|depressed    |ADE      |
|angry        |ADE      |
|sugar crashes|ADE      |
+-------------+---------+

Model Information

Model Name: bert_token_classifier_ner_ade_binary
Compatibility: Healthcare NLP 4.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token]
Output Labels: [ner]
Language: en
Size: 404.2 MB
Case sensitive: true
Max sentence length: 128

Benchmarking

       label  precision    recall  f1-score   support
       B-ADE       0.89      0.88      0.88      3720
       I-ADE       0.85      0.84      0.84      3145
           O       0.98      0.98      0.98     26963
    accuracy        -         -        0.95     33828
   macro-avg       0.90      0.90      0.90     33828
weighted-avg       0.95      0.95      0.95     33828