Adverse Drug Events Binary Classifier

Description

Classifies texts or sentences into two categories:

  • True : The sentence describes a possible adverse drug event (ADE).

  • False : The sentence does not contain any information about an ADE.

This model is a BioBERT-based classifier.

Predicted Entities

True, False


How to use

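Python

from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer
from sparknlp_jsl.annotator import MedicalBertForSequenceClassification

# Assumes an active Spark session `spark` started with the licensed Healthcare NLP library

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")

# Pretrained BioBERT-based ADE classifier from the clinical/models repository
sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_ade_augmented_v2", "en", "clinical/models")\
    .setInputCols(["document", "token"])\
    .setOutputCol("class")

pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier])

data = spark.createDataFrame([["I felt a bit drowsy and had blurred vision after taking Aspirin."]]).toDF("text")

result = pipeline.fit(data).transform(data)

# Show the predicted label next to the input text
result.select("class.result", "text").show(truncate=False)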
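Scala

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.annotators.classification.MedicalBertForSequenceClassification
import org.apache.spark.ml.Pipeline
// Assumes an active SparkSession named `spark` with the licensed Healthcare NLP library
import spark.implicits._

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_ade_augmented_v2", "en", "clinical/models")
  .setInputCols(Array("document", "token"))
  .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, sequenceClassifier))

val data = Seq("I felt a bit drowsy and had blurred vision after taking Aspirin.").toDF("text")

val result = pipeline.fit(data).transform(data)

// Show the predicted label next to the input text
result.select("class.result", "text").show(false)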

Results

+------+----------------------------------------------------------------+
|result|text                                                            |
+------+----------------------------------------------------------------+
|[True]|I felt a bit drowsy and had blurred vision after taking Aspirin.|
+------+----------------------------------------------------------------+
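The `result` column holds an array of label strings such as `[True]`, not Python booleans. Spark NLP stores annotation results as strings, so a naive `bool("False")` would evaluate to `True`. The sketch below converts collected labels into a proper boolean. The helper name `is_ade` is hypothetical, and it assumes exactly one class annotation per row, as produced by the document-level pipeline above.

```python
from typing import List


def is_ade(labels: List[str]) -> bool:
    """Hypothetical helper: map the `class.result` array of one row to a bool.

    Assumes one document per row, so the classifier emits exactly one label.
    Labels are strings, so compare text instead of relying on truthiness.
    """
    if len(labels) != 1:
        raise ValueError(f"expected exactly one label, got {labels!r}")
    label = labels[0]
    if label not in ("True", "False"):
        raise ValueError(f"unexpected label {label!r}")
    return label == "True"


print(is_ade(["True"]))  # True
```

In a real job, the `labels` argument would be the `result` field of each collected row.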

Model Information

Model Name: bert_sequence_classifier_ade_augmented_v2
Compatibility: Healthcare NLP 5.4.1+
License: Licensed
Edition: Official
Input Labels: [document, token]
Output Labels: [class]
Language: en
Size: 406.6 MB
Case sensitive: false
Max sentence length: 512

Benchmarking

       label  precision    recall  f1-score   support
       False       0.97      0.97      0.97      2000
        True       0.90      0.89      0.90       530
    accuracy         -         -       0.96      2530
   macro-avg       0.93      0.93      0.93      2530
weighted-avg       0.96      0.96      0.96      2530
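The report layout resembles scikit-learn's `classification_report`. This is an assumption, since the card does not name the tool. Under that convention, `macro-avg` is the unweighted mean over the two classes, and `weighted-avg` weights each class by its support. The sketch below recomputes both from the per-class rows of the table.

```python
# Per-class rows copied from the Benchmarking table:
# label -> (precision, recall, f1-score, support)
PER_CLASS = {
    "False": (0.97, 0.97, 0.97, 2000),
    "True": (0.90, 0.89, 0.90, 530),
}
METRICS = ("precision", "recall", "f1-score")


def macro_avg(metric: int) -> float:
    """Unweighted mean of one metric over the classes."""
    values = [row[metric] for row in PER_CLASS.values()]
    return sum(values) / len(values)


def weighted_avg(metric: int) -> float:
    """Mean of one metric over the classes, weighted by each class's support."""
    total = sum(row[3] for row in PER_CLASS.values())
    return sum(row[metric] * row[3] for row in PER_CLASS.values()) / total


for i, name in enumerate(METRICS):
    print(f"{name:>9}  macro={macro_avg(i):.3f}  weighted={weighted_avg(i):.3f}")
```

The inputs are already rounded to two decimals, so the results can differ from the table in the last digit. For example, the recomputed weighted recall is about 0.953, while the table reports 0.96. For single-label classification, weighted recall equals accuracy, which is why both rows show the same value in the table.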