Adverse Drug Events Classifier

Description

This model, trained with the DocumentMLClassifierApproach annotator, classifies a text or sentence into one of two categories:

True: The sentence describes a possible adverse drug event (ADE).

False: The sentence contains no information about an ADE.

The model was trained on the ADE-Corpus-V2 dataset (Adverse Drug Reaction Data), in which each sentence is labeled as ADE-related (True) or not (False).

Predicted Entities

True, False

How to use

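# Python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer
# DocumentMLClassifierModel ships with the licensed Healthcare NLP package
from sparknlp_jsl.annotator import DocumentMLClassifierModel

document_assembler = DocumentAssembler()\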
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer()\
    .setInputCols("document")\
    .setOutputCol("token")

classifier_ml = DocumentMLClassifierModel.pretrained("classifierml_ade", "en", "clinical/models")\
    .setInputCols("token")\
    .setOutputCol("prediction")

clf_Pipeline = Pipeline(stages=[
    document_assembler, 
    tokenizer,
    classifier_ml])

data = spark.createDataFrame([["""I feel great after taking tylenol."""], ["""Toxic epidermal necrolysis resulted after 19 days of treatment with 5-fluorocytosine and amphotericin B."""]]).toDF("text")

result = clf_Pipeline.fit(data).transform(data)

// Scala equivalent
val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

val classifier_ml = DocumentMLClassifierModel.pretrained("classifierml_ade", "en", "clinical/models")
    .setInputCols("token")
    .setOutputCol("prediction")

val clf_Pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, classifier_ml))

// One sentence per row; toDF assumes `import spark.implicits._` is in scope
val data = Seq("I feel great after taking tylenol", "Toxic epidermal necrolysis resulted after 19 days of treatment with 5-fluorocytosine and amphotericin B.").toDF("text")

val result = clf_Pipeline.fit(data).transform(data)

Results

+--------------------------------------------------------------------------------------------------------+-------+
|text                                                                                                    |result |
+--------------------------------------------------------------------------------------------------------+-------+
|Toxic epidermal necrolysis resulted after 19 days of treatment with 5-fluorocytosine and amphotericin B.|[True] |
|I feel great after taking tylenol                                                                       |[False]|
+--------------------------------------------------------------------------------------------------------+-------+
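
Each row's result column holds the predicted label as a one-element array of strings, as in the table above. The following is a minimal sketch of post-processing such output in plain Python. It assumes the rows were already collected into (text, labels) tuples, for example with something like `result.select("text", "prediction.result").collect()`; the helper name `is_ade` is hypothetical and not part of the library. The labels are compared as strings because `bool("False")` is truthy in Python.

```python
# Rows as they might look after collecting ("text", "prediction.result");
# the values are copied from the results table above.
rows = [
    ("Toxic epidermal necrolysis resulted after 19 days of treatment "
     "with 5-fluorocytosine and amphotericin B.", ["True"]),
    ("I feel great after taking tylenol", ["False"]),
]


def is_ade(labels):
    """Return True when the first predicted label is the string "True"."""
    # Compare as text: bool("False") would be truthy.
    return bool(labels) and labels[0] == "True"


# Keep only the sentences flagged as ADE-related.
ade_sentences = [text for text, labels in rows if is_ade(labels)]
print(len(ade_sentences))
```

Filtering on the string label keeps the logic independent of how the rows were collected, so the same helper would work on rows exported to JSON or CSV.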

Model Information

Model Name: classifierml_ade
Compatibility: Healthcare NLP 4.4.1+
License: Licensed
Edition: Official
Input Labels: [token]
Output Labels: [prediction]
Language: en
Size: 2.6 MB

References

Reference: Gurulingappa et al., Benchmark Corpus to Support Information Extraction for Adverse Drug Effects, JBI, 2012. https://www.sciencedirect.com/science/article/pii/S1532046412000615

Benchmarking

       label  precision    recall  f1-score   support
       False       0.90      0.94      0.92      3359
        True       0.85      0.75      0.79      1364
    accuracy       -         -         0.89      4723
   macro avg       0.87      0.85      0.86      4723
weighted avg       0.89      0.89      0.89      4723
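
To make the relationship between the rows of this table concrete, the sketch below recomputes F1, macro averages, and support-weighted averages from the per-class precision, recall, and support values shown above. It uses the standard definitions of these metrics; the function and variable names are illustrative only, and the inputs are the already-rounded table values.

```python
# Per-class values copied from the benchmarking table (already rounded).
report = {
    "False": {"precision": 0.90, "recall": 0.94, "support": 3359},
    "True": {"precision": 0.85, "recall": 0.75, "support": 1364},
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Total number of evaluated sentences across both classes.
total = sum(row["support"] for row in report.values())


def macro(metric):
    """Unweighted mean of a metric over the classes."""
    return sum(row[metric] for row in report.values()) / len(report)


def weighted(metric):
    """Mean of a metric over the classes, weighted by class support."""
    return sum(row[metric] * row["support"] for row in report.values()) / total


for label, row in report.items():
    print(label, "f1", round(f1(row["precision"], row["recall"]), 3))
print("macro precision", round(macro("precision"), 3))
print("weighted precision", round(weighted("precision"), 3))
```

Because the inputs are rounded to two decimals, recomputed values can differ from the table in the last digit; for instance, the True-class F1 comes out near 0.80 rather than the reported 0.79.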