Classifier for Adverse Drug Events in Small Conversations

Description

Classifies a sentence into one of two categories:

True : The sentence describes a possible adverse drug event (ADE).

False : The sentence contains no information about an ADE.

Predicted Entities

True, False

How to use

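from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import Tokenizer, BertEmbeddings, SentenceEmbeddings, ClassifierDLModel

# Assumes a running Spark session with access to the licensed clinical models, bound to `spark`.
document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")

tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")

# Token-level BioBERT embeddings (the dependency listed under Model Information)
embeddings = BertEmbeddings.pretrained("biobert_pubmed_base_cased")\
    .setInputCols(["document", "token"])\
    .setOutputCol("word_embeddings")

# Average the token vectors into a single vector per document
sentence_embeddings = SentenceEmbeddings()\
    .setInputCols(["document", "word_embeddings"])\
    .setOutputCol("sentence_embeddings")\
    .setPoolingStrategy("AVERAGE")

# The classifier consumes only the sentence embeddings (see Input Labels)
classifier = ClassifierDLModel.pretrained("classifierdl_ade_conversational_biobert", "en", "clinical/models")\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("class")

nlp_pipeline = Pipeline(stages=[document_assembler, tokenizer, embeddings, sentence_embeddings, classifier])

# Fit on an empty DataFrame: every stage is pretrained, so no training data is needed
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([[""]]).toDF("text")))

annotations = light_pipeline.fullAnnotate(["I feel a bit drowsy & have a little blurred vision after taking an insulin", "I feel great after taking tylenol"])

# Alternative: the same model through the NLU library
import nlu
nlu.load("en.classify.ade.conversational").predict("""I feel a bit drowsy & have a little blurred vision after taking an insulin""")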
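The "AVERAGE" pooling strategy used in the pipeline collapses the per-token embeddings into one fixed-size vector, which is what the classifier receives. The sketch below illustrates the idea in plain Python, assuming pooling is an element-wise mean over token vectors. The function name `average_pool` and the toy 3-dimensional vectors are illustrative only and are not part of Spark NLP.

```python
from typing import List


def average_pool(token_vectors: List[List[float]]) -> List[float]:
    """Return the element-wise mean of equally sized token vectors."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    width = len(token_vectors[0])
    if any(len(vec) != width for vec in token_vectors):
        raise ValueError("all token vectors must have the same width")
    count = len(token_vectors)
    # Average each dimension across all tokens
    return [sum(vec[i] for vec in token_vectors) / count for i in range(width)]


# Toy vectors standing in for real token embeddings (hypothetical values)
toy_tokens = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 2.0, 2.0]]
pooled = average_pool(toy_tokens)
print(pooled)  # [2.0, 2.0, 2.0]
```

Whatever the number of tokens, the pooled vector has the same width as a single token vector, so inputs of any length map to the same classifier input size.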

Results

|   | text                                                                       | label |
|--:|:---------------------------------------------------------------------------|:------|
| 0 | I feel a bit drowsy & have a little blurred vision after taking an insulin | True  |
| 1 | I feel great after taking tylenol                                          | False |
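A table like the one above can be assembled from the output of `fullAnnotate`, which returns one dictionary per input text, keyed by output column, with each value a list of annotation objects exposing a `result` attribute. The sketch below is a minimal illustration of that extraction step. The helper `labels_from_annotations` is hypothetical, and the `SimpleNamespace` objects are stand-ins that mimic the shape of the real annotations so the snippet runs without a Spark session; their labels simply mirror the results table.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Optional, Tuple


def labels_from_annotations(
    texts: List[str],
    annotations: List[Dict[str, List[Any]]],
    output_col: str = "class",
) -> List[Tuple[str, Optional[str]]]:
    """Pair each input text with the first label found in `output_col`."""
    rows = []
    for text, annotation in zip(texts, annotations):
        results = annotation.get(output_col, [])
        # Use None when the classifier produced no annotation for this text
        label = results[0].result if results else None
        rows.append((text, label))
    return rows


texts = [
    "I feel a bit drowsy & have a little blurred vision after taking an insulin",
    "I feel great after taking tylenol",
]
# Stand-in for light_pipeline.fullAnnotate(texts); labels copied from the table above
stand_in_annotations = [
    {"class": [SimpleNamespace(result="True")]},
    {"class": [SimpleNamespace(result="False")]},
]

rows = labels_from_annotations(texts, stand_in_annotations)
for text, label in rows:
    print(label, "|", text)
```

With a live pipeline, the `annotations` list returned by `fullAnnotate` would be passed in place of `stand_in_annotations`.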

Model Information

Model Name: classifierdl_ade_conversational_biobert
Compatibility: Spark NLP 2.7.1+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [class]
Language: en
Dependencies: biobert_pubmed_base_cased

Data Source

Trained on a custom dataset comprising CADEC, DRUG-AE and Twimed.

Benchmarking

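|              | precision | recall | f1-score | support |
|:-------------|----------:|-------:|---------:|--------:|
| False        |      0.91 |   0.94 |     0.93 |    5706 |
| True         |      0.80 |   0.70 |     0.74 |    1800 |
| micro avg    |      0.89 |   0.89 |     0.89 |    7506 |
| macro avg    |      0.85 |   0.82 |     0.84 |    7506 |
| weighted avg |      0.88 |   0.89 |     0.88 |    7506 |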
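The averaged rows in the report are derived from the per-class rows: F1 is the harmonic mean of precision and recall, the macro average is the unweighted mean over classes, and the weighted average weights each class by its support. The sketch below recomputes these quantities from the rounded per-class figures reported above. Because the inputs are already rounded to two decimals, the recomputed values can differ from the reported ones by about 0.01. The variable names are illustrative.

```python
# Per-class figures copied from the benchmark report (rounded to two decimals)
per_class = {
    "False": {"precision": 0.91, "recall": 0.94, "support": 5706},
    "True": {"precision": 0.80, "recall": 0.70, "support": 1800},
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


total_support = sum(c["support"] for c in per_class.values())
f1_scores = {name: f1(c["precision"], c["recall"]) for name, c in per_class.items()}

# Macro average: every class counts equally
macro_recall = sum(c["recall"] for c in per_class.values()) / len(per_class)

# Weighted average: every class counts in proportion to its support
weighted_precision = (
    sum(c["precision"] * c["support"] for c in per_class.values()) / total_support
)

print("total support:", total_support)
print("approx. F1 per class:", {k: round(v, 3) for k, v in f1_scores.items()})
print("approx. macro recall:", round(macro_recall, 3))
print("approx. weighted precision:", round(weighted_precision, 3))
```

The support-weighted rows sit closer to the "False" class figures because that class accounts for 5706 of the 7506 test sentences.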