Classifier for Adverse Drug Events using Clinical BERT

Description

Classifies a sentence into one of two categories:

True: The sentence describes a possible adverse drug event (ADE).

False: The sentence does not contain any information about an ADE.

Predicted Entities

True, False


How to use

```python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import Tokenizer, BertEmbeddings, SentenceEmbeddings, ClassifierDLModel

# Assumes an active Spark session `spark` started with Spark NLP for Healthcare.
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Clinical BioBERT word embeddings.
embeddings = BertEmbeddings.pretrained("biobert_clinical_base_cased") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("word_embeddings")

# Average-pool the word embeddings into a single vector per document.
sentence_embeddings = SentenceEmbeddings() \
    .setInputCols(["document", "word_embeddings"]) \
    .setOutputCol("sentence_embeddings") \
    .setPoolingStrategy("AVERAGE")

# The classifier takes sentence embeddings as its only input (see Model Information).
classifier = ClassifierDLModel.pretrained("classifierdl_ade_clinicalbert", "en", "clinical/models") \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("class")

nlp_pipeline = Pipeline(stages=[document_assembler, tokenizer, embeddings, sentence_embeddings, classifier])

# Fit on an empty DataFrame, then wrap the model in a LightPipeline for fast in-memory inference.
empty_df = spark.createDataFrame([[""]]).toDF("text")
light_pipeline = LightPipeline(nlp_pipeline.fit(empty_df))

annotations = light_pipeline.fullAnnotate([
    "I feel a bit drowsy & have a little blurred vision after taking an insulin",
    "I feel great after taking tylenol",
])
```
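
`fullAnnotate` returns one dictionary per input, keyed by the output column names set above. A minimal sketch for reading off each sentence and its predicted label:

```python
# Pair each input sentence with its predicted ADE label.
for ann in annotations:
    text = ann["document"][0].result
    label = ann["class"][0].result
    print(f"{label}\t{text}")
```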

Results

|   | text                                                                       | label |
|--:|:---------------------------------------------------------------------------|:------|
| 0 | I feel a bit drowsy & have a little blurred vision after taking an insulin | True  |
| 1 | I feel great after taking tylenol                                          | False |
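
The same pipeline also scales to batch scoring by calling `transform` on a Spark DataFrame instead of using a `LightPipeline`; a minimal sketch (the DataFrame contents here are illustrative):

```python
import pyspark.sql.functions as F

data = spark.createDataFrame([
    ["I feel a bit drowsy & have a little blurred vision after taking an insulin"],
    ["I feel great after taking tylenol"],
]).toDF("text")

result = nlp_pipeline.fit(data).transform(data)

# Each row holds one document, so its first class annotation carries the label.
result.select("text", F.expr("class.result[0]").alias("label")).show(truncate=False)
```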

Model Information

Model Name: classifierdl_ade_clinicalbert
Compatibility: Spark NLP 2.7.1+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [class]
Language: en
Dependencies: biobert_clinical_base_cased

Data Source

Trained on a custom dataset comprising the CADEC, DRUG-AE, and TwiMed corpora.

Benchmarking

| label        | precision | recall | f1-score | support |
|:-------------|----------:|-------:|---------:|--------:|
| False        |      0.95 |   0.92 |     0.93 |    6923 |
| True         |      0.64 |   0.78 |     0.70 |    1359 |
| micro avg    |      0.89 |   0.89 |     0.89 |    8282 |
| macro avg    |      0.80 |   0.85 |     0.82 |    8282 |
| weighted avg |      0.90 |   0.89 |     0.90 |    8282 |
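
The table follows the layout of scikit-learn's `classification_report`; a minimal sketch of producing one from collected gold labels and predictions (the label lists below are illustrative, not from the source):

```python
from sklearn.metrics import classification_report

# Illustrative gold labels and model predictions for the two classes.
y_true = ["True", "False", "False", "True", "False"]
y_pred = ["True", "False", "True", "True", "False"]

print(classification_report(y_true, y_pred, digits=2))
```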