Detect Adverse Drug Events (biobert)

Description

Detect adverse drug events in tweets, reviews, and medical text using a pretrained NER model.

Predicted Entities

DRUG, ADE
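The Benchmarking table reports token-level tags (B-DRUG, I-DRUG, B-ADE, I-ADE, O), while the predicted entities are whole DRUG and ADE chunks. The sketch below shows one way such IOB tags can be merged into chunks. It is an illustration in plain Python, not the library's converter implementation; the function name group_iob and the sample sentence with its tags are hypothetical and hand-written, not model output.

```python
from typing import List, Optional, Tuple


def group_iob(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge token-level IOB tags into (chunk_text, label) pairs."""
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    for token, tag in zip(tokens, tags):
        # "B-ADE" -> ("B", "-", "ADE"); "O" -> ("O", "", "")
        prefix, _, label = tag.partition("-")
        # A chunk starts on B-, or on an I- tag whose label differs
        # from the chunk currently being built.
        starts_new = prefix == "B" or (prefix == "I" and label != current_label)

        if prefix == "O" or starts_new:
            # Close the chunk in progress, if any.
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None

        if prefix in ("B", "I"):
            current_tokens.append(token)
            current_label = label

    # Flush a chunk that runs to the end of the sentence.
    if current_tokens:
        chunks.append((" ".join(current_tokens), current_label))
    return chunks


# Hand-written illustration (hypothetical tags, not produced by the model).
tokens = ["I", "took", "Lipitor", "and", "got", "severe", "muscle", "cramps", "."]
tags = ["O", "O", "B-DRUG", "O", "O", "B-ADE", "I-ADE", "I-ADE", "O"]
chunks = group_iob(tokens, tags)
print(chunks)
```

An I- tag that appears without a preceding B- tag of the same label is treated here as the start of a new chunk. That is a lenient design choice for this sketch; a stricter decoder might discard such tokens instead.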


How to use


...
# BioBERT embeddings consumed by the NER model
embeddings_clinical = BertEmbeddings.pretrained("biobert_pubmed_base_cased") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

# Pretrained ADE NER model (requires a Spark NLP for Healthcare license)
clinical_ner = MedicalNerModel.pretrained("ner_ade_biobert", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")
...
nlpPipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings_clinical, clinical_ner, ner_converter])
model = nlpPipeline.fit(spark.createDataFrame([[""]]).toDF("text"))
results = model.transform(spark.createDataFrame([["EXAMPLE_TEXT"]]).toDF("text"))

...
val embeddings_clinical = BertEmbeddings.pretrained("biobert_pubmed_base_cased")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")
val ner = MedicalNerModel.pretrained("ner_ade_biobert", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")
...
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings_clinical, ner, ner_converter))
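// EXAMPLE_TEXT is a placeholder for the input text; toDF needs spark.implicits._ in scope
val data = Seq("EXAMPLE_TEXT").toDF("text")
val result = pipeline.fit(data).transform(data)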

Model Information

Model Name: ner_ade_biobert
Compatibility: Spark NLP for Healthcare 3.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en

Benchmarking

              precision    recall  f1-score   support

       B-ADE       0.48      0.82      0.60      3582
      B-DRUG       0.87      0.65      0.75     11763
       I-ADE       0.48      0.76      0.59      4309
      I-DRUG       0.95      0.28      0.43      7654
           O       0.97      0.98      0.97    303457

    accuracy                           0.95    330765
   macro avg       0.75      0.70      0.67    330765
weighted avg       0.95      0.95      0.94    330765
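
As a reading aid for the table above, the following sketch recomputes the F1 scores and the macro and weighted averages from the reported precision, recall, and support values. It assumes the standard definitions (F1 as the harmonic mean of precision and recall, macro average as the unweighted mean over tags, weighted average as the support-weighted mean). Because the table values are rounded to two decimals, the recomputed numbers can differ from the reported ones in the last digit.

```python
# (precision, recall, support) per tag, copied from the benchmark table.
report = {
    "B-ADE": (0.48, 0.82, 3582),
    "B-DRUG": (0.87, 0.65, 11763),
    "I-ADE": (0.48, 0.76, 4309),
    "I-DRUG": (0.95, 0.28, 7654),
    "O": (0.97, 0.98, 303457),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-tag F1 recomputed from the rounded precision and recall.
f1_scores = {tag: f1(p, r) for tag, (p, r, _) in report.items()}

# Macro average: every tag counts equally, regardless of frequency.
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

# Weighted average: each tag contributes in proportion to its support.
total_support = sum(s for _, _, s in report.values())
weighted_f1 = sum(f1_scores[tag] * s for tag, (_, _, s) in report.items()) / total_support

for tag, score in f1_scores.items():
    print(f"{tag:>8}  f1={score:.3f}")
print(f"macro f1={macro_f1:.3f}  weighted f1={weighted_f1:.3f}  support={total_support}")
```

The gap between the macro and weighted averages comes from the O tag, which accounts for most of the 330765 tokens and therefore dominates the weighted figure; the macro average is the more informative summary of the entity tags.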