Detect Drug Information (Large)

Description

Pretrained named entity recognition deep learning model for posology. The SparkNLP deep learning model (MedicalNerModel) is inspired by a former state of the art model for NER: Chiu & Nicols, Named Entity Recognition with Bidirectional LSTM-CNN.

Predicted Entities

DOSAGE, DRUG, DURATION, FORM, FREQUENCY, ROUTE, STRENGTH.

Live Demo Open in Colab Download

How to use

...
embeddings_clinical = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
  .setInputCols(["sentence", "token"])\
  .setOutputCol("embeddings")
clinical_ner = MedicalNerModel.pretrained("ner_posology_large", "en", "clinical/models") \
  .setInputCols(["sentence", "token", "embeddings"]) \
  .setOutputCol("ner")
...
nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings_clinical, clinical_ner, ner_converter])
model = nlp_pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))
results = model.transform(spark.createDataFrame(pd.DataFrame({"text": ["""The patient is a 30-year-old female with a long history of insulin dependent diabetes, type 2; coronary artery disease; chronic renal insufficiency; peripheral vascular disease, also secondary to diabetes; who was originally admitted to an outside hospital for what appeared to be acute paraplegia, lower extremities. She did receive a course of Bactrim for 14 days for UTI. Evidently, at some point in time, the patient was noted to develop a pressure-type wound on the sole of her left foot and left great toe. She was also noted to have a large sacral wound; this is in a similar location with her previous laminectomy, and this continues to receive daily care. The patient was transferred secondary to inability to participate in full physical and occupational therapy and continue medical management of her diabetes, the sacral decubitus, left foot pressure wound, and associated complications of diabetes. She is given Fragmin 5000 units subcutaneously daily, Xenaderm to wounds topically b.i.d., Lantus 40 units subcutaneously at bedtime, OxyContin 30 mg p.o. q.12 h., folic acid 1 mg daily, levothyroxine 0.1 mg p.o. daily, Prevacid 30 mg daily, Avandia 4 mg daily, Norvasc 10 mg daily, Lexapro 20 mg daily, aspirin 81 mg daily, Senna 2 tablets p.o. q.a.m., Neurontin 400 mg p.o. t.i.d., Percocet 5/325 mg 2 tablets q.4 h. p.r.n., magnesium citrate 1 bottle p.o. p.r.n., sliding scale coverage insulin, Wellbutrin 100 mg p.o. daily, and Bactrim DS b.i.d."""]})))
...
val embeddings_clinical = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")
val ner = MedicalNerModel.pretrained("ner_posology_large", "en", "clinical/models")
  .setInputCols("sentence", "token", "embeddings")
  .setOutputCol("ner")
...
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings_clinical, ner, ner_converter))

val result = pipeline.fit(Seq.empty["The patient is a 30-year-old female with a long history of insulin dependent diabetes, type 2; coronary artery disease; chronic renal insufficiency; peripheral vascular disease, also secondary to diabetes; who was originally admitted to an outside hospital for what appeared to be acute paraplegia, lower extremities. She did receive a course of Bactrim for 14 days for UTI. Evidently, at some point in time, the patient was noted to develop a pressure-type wound on the sole of her left foot and left great toe. She was also noted to have a large sacral wound; this is in a similar location with her previous laminectomy, and this continues to receive daily care. The patient was transferred secondary to inability to participate in full physical and occupational therapy and continue medical management of her diabetes, the sacral decubitus, left foot pressure wound, and associated complications of diabetes. She is given Fragmin 5000 units subcutaneously daily, Xenaderm to wounds topically b.i.d., Lantus 40 units subcutaneously at bedtime, OxyContin 30 mg p.o. q.12 h., folic acid 1 mg daily, levothyroxine 0.1 mg p.o. daily, Prevacid 30 mg daily, Avandia 4 mg daily, Norvasc 10 mg daily, Lexapro 20 mg daily, aspirin 81 mg daily, Senna 2 tablets p.o. q.a.m., Neurontin 400 mg p.o. t.i.d., Percocet 5/325 mg 2 tablets q.4 h. p.r.n., magnesium citrate 1 bottle p.o. p.r.n., sliding scale coverage insulin, Wellbutrin 100 mg p.o. daily, and Bactrim DS b.i.d."].toDS.toDF("text")).transform(data)

Results

+--------------+---------+
|chunk         |ner      |
+--------------+---------+
|insulin       |DRUG     |
|Bactrim       |DRUG     |
|for 14 days   |DURATION |
|Fragmin       |DRUG     |
|5000 units    |DOSAGE   |
|subcutaneously|ROUTE    |
|daily         |FREQUENCY|
|Xenaderm      |DRUG     |
|topically     |ROUTE    |
|b.i.d         |FREQUENCY|
|Lantus        |DRUG     |
|40 units      |DOSAGE   |
|subcutaneously|ROUTE    |
|at bedtime    |FREQUENCY|
|OxyContin     |DRUG     |
|30 mg         |STRENGTH |
|p.o           |ROUTE    |
|q.12 h        |FREQUENCY|
|folic acid    |DRUG     |
|1 mg          |STRENGTH |
+--------------+---------+
only showing top 20 rows

Model Information

Model Name: ner_posology_large
Compatibility: Spark NLP for Healthcare 3.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en

Data Source

Trained on the 2018 i2b2 dataset and FDA Drug datasets with ‘embeddings_clinical’. https://open.fda.gov/

Benchmarking

|    | label         |     tp |    fp |    fn |     prec |      rec |       f1 |
|---:|--------------:|-------:|------:|------:|---------:|---------:|---------:|
|  0 | B-DRUG        |  30096 |  1952 |  1265 | 0.939091 | 0.959663 | 0.949266 |
|  1 | B-STRENGTH    |  18379 |  1286 |  1142 | 0.934605 | 0.941499 | 0.938039 |
|  2 | I-DURATION    |   6346 |   692 |   391 | 0.901677 | 0.941962 | 0.921379 |
|  3 | I-STRENGTH    |  21368 |  2752 |  1411 | 0.885904 | 0.938057 | 0.911235 |
|  4 | I-FREQUENCY   |  18406 |  1525 |  1112 | 0.923486 | 0.943027 | 0.933154 |
|  5 | B-FORM        |  11297 |  1276 |   726 | 0.898513 | 0.939616 | 0.918605 |
|  6 | B-DOSAGE      |   3731 |   611 |   765 | 0.859281 | 0.829849 | 0.844309 |
|  7 | I-DOSAGE      |   2100 |   734 |   887 | 0.741002 | 0.703047 | 0.721526 |
|  8 | I-DRUG        |  11853 |  1364 |  1202 | 0.8968   | 0.907928 | 0.902329 |
|  9 | I-ROUTE       |    227 |    31 |    56 | 0.879845 | 0.80212  | 0.839187 |
| 10 | B-ROUTE       |   5870 |   436 |   488 | 0.930859 | 0.923246 | 0.927037 |
| 11 | B-DURATION    |   2493 |   313 |   205 | 0.888453 | 0.924018 | 0.905887 |
| 12 | B-FREQUENCY   |  12648 |   709 |   430 | 0.946919 | 0.96712  | 0.956913 |
| 13 | I-FORM        |    919 |   472 |   502 | 0.660676 | 0.646728 | 0.653627 |
| 14 | Macro-average | 145733 | 14153 | 10582 | 0.877651 | 0.88342  | 0.880526 |
| 15 | Micro-average | 145733 | 14153 | 10582 | 0.911481 | 0.932303 | 0.921774 |