Detect Drugs - Generalized Single Entity (ner_drugs_greedy)

Description

This is a single entity model that generalises all posology concepts into one and finds longest available chunks of drugs. It is trained using embeddings_clinical so please use the same embeddings in the pipeline.

Predicted Entities

DRUG.

Download

How to use

Use as part of an nlp pipeline with the following stages: DocumentAssembler, SentenceDetector, Tokenizer, WordEmbeddingsModel, NerDLModel. Add the NerConverter to the end of the pipeline to convert entity tokens into full entity chunks.

...
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
  .setInputCols(["sentence", "token"])\
  .setOutputCol("embeddings")
clinical_ner = NerDLModel.pretrained("ner_drugs_greedy", "en", "clinical/models") \
  .setInputCols(["sentence", "token", "embeddings"]) \
  .setOutputCol("ner")
...
nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, word_embeddings, clinical_ner, ner_converter])
model = nlp_pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))
results = model.transform(spark.createDataFrame([["DOSAGE AND ADMINISTRATION The initial dosage of hydrocortisone tablets may vary from 20 mg to 240 mg of hydrocortisone per day depending on the specific disease entity being treated."]]).toDF("text"))
...
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")
val clinical_ner = NerDLModel.pretrained("ner_drugs_greedy", "en", "clinical/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")
...
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, word_embeddings, clinical_ner, ner_converter))
val result = pipeline.fit(Seq.empty["DOSAGE AND ADMINISTRATION The initial dosage of hydrocortisone tablets may vary from 20 mg to 240 mg of hydrocortisone per day depending on the specific disease entity being treated."].toDS.toDF("text")).transform(data)

Results

+-----------------------------------+------------+
| chunk                             | ner_label  |
+-----------------------------------+------------+
| hydrocortisone tablets            | DRUG       |
| 20 mg to 240 mg of hydrocortisone | DRUG       |
+-----------------------------------+------------+

Model Information

Model Name: ner_drugs_greedy
Type: ner
Compatibility: Spark NLP 2.6.5+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Dependencies: embeddings_clinical

Data Source

Trained on augmented i2b2_med7 + FDA dataset with embeddings_clinical, https://www.i2b2.org/NLP/Medication.

Benchmarking

label	 tp	 fp	 fn	 prec	 rec	 f1
I-DRUG	 37858	 4166	 3338	 0.90086615	 0.91897273	 0.9098294
B-DRUG	 29926	 2006	 1756	 0.937179	 0.9445742	 0.9408621
tp: 67784 fp: 6172 fn: 5094 labels: 2
Macro-average	 prec: 0.91902256, rec: 0.9317734, f1: 0.92535406
Micro-average	 prec: 0.916545, rec: 0.93010235, f1: 0.9232739