Description
Pretrained named entity recognition deep learning model for posological entity detection in clinical notes. This NER model is an augmented version of ner_posology
model and trained with the embeddings_clinical
word embeddings model, so be sure to use the same embeddings in the pipeline.
Predicted Entities
DOSAGE
, DRUG
, DURATION
, FORM
, FREQUENCY
, ROUTE
, STRENGTH
Live Demo Open in Colab Copy S3 URI
How to use
document_assembler = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentence_detector = SentenceDetector()\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
.setInputCols(["sentence", "token"])\
.setOutputCol("embeddings")
clinical_ner = MedicalNerModel.pretrained("ner_posology_langtest", "en", "clinical/models")\
.setInputCols(["sentence", "token", "embeddings"])\
.setOutputCol("ner")
ner_converter = NerConverterInternal()\
.setInputCols(["sentence", "token", "ner"])\
.setOutputCol("ner_chunk")
nlp_pipeline = Pipeline(
stages=[
document_assembler,
sentence_detector,
tokenizer,
word_embeddings,
clinical_ner,
ner_converter
])
text = """The patient is a 30-year-old female with a long history of insulin dependent diabetes, type 2; coronary artery disease; chronic renal insufficiency; peripheral vascular disease, also secondary to diabetes; who was originally admitted to an outside hospital for what appeared to be acute paraplegia, lower extremities. She did receive a course of Bactrim for 14 days for UTI. Evidently, at some point in time, the patient was noted to develop a pressure-type wound on the sole of her left foot and left great toe. She was also noted to have a large sacral wound; this is in a similar location with her previous laminectomy, and this continues to receive daily care. The patient was transferred secondary to inability to participate in full physical and occupational therapy and continue medical management of her diabetes, the sacral decubitus, left foot pressure wound, and associated complications of diabetes. She is given Fragmin 5000 units subcutaneously daily, Xenaderm to wounds topically b.i.d., Lantus 40 units subcutaneously at bedtime, OxyContin 30 mg p.o. q.12 h., folic acid 1 mg daily, levothyroxine 0.1 mg p.o. daily, Prevacid 30 mg daily, Avandia 4 mg daily, Norvasc 10 mg daily, Lexapro 20 mg daily, aspirin 81 mg daily, Senna 2 tablets p.o. q.a.m., Neurontin 400 mg p.o. t.i.d., Percocet 5/325 mg 2 tablets q.4 h. p.r.n., magnesium citrate 1 bottle p.o. p.r.n., sliding scale coverage insulin, Wellbutrin 100 mg p.o. daily, and Bactrim DS b.i.d."""
data = spark.createDataFrame([[text]]).toDF("text")
result = nlp_pipeline.fit(data).transform(data)
val document_assembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val sentence_detector = new SentenceDetector()
.setInputCols("document")
.setOutputCol("sentence")
val tokenizer = new Tokenizer()
.setInputCols("sentence")
.setOutputCol("token")
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical_large", "en", "clinical/models")\
.setInputCols(Array("sentence", "token"))\
.setOutputCol("embeddings")
val posology_ner_model = MedicalNerModel.pretrained("ner_posology_langtest", "en", "clinical/models")
.setInputCols(Array("sentence", "token"))
.setOutputCol("posology_ner")
val posology_ner_converter = new NerConverterInternal()
.setInputCols(Array("sentence", "token", "ner"))
.setOutputCol("posology_ner_chunk")
val posology_pipeline = new PipelineModel().setStages(Array(document_assembler,
sentence_detector,
tokenizer,
word_embeddings,
posology_ner_model,
posology_ner_converter))
text = """The patient is a 30-year-old female with a long history of insulin dependent diabetes, type 2; coronary artery disease; chronic renal insufficiency; peripheral vascular disease, also secondary to diabetes; who was originally admitted to an outside hospital for what appeared to be acute paraplegia, lower extremities. She did receive a course of Bactrim for 14 days for UTI. Evidently, at some point in time, the patient was noted to develop a pressure-type wound on the sole of her left foot and left great toe. She was also noted to have a large sacral wound; this is in a similar location with her previous laminectomy, and this continues to receive daily care. The patient was transferred secondary to inability to participate in full physical and occupational therapy and continue medical management of her diabetes, the sacral decubitus, left foot pressure wound, and associated complications of diabetes. She is given Fragmin 5000 units subcutaneously daily, Xenaderm to wounds topically b.i.d., Lantus 40 units subcutaneously at bedtime, OxyContin 30 mg p.o. q.12 h., folic acid 1 mg daily, levothyroxine 0.1 mg p.o. daily, Prevacid 30 mg daily, Avandia 4 mg daily, Norvasc 10 mg daily, Lexapro 20 mg daily, aspirin 81 mg daily, Senna 2 tablets p.o. q.a.m., Neurontin 400 mg p.o. t.i.d., Percocet 5/325 mg 2 tablets q.4 h. p.r.n., magnesium citrate 1 bottle p.o. p.r.n., sliding scale coverage insulin, Wellbutrin 100 mg p.o. daily, and Bactrim DS b.i.d."""
val data = Seq(text).toDS.toDF("text")
val result = posology_pipeline.fit(data).transform(data)
Results
+--------------+---------+
| chunk|ner_label|
+--------------+---------+
| insulin| DRUG|
| Bactrim| DRUG|
| for 14 days| DURATION|
| Fragmin| DRUG|
| 5000 units| DOSAGE|
|subcutaneously| ROUTE|
| daily|FREQUENCY|
| topically| ROUTE|
| b.i.d|FREQUENCY|
| Lantus| DRUG|
| 40 units| DOSAGE|
|subcutaneously| ROUTE|
| at bedtime|FREQUENCY|
| OxyContin| DRUG|
| 30 mg| STRENGTH|
| p.o| ROUTE|
| q.12 h|FREQUENCY|
| folic acid| DRUG|
| 1 mg| STRENGTH|
| daily|FREQUENCY|
+--------------+---------+
Model Information
Model Name: | ner_posology_langtest |
Compatibility: | Healthcare NLP 5.0.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [sentence, token, embeddings] |
Output Labels: | [ner] |
Language: | en |
Size: | 2.8 MB |
References
trained by in-house dataset
Benchmarking
label tp fp fn total precision recall f1
DURATION 175.0 25.0 39.0 214.0 0.875 0.8178 0.8454
DRUG 1373.0 153.0 167.0 1540.0 0.8997 0.8916 0.8956
DOSAGE 153.0 52.0 71.0 224.0 0.7463 0.683 0.7133
ROUTE 283.0 29.0 47.0 330.0 0.9071 0.8576 0.8816
FREQUENCY 744.0 109.0 108.0 852.0 0.8722 0.8732 0.8727
FORM 556.0 83.0 76.0 632.0 0.8701 0.8797 0.8749
STRENGTH 826.0 145.0 143.0 969.0 0.8507 0.8524 0.8515
macro - - - - - - 0.8479
micro - - - - - - 0.8680