Description
This model detects drugs, experimental drugs, cycle length, cycle count, cycle day, dosage, form, frequency, duration, route, and drug strength in text. It is based on the core ner_posology model, adds support for drug-cycle entities, and is enriched with additional data from clinical trials.
Predicted Entities
Administration, Cyclenumber, Strength, Cycleday, Duration, Cyclecount, Route, Form, Frequency, Cyclelength, Drug, Dosage
How to use
# Python. Assumes an active Spark session named `spark` with Spark NLP for Healthcare loaded.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer, WordEmbeddingsModel, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel

# Wrap the raw text column as a document annotation
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")
# Split documents into sentences, then sentences into tokens
sentence_detector = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")
tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")
# Clinical word embeddings consumed by the NER model
word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")
clinical_ner = MedicalNerModel.pretrained("ner_posology_experimental", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")
# Convert token-level tags into entity chunks
ner_converter = NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")
nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, word_embeddings, clinical_ner, ner_converter])
# All stages are pretrained, so fitting on an empty DataFrame only assembles the pipeline model
model = nlp_pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))
results = model.transform(spark.createDataFrame([["Y-90 Humanized Anti-Tac: 10 mCi (if a bone marrow transplant was part of the patient's previous therapy) or 15 mCi of yttrium labeled anti-TAC; followed by calcium trisodium Inj (Ca DTPA)..\n\nCalcium-DTPA: Ca-DTPA will be administered intravenously on Days 1-3 to clear the radioactive agent from the body."]]).toDF("text"))
val document_assembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val sentence_detector = new SentenceDetector()
.setInputCols("document")
.setOutputCol("sentence")
val tokenizer = new Tokenizer()
.setInputCols("sentence")
.setOutputCol("token")
val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
.setInputCols(Array("sentence", "token"))
.setOutputCol("embeddings")
val clinical_ner = MedicalNerModel.pretrained("ner_posology_experimental", "en", "clinical/models")
.setInputCols(Array("sentence", "token", "embeddings"))
.setOutputCol("ner")
val ner_converter = new NerConverter()
.setInputCols(Array("sentence", "token", "ner"))
.setOutputCol("ner_chunk")
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, word_embeddings, clinical_ner, ner_converter))
val data = Seq("Y-90 Humanized Anti-Tac: 10 mCi (if a bone marrow transplant was part of the patient's previous therapy) or 15 mCi of yttrium labeled anti-TAC; followed by calcium trisodium Inj (Ca DTPA)..\n\nCalcium-DTPA: Ca-DTPA will be administered intravenously on Days 1-3 to clear the radioactive agent from the body.").toDS.toDF("text")
val result = pipeline.fit(data).transform(data)
import nlu
nlu.load("en.med_ner.posology.experimental").predict("""Y-90 Humanized Anti-Tac: 10 mCi (if a bone marrow transplant was part of the patient's previous therapy) or 15 mCi of yttrium labeled anti-TAC; followed by calcium trisodium Inj (Ca DTPA)..\n\nCalcium-DTPA: Ca-DTPA will be administered intravenously on Days 1-3 to clear the radioactive agent from the body.""")
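The NER stage emits one tag per token (the Benchmarking labels use B- and I- prefixes), and the converter stage groups those tags into entity chunks. The following is a minimal pure-Python sketch of that grouping idea only; it is not the library's implementation, and the `merge_iob` helper and the token/tag lists are hypothetical, made up for illustration.

```python
from typing import List, Optional, Tuple


def merge_iob(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group B-/I- tagged tokens into (chunk_text, entity) pairs.

    Hypothetical helper for illustration; tokens are joined with single spaces.
    """
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        starts_chunk = prefix == "B" or (prefix == "I" and label != current_label)
        if starts_chunk:
            # Close any open chunk, then start a new one.
            # A stray I- tag with a different label is treated as a start.
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], label
        elif prefix == "I":
            # Continuation of the open chunk
            current_tokens.append(token)
        else:
            # "O" tag: close any open chunk
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None

    if current_tokens:
        chunks.append((" ".join(current_tokens), current_label))
    return chunks


# Illustrative tags (assumed, not actual model output)
tokens = ["15", "mCi", "of", "yttrium", "labeled", "anti-TAC"]
tags = ["B-Dosage", "I-Dosage", "O", "B-Drug", "I-Drug", "I-Drug"]
chunks = merge_iob(tokens, tags)
```

With the illustrative input above, `chunks` holds one Dosage chunk and one Drug chunk.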
Results
| | chunk | begin | end | entity |
|---:|:-------------------------|--------:|------:|:---------|
| 0 | Anti-Tac | 15 | 22 | Drug |
| 1 | 10 mCi | 25 | 30 | Dosage |
| 2 | 15 mCi | 108 | 113 | Dosage |
| 3 | yttrium labeled anti-TAC | 118 | 141 | Drug |
| 4 | calcium trisodium Inj | 156 | 176 | Drug |
| 5 | Calcium-DTPA | 191 | 202 | Drug |
| 6 | Ca-DTPA | 205 | 211 | Drug |
| 7 | intravenously | 234 | 246 | Route |
| 8 | Days 1-3 | 251 | 258 | Cycleday |
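The begin and end columns read as character offsets into the input text, with the end offset inclusive. A small sketch that checks this reading against the rows of the table, using plain Python slicing (the `chunk_at` helper is a hypothetical name introduced here):

```python
text = (
    "Y-90 Humanized Anti-Tac: 10 mCi (if a bone marrow transplant was part of "
    "the patient's previous therapy) or 15 mCi of yttrium labeled anti-TAC; "
    "followed by calcium trisodium Inj (Ca DTPA)..\n\nCalcium-DTPA: Ca-DTPA "
    "will be administered intravenously on Days 1-3 to clear the radioactive "
    "agent from the body."
)

# (chunk, begin, end, entity) rows copied from the Results table
rows = [
    ("Anti-Tac", 15, 22, "Drug"),
    ("10 mCi", 25, 30, "Dosage"),
    ("15 mCi", 108, 113, "Dosage"),
    ("yttrium labeled anti-TAC", 118, 141, "Drug"),
    ("calcium trisodium Inj", 156, 176, "Drug"),
    ("Calcium-DTPA", 191, 202, "Drug"),
    ("Ca-DTPA", 205, 211, "Drug"),
    ("intravenously", 234, 246, "Route"),
    ("Days 1-3", 251, 258, "Cycleday"),
]


def chunk_at(source: str, begin: int, end: int) -> str:
    """Return the substring for an inclusive [begin, end] character span."""
    return source[begin:end + 1]


# Collect any row whose offsets do not reproduce its chunk text
mismatches = [row for row in rows if chunk_at(text, row[1], row[2]) != row[0]]
```

An empty `mismatches` list means every row's offsets recover its chunk text. Note that the offsets assume the `\n\n` in the input is two real newline characters.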
Model Information
| Model Name: | ner_posology_experimental |
| Compatibility: | Healthcare NLP 3.1.3+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [sentence, token, embeddings] |
| Output Labels: | [ner] |
| Language: | en |
Data Source
This model is trained on the FDA 2018 Medication dataset, enriched with clinical trials data.
Benchmarking
| label | tp | fp | fn | prec | rec | f1 |
|:------------------|-------:|------:|------:|-----------:|-----------:|-----------:|
| B-Drug | 30260 | 1321 | 1630 | 0.95817107 | 0.9488868 | 0.95350635 |
| B-Cycleday | 294 | 1 | 7 | 0.99661016 | 0.9767442 | 0.9865772 |
| B-Dosage | 4019 | 441 | 972 | 0.9011211 | 0.8052494 | 0.85049194 |
| I-Strength | 21784 | 2375 | 1616 | 0.9016929 | 0.9309401 | 0.9160832 |
| I-Cyclenumber | 113 | 2 | 1 | 0.9826087 | 0.9912280 | 0.98689955 |
| B-Cyclelength | 217 | 3 | 0 | 0.98636365 | 1.0 | 0.99313504 |
| B-Administration | 97 | 1 | 5 | 0.9897959 | 0.95098037 | 0.96999997 |
| I-Cyclecount | 174 | 7 | 3 | 0.96132594 | 0.9830508 | 0.972067 |
| B-Strength | 18871 | 1299 | 1161 | 0.9355974 | 0.9420427 | 0.93880904 |
| B-Frequency | 13064 | 464 | 713 | 0.96570075 | 0.9482471 | 0.95689434 |
| B-Cyclenumber | 93 | 2 | 1 | 0.97894734 | 0.9893617 | 0.9841269 |
| I-Duration | 6116 | 519 | 738 | 0.92177844 | 0.89232564 | 0.9068129 |
| B-Cyclecount | 120 | 5 | 3 | 0.96 | 0.9756098 | 0.9677419 |
| B-Form | 10964 | 912 | 986 | 0.92320645 | 0.9174895 | 0.9203391 |
| I-Route | 275 | 42 | 51 | 0.8675079 | 0.8435583 | 0.85536546 |
| I-Cyclelength | 261 | 5 | 0 | 0.981203 | 1.0 | 0.9905123 |
| I-Dosage | 2385 | 471 | 1107 | 0.835084 | 0.6829897 | 0.75141776 |
| I-Cycleday | 548 | 5 | 13 | 0.9909584 | 0.9768271 | 0.983842 |
| I-Frequency | 18644 | 967 | 1574 | 0.9506909 | 0.9221486 | 0.9362023 |
| I-Administration | 303 | 10 | 5 | 0.9680511 | 0.98376626 | 0.9758454 |
| I-Form | 642 | 284 | 553 | 0.6933045 | 0.5372385 | 0.6053748 |
| B-Route | 5930 | 280 | 692 | 0.9549114 | 0.8954998 | 0.92425185 |
| B-Duration | 2422 | 261 | 359 | 0.9027208 | 0.87090975 | 0.88653 |
| I-Drug | 11472 | 1066 | 1240 | 0.9149784 | 0.9024544 | 0.9086733 |
| Macro-average | 149068 | 10743 | 13430 | 0.93426394 | 0.9111479 | 0.92256117 |
| Micro-average | 149068 | 10743 | 13430 | 0.93277687 | 0.91735286 | 0.9250006 |
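The prec, rec, and f1 columns follow from the tp, fp, and fn counts using the standard definitions, and the micro-average row is consistent with applying the same formulas to the pooled counts. A short sketch that recomputes a few rows from their counts (the `prf` helper is a hypothetical name; values agree with the table up to rounding):

```python
from typing import Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts copied from the benchmark rows
b_drug = prf(30260, 1321, 1630)    # strongest high-volume label
i_form = prf(642, 284, 553)        # weakest label in the table
micro = prf(149068, 10743, 13430)  # pooled counts from the Micro-average row

for name, (p, r, f) in [("B-Drug", b_drug), ("I-Form", i_form), ("Micro-average", micro)]:
    print(f"{name:<14} prec={p:.4f} rec={r:.4f} f1={f:.4f}")
```

Recomputing from counts is a quick way to see where the model is weakest: I-Form and I-Dosage have notably lower recall than the other labels.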