Detect Posology concepts (ner_posology_healthcare)

Description

Detect drugs, dosages, and administration instructions in text using a pretrained NER model.

Predicted Entities

Drug, Duration, Strength, Form, Frequency, Dosage, Route

How to use

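from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel

# Assumes `spark` is an active Spark session started with a valid Healthcare NLP license.

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")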

embeddings_clinical = WordEmbeddingsModel.pretrained("embeddings_healthcare_100d", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

clinical_ner = MedicalNerModel.pretrained("ner_posology_healthcare", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

ner_converter = NerConverter() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("entities")

pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings_clinical, clinical_ner, ner_converter])

model = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

results = model.transform(spark.createDataFrame([["The patient is a 40-year-old white male who presents with a chief complaint of 'chest pain'. The patient is diabetic and has a prior history of coronary artery disease. The patient presents today stating that chest pain started yesterday evening.  He has been advised Aspirin 81 milligrams QDay. insulin 50 units in a.m. HCTZ 50 mg QDay. Nitroglycerin 1/150 sublingually."]]).toDF("text"))

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")
    .setInputCols("document")
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
    .setInputCols("sentence")
    .setOutputCol("token")

val embeddings_clinical = WordEmbeddingsModel.pretrained("embeddings_healthcare_100d", "en", "clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")

val clinical_ner = MedicalNerModel.pretrained("ner_posology_healthcare", "en", "clinical/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")

val ner_converter = new NerConverter()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("entities")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings_clinical, clinical_ner, ner_converter))

val text = """The patient is a 40-year-old white male who presents with a chief complaint of 'chest pain'. The patient is diabetic and has a prior history of coronary artery disease. The patient presents today stating that chest pain started yesterday evening.  He has been advised Aspirin 81 milligrams QDay. insulin 50 units in a.m. HCTZ 50 mg QDay. Nitroglycerin 1/150 sublingually."""

import spark.implicits._  // required for .toDS on a Seq

val data = Seq(text).toDS.toDF("text")

val results = pipeline.fit(data).transform(data)

import nlu
nlu.load("en.med_ner.posology.healthcare").predict("""Put your text here.""")

Results

+-------------+---------+
|chunk        |ner_label|
+-------------+---------+
|Aspirin      |Drug     |
|81 milligrams|Strength |
|QDay         |Frequency|
|insulin      |Drug     |
|50 units     |Dosage   |
|in a.m.      |Frequency|
|HCTZ         |Drug     |
|50 mg        |Strength |
|QDay         |Frequency|
|Nitroglycerin|Drug     |
|1/150        |Strength |
|sublingually.|Route    |
+-------------+---------+
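
The chunk and ner_label pairs shown above come from the `entities` column written by `ner_converter`. The following is a minimal sketch of how such pairs could be pulled out in plain Python once rows have been collected. It assumes each annotation exposes a `result` field (the chunk text) and a `metadata` map with an `entity` key (the label); `chunks_and_labels` is a hypothetical helper name, and the `sample` list is stand-in data copied from the first rows of the table, not real pipeline output.

```python
def chunks_and_labels(annotations):
    """Return (chunk, ner_label) pairs from NerConverter-style annotations.

    Assumption: each annotation supports ann["result"] and
    ann["metadata"]["entity"], as plain dicts and collected Spark Rows do.
    """
    return [(ann["result"], ann["metadata"]["entity"]) for ann in annotations]


# Stand-in annotations mirroring the first rows of the Results table.
sample = [
    {"result": "Aspirin", "metadata": {"entity": "Drug"}},
    {"result": "81 milligrams", "metadata": {"entity": "Strength"}},
    {"result": "QDay", "metadata": {"entity": "Frequency"}},
]

pairs = chunks_and_labels(sample)
for chunk, label in pairs:
    print(f"{chunk:<14}{label}")

# With a live session, the same helper could be applied per row, e.g.:
#   for row in results.select("entities").collect():
#       print(chunks_and_labels(row["entities"]))
```

Collecting rows to the driver is only practical for small inputs; for larger datasets the same fields would normally be selected with DataFrame operations instead.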

Model Information

Model Name: ner_posology_healthcare
Compatibility: Healthcare NLP 3.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en

Benchmarking

label       tp      fp     fn     total   precision  recall  f1
DURATION    995.0   463.0  132.0  1127.0  0.6824     0.8829  0.7698
DRUG        4957.0  632.0  476.0  5433.0  0.8869     0.9124  0.8995
DOSAGE      539.0   183.0  380.0  919.0   0.7465     0.5865  0.6569
ROUTE       676.0   47.0   129.0  805.0   0.935      0.8398  0.8848
FREQUENCY   3688.0  675.0  313.0  4001.0  0.8453     0.9218  0.8819
FORM        1328.0  261.0  294.0  1622.0  0.8357     0.8187  0.8272
STRENGTH    5008.0  687.0  557.0  5565.0  0.8794     0.8999  0.8895
macro-avg   -       -      -      -       -          -       0.82994
micro-avg   -       -      -      -       -          -       0.86743
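
In the table above, total equals tp + fn for every label, and the per-label precision, recall, and f1 values follow from the tp, fp, and fn counts. As a rough sketch of that arithmetic (not the evaluation script used to produce the benchmark), the snippet below recomputes the per-label scores and takes the macro average as the unweighted mean of the per-label f1 values. The names `counts`, `prf`, and `macro_f1` are illustrative.

```python
# (tp, fp, fn) counts copied from the benchmark table.
counts = {
    "DURATION": (995, 463, 132),
    "DRUG": (4957, 632, 476),
    "DOSAGE": (539, 183, 380),
    "ROUTE": (676, 47, 129),
    "FREQUENCY": (3688, 675, 313),
    "FORM": (1328, 261, 294),
    "STRENGTH": (5008, 687, 557),
}


def prf(tp, fp, fn):
    """Precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


per_label = {label: prf(*c) for label, c in counts.items()}

# Macro average: unweighted mean of the per-label F1 scores.
macro_f1 = sum(f1 for _, _, f1 in per_label.values()) / len(per_label)

for label, (p, r, f1) in per_label.items():
    print(f"{label:<10} {p:.4f}  {r:.4f}  {f1:.4f}")
print(f"macro-avg f1: {macro_f1:.5f}")
```

Computed this way, the per-label values agree with the table to four decimal places and the macro average matches the reported 0.82994.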