Detect Problems, Tests and Treatments (LangTest)

Description

Pretrained named entity recognition deep learning model for clinical terms. The Spark NLP deep learning model (MedicalNerModel) is inspired by a former state-of-the-art NER model: Chiu & Nichols, Named Entity Recognition with Bidirectional LSTM-CNN. This is a version of the ner_clinical model augmented with the langtest library. The table below compares test results before and after augmentation.
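Roughly speaking, the test types named in this card (lowercase, uppercase, titlecase, strip_punctuation, add_ocr_typo) perturb the input text and check whether the model's predictions stay stable. The sketch below shows plain-Python stand-ins for four of them. It is an illustration only: the `perturb` helper and the `PERTURBATIONS` mapping are hypothetical names, and the functions are approximations, not LangTest's own implementations.

```python
import string

# Illustrative stand-ins for four of the test types named in this card.
# These are NOT LangTest's implementations, only plain-Python approximations.
PERTURBATIONS = {
    "lowercase": str.lower,
    "uppercase": str.upper,
    "titlecase": str.title,
    "strip_punctuation": lambda s: s.translate(
        str.maketrans("", "", string.punctuation)
    ),
}


def perturb(text: str, test_type: str) -> str:
    """Return a perturbed copy of `text` for the given test type."""
    return PERTURBATIONS[test_type](text)


sentence = "She was on metformin, glipizide, and dapagliflozin."
for name in PERTURBATIONS:
    print(f"{name:18s} {perturb(sentence, name)}")
```

A model that is robust to these changes should tag the same entities in the perturbed sentence as in the original.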

test_type         before fail_count  after fail_count  before pass_count  after pass_count  minimum pass_rate  before pass_rate  after pass_rate
add_ocr_typo      470                154               635                951               80%                57%               86%
lowercase         174                150               1308               1332              80%                88%               90%
strip_punctuation 49                 32                1190               1207              80%                96%               97%
titlecase         421                297               1012               1136              70%                71%               79%
uppercase         763                379               615                999               70%                45%               72%
weighted average  1877               1012              4760               5625              76%                71.72%            84.75%
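
The pass rates in the table above follow from the counts: pass_rate = pass_count / (pass_count + fail_count). A short sketch that recomputes them, assuming the "weighted average" row pools all test cases across test types (an assumption that reproduces the reported 71.72% and 84.75%); the helper names are hypothetical:

```python
# Counts copied from the table above: (fail_count, pass_count)
# before and after augmentation.
counts = {
    "add_ocr_typo":      {"before": (470, 635),  "after": (154, 951)},
    "lowercase":         {"before": (174, 1308), "after": (150, 1332)},
    "strip_punctuation": {"before": (49, 1190),  "after": (32, 1207)},
    "titlecase":         {"before": (421, 1012), "after": (297, 1136)},
    "uppercase":         {"before": (763, 615),  "after": (379, 999)},
}


def pass_rate(fail: int, passed: int) -> float:
    """Share of test cases that passed, as a percentage."""
    return 100.0 * passed / (fail + passed)


def overall(stage: str) -> float:
    """Pass rate over all test cases pooled across test types."""
    fail = sum(c[stage][0] for c in counts.values())
    passed = sum(c[stage][1] for c in counts.values())
    return pass_rate(fail, passed)


for name, c in counts.items():
    before = pass_rate(*c["before"])
    after = pass_rate(*c["after"])
    print(f"{name:18s} {before:5.1f}% -> {after:5.1f}%")
print(f"{'weighted average':18s} {overall('before'):.2f}% -> {overall('after'):.2f}%")
```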

Predicted Entities

PROBLEM, TEST, TREATMENT

How to use

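# Assumes an active Spark session named `spark` with Spark NLP for Healthcare available.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel

document_assembler = DocumentAssembler()\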
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

ner = MedicalNerModel.pretrained("ner_clinical_langtest", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_converter = NerConverter() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[document_assembler,
                            sentence_detector,
                            tokenizer,
                            word_embeddings,
                            ner,
                            ner_converter])

data = spark.createDataFrame([["""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, and associated with an acute hepatitis, presented with a one-week history of polyuria, poor appetite, and vomiting. She was on metformin, glipizide, and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG. She had been on dapagliflozin for six months at the time of presentation. Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness, guarding, or rigidity. Pertinent laboratory findings on admission were: serum glucose 111 mg/dl,  creatinine 0.4 mg/dL, triglycerides 508 mg/dL, total cholesterol 122 mg/dL, and venous pH 7.27."""]]).toDF("text")

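result = pipeline.fit(data).transform(data)

// Scala version of the same pipeline
import spark.implicits._  // provides .toDS() used when building `data`

val document_assembler = new DocumentAssembler()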
        .setInputCol("text")
        .setOutputCol("document")
         
val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")
        .setInputCols("document") 
        .setOutputCol("sentence")

val tokenizer = new Tokenizer()
        .setInputCols("sentence")
        .setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
        .setInputCols(Array("sentence", "token"))
        .setOutputCol("embeddings")

val ner = MedicalNerModel.pretrained("ner_clinical_langtest", "en", "clinical/models")
        .setInputCols(Array("sentence", "token", "embeddings"))
        .setOutputCol("ner")

val ner_converter = new NerConverter()
        .setInputCols(Array("sentence", "token", "ner"))
        .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, word_embeddings, ner, ner_converter))

val data = Seq("""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, and associated with an acute hepatitis, presented with a one-week history of polyuria, poor appetite, and vomiting. She was on metformin, glipizide, and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG. She had been on dapagliflozin for six months at the time of presentation. Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness, guarding, or rigidity. Pertinent laboratory findings on admission were: serum glucose 111 mg/dl,  creatinine 0.4 mg/dL, triglycerides 508 mg/dL, total cholesterol 122 mg/dL, and venous pH 7.27.""").toDS().toDF("text")

val result = pipeline.fit(data).transform(data)

Results

+------------------------------------+---------+
|chunk                               |ner_label|
+------------------------------------+---------+
|gestational diabetes mellitus       |PROBLEM  |
|type two diabetes mellitus          |PROBLEM  |
|T2DM                                |PROBLEM  |
|HTG-induced pancreatitis            |PROBLEM  |
|an acute hepatitis                  |PROBLEM  |
|polyuria                            |PROBLEM  |
|poor appetite                       |PROBLEM  |
|vomiting                            |PROBLEM  |
|metformin                           |TREATMENT|
|glipizide                           |TREATMENT|
|dapagliflozin                       |TREATMENT|
|T2DM                                |PROBLEM  |
|atorvastatin                        |TREATMENT|
|gemfibrozil                         |TREATMENT|
|HTG                                 |PROBLEM  |
|dapagliflozin                       |TREATMENT|
|Physical examination on presentation|TEST     |
|dry oral mucosa                     |PROBLEM  |
|her abdominal examination           |TEST     |
|tenderness                          |PROBLEM  |
|guarding                            |PROBLEM  |
|rigidity                            |PROBLEM  |
|serum glucose                       |TEST     |
|creatinine                          |TEST     |
|triglycerides                       |TEST     |
|total cholesterol                   |TEST     |
|venous pH                           |TEST     |
+------------------------------------+---------+
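
The chunks above are produced by the converter stage, which merges token-level tags from the ner column into multi-token entities. Assuming those tags follow the usual IOB scheme (for example B-PROBLEM, I-PROBLEM, O), the merge can be sketched in plain Python as follows. The `merge_iob` helper is hypothetical and the tags in the example are hand-written for illustration, not model output.

```python
from typing import List, Optional, Tuple


def merge_iob(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge IOB-tagged tokens into (chunk, label) pairs."""
    chunks: List[Tuple[str, str]] = []
    current: List[str] = []
    label: Optional[str] = None
    for token, tag in zip(tokens, tags):
        prefix, _, entity = tag.partition("-")
        # An I- tag with the same label extends the open chunk.
        if prefix == "I" and entity == label and current:
            current.append(token)
            continue
        # Anything else closes the open chunk, if there is one.
        if current:
            chunks.append((" ".join(current), label))
            current, label = [], None
        # B- (or a stray I-) starts a new chunk; O starts nothing.
        if prefix in ("B", "I"):
            current, label = [token], entity
    if current:
        chunks.append((" ".join(current), label))
    return chunks


# Hand-written example tags (an assumption, not actual model output).
tokens = ["She", "was", "on", "metformin", "for",
          "type", "two", "diabetes", "mellitus", "."]
tags = ["O", "O", "O", "B-TREATMENT", "O",
        "B-PROBLEM", "I-PROBLEM", "I-PROBLEM", "I-PROBLEM", "O"]
for chunk, ner_label in merge_iob(tokens, tags):
    print(f"{chunk:30s} {ner_label}")
```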

Model Information

Model Name: ner_clinical_langtest
Compatibility: Healthcare NLP 5.1.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 14.5 MB

References

Trained on an augmented version of the 2010 i2b2 challenge data with ‘embeddings_clinical’ embeddings. https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/

Benchmarking

label         precision  recall  f1-score  support 
PROBLEM       0.85       0.86    0.86      1085    
TEST          0.89       0.87    0.88      717     
TREATMENT     0.87       0.85    0.86      667     
micro-avg     0.87       0.86    0.86      2469    
macro-avg     0.87       0.86    0.86      2469    
weighted-avg  0.87       0.86    0.86      2469
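
As a rough cross-check of the average rows, the macro average is the unweighted mean over labels and the weighted average is the support-weighted mean. The sketch below recomputes precision and recall from the per-label rows above; the helper names are hypothetical. The f1-score is left out on purpose: the inputs are already rounded to two decimals, so recomputing it from this table may not match the reported value exactly.

```python
# Per-label rows copied from the table above: (precision, recall, support).
rows = {
    "PROBLEM":   (0.85, 0.86, 1085),
    "TEST":      (0.89, 0.87, 717),
    "TREATMENT": (0.87, 0.85, 667),
}

total_support = sum(support for _, _, support in rows.values())


def macro(index: int) -> float:
    """Unweighted mean of one metric column across labels."""
    return sum(row[index] for row in rows.values()) / len(rows)


def weighted(index: int) -> float:
    """Mean of one metric column weighted by each label's support."""
    return sum(row[index] * row[2] for row in rows.values()) / total_support


print(f"support      {total_support}")
print(f"macro-avg    precision={macro(0):.2f} recall={macro(1):.2f}")
print(f"weighted-avg precision={weighted(0):.2f} recall={weighted(1):.2f}")
```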