Detect Problems, Tests and Treatments (LangTest)

Description

Pretrained named entity recognition deep learning model for clinical terms. The SparkNLP deep learning model (MedicalNerModel) is inspired by a former state-of-the-art model for NER: Chiu & Nicols, Named Entity Recognition with Bidirectional LSTM-CNN. It is the version of ner_clinical model augmented with langtest library.

test_type	before fail_count	after fail_count	before pass_count	after pass_count	minimum pass_rate	before pass_rate	after pass_rate
add_ocr_typo	470	154	635	951	80%	57%	86%
lowercase	174	150	1308	1332	80%	88%	90%
strip_punctuation	49	32	1190	1207	80%	96%	97%
titlecase	421	297	1012	1136	70%	71%	79%
uppercase	763	379	615	999	70%	45%	72%
weighted average	1877	1012	4760	5625	76%	71.72%	84.75%

Predicted Entities

PROBLEM, TEST, TREATMENT

Download Copy S3 URI

How to use

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")\
    .setInputCols(["document"])\ 
    .setOutputCol("sentence")

tokenizer = Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token")

word_embeddings = WordEmbeddingsModel().pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

ner = MedicalNerModel.pretrained("ner_clinical_langtest", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")

ner_converter = NerConverter() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[document_assembler,
                            sentence_detector,
                            tokenizer,
                            word_embeddings,
                            ner,
                            ner_converter])

data = spark.createDataFrame([["""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, and associated with an acute hepatitis, presented with a one-week history of polyuria, poor appetite, and vomiting. She was on metformin, glipizide, and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG. She had been on dapagliflozin for six months at the time of presentation. Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness, guarding, or rigidity. Pertinent laboratory findings on admission were: serum glucose 111 mg/dl,  creatinine 0.4 mg/dL, triglycerides 508 mg/dL, total cholesterol 122 mg/dL, and venous pH 7.27."""]]).toDF("text")

result = pipeline.fit(data).transform(data)

val document_assembler = new DocumentAssembler()
        .setInputCol("text")
        .setOutputCol("document")
         
val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl_healthcare", "en", "clinical/models")
        .setInputCols("document") 
        .setOutputCol("sentence")

val tokenizer = new Tokenizer()
        .setInputCols("sentence")
        .setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
        .setInputCols(Array("sentence", "token"))
        .setOutputCol("embeddings")

val ner = MedicalNerModel.pretrained("ner_clinical_langtest", "en", "clinical/models")
        .setInputCols(Array("sentence", "token", "embeddings"))
        .setOutputCol("ner")

val ner_converter = new NerConverter()
   	    .setInputCols(Array("sentence", "token", "ner"))
   	    .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, word_embeddings, ner, ner_converter))

val data = Seq("""A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus (T2DM), one prior episode of HTG-induced pancreatitis three years prior to presentation, and associated with an acute hepatitis, presented with a one-week history of polyuria, poor appetite, and vomiting. She was on metformin, glipizide, and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG. She had been on dapagliflozin for six months at the time of presentation. Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness, guarding, or rigidity. Pertinent laboratory findings on admission were: serum glucose 111 mg/dl,  creatinine 0.4 mg/dL, triglycerides 508 mg/dL, total cholesterol 122 mg/dL, and venous pH 7.27.""").toDS().toDF("text")

val result = pipeline.fit(data).transform(data)

Results

+------------------------------------+---------+
|chunk                               |ner_label|
+------------------------------------+---------+
|gestational diabetes mellitus       |PROBLEM  |
|type two diabetes mellitus          |PROBLEM  |
|T2DM                                |PROBLEM  |
|HTG-induced pancreatitis            |PROBLEM  |
|an acute hepatitis                  |PROBLEM  |
|polyuria                            |PROBLEM  |
|poor appetite                       |PROBLEM  |
|vomiting                            |PROBLEM  |
|metformin                           |TREATMENT|
|glipizide                           |TREATMENT|
|dapagliflozin                       |TREATMENT|
|T2DM                                |PROBLEM  |
|atorvastatin                        |TREATMENT|
|gemfibrozil                         |TREATMENT|
|HTG                                 |PROBLEM  |
|dapagliflozin                       |TREATMENT|
|Physical examination on presentation|TEST     |
|dry oral mucosa                     |PROBLEM  |
|her abdominal examination           |TEST     |
|tenderness                          |PROBLEM  |
|guarding                            |PROBLEM  |
|rigidity                            |PROBLEM  |
|serum glucose                       |TEST     |
|creatinine                          |TEST     |
|triglycerides                       |TEST     |
|total cholesterol                   |TEST     |
|venous pH                           |TEST     |
+------------------------------------+---------+

Model Information

Model Name:	ner_clinical_langtest
Compatibility:	Healthcare NLP 5.1.0+
License:	Licensed
Edition:	Official
Input Labels:	[sentence, token, embeddings]
Output Labels:	[ner]
Language:	en
Size:	14.5 MB

References

Trained on augmented version of 2010 i2b2 challenge data with ‘embeddings_clinical’. https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/

Benchmarking

label         precision  recall  f1-score  support 
PROBLEM       0.85       0.86    0.86      1085    
TEST          0.89       0.87    0.88      717     
TREATMENT     0.87       0.85    0.86      667     
micro-avg     0.87       0.86    0.86      2469    
macro-avg     0.87       0.86    0.86      2469    
weighted-avg  0.87       0.86    0.86      2469    

PREVIOUSText-to-SQL Generation (Custom_DB_Schema_Single_Table_Augmented)

NEXTDetect PHI for Deidentification (LangTest - Generic - Augmented)