Description
This NER model extracts mentions of diagnosed medical conditions and the headers of key sections from clinical documents. Its labels cover diseases such as heart disease, diabetes, and Alzheimer’s, along with the main sections of a patient record, which supports analysis of diagnosis and treatment patterns.
Definitions of Predicted Entities
Heart disease
: References to any diagnosed cardiovascular pathology that compromises the heart’s structure or function.
Cerebrovascular disease
: References to diagnosed pathologies that affect cerebral circulation or blood vessels within the brain.
Oncological Disease
: Refers to confirmed diagnoses associated with malignant growths or tumors, which arise from uncontrolled and abnormal cell division.
Respiratory disease
: References to diagnosed pathologies that compromise the structure or function of the respiratory tract.
Obesity
: Diagnosis of obesity, a condition in which excess body fat adversely affects the patient’s health. Mentions of overweight and BMI are not extracted under this label.
Diabetes
: Diagnosis of any form of diabetes mellitus, a chronic disease that occurs either when the pancreas does not produce enough insulin or when the body cannot effectively use the insulin it produces.
Infectious disease
: Diagnosed conditions that pertain to diseases caused by infectious pathogens, such as bacteria, viruses, fungi, or parasites.
Kidney disease
: Diagnoses related to pathologies that compromise renal function or structure.
Mental disorder
: Diagnoses encompassing a wide array of psychiatric or psychological disorders that affect cognitive, emotional, or behavioral function.
Alzheimer Disease
: Specific diagnosis of Alzheimer’s disease, a neurodegenerative disorder characterized by progressive cognitive decline.
Patient info header
: The section of a document that contains essential details about the patient such as the patient’s full name, date of birth, gender, contact information, insurance details and any other pertinent demographic data necessary for accurate patient identification.
Medical History Header
: Identifies the section of a medical document that contains a summary of the patient’s medical conditions. It encompasses details about the patient’s overall health, including chronic illnesses, past injuries, and significant events related to their health.
Clinical History Header
: Identifies section headers that refer to the patient’s clinical history, including previous and ongoing healthcare encounters and interventions.
History of Present Illness Header
: Identifies section headers that refer to the narrative description of the development of the patient’s present illness from the first sign or symptom until the present.
Medications Header
: Identifies section headers that pertain to the patient’s current and past medications.
Allergies Header
: Identifies section headers that detail the patient’s known allergies, including drug allergies and other types of hypersensitivity reactions.
Laboratory Results Header
: Identifies section headers that include results of lab tests, such as blood tests, urine tests, or other laboratory examinations.
Imaging Studies Header
: Identifies section headers that summarize the findings of imaging studies, such as X-rays, CT scans, or MRI scans.
Diagnosis Header
: Identifies section headers that list the patient’s current and past diagnoses.
Treatment Plan Header
: Identifies section headers that outline the patient’s management plan, including medications, therapies, surgeries, or other interventions.
Predicted Entities
Heart disease, Cerebrovascular disease, Respiratory disease, Alzheimer Disease, Obesity, Oncological Disease, Diabetes, Infectious disease, Kidney disease, Mental disorder, Patient info header, Medical History Header, Clinical History Header, History of Present Illness Header, Medications Header, Allergies Header, Laboratory Results Header, Imaging Studies Header, Diagnosis Header, Treatment Plan Header
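The display names above differ from the tag strings the model emits. As a rough sketch, the tag stems below are copied from the Benchmarking table with the `B-`/`I-` prefixes dropped. The names `TAG_STEMS` and `split_labels` are hypothetical, and the upper-casing is assumed to mirror the effect of `setLabelCasing("upper")` in the usage example.

```python
# Tag stems as they appear in the Benchmarking table, without B-/I- prefixes.
TAG_STEMS = [
    "Allergies_header", "Alzheimer", "Cerebrovascular_disease",
    "Chief_complaint_header", "Clinical_history_header", "Diabetes",
    "Diagnosis_header", "Heart_disease", "History_pres_ilness_header",
    "Imaging_header", "Infectious_disease", "Kidney_disease",
    "Lab_results_header", "Medical_history_header", "Medications_header",
    "Mental_disorder", "Obesity", "Oncological_disease",
    "Patient_info_header", "Respiratory_disease", "Treatment_plan_header",
]


def split_labels(stems):
    """Upper-case the stems and split them into (conditions, headers)."""
    upper = [s.upper() for s in stems]
    headers = [s for s in upper if s.endswith("_HEADER")]
    conditions = [s for s in upper if not s.endswith("_HEADER")]
    return conditions, headers


conditions, headers = split_labels(TAG_STEMS)
print(len(conditions), "condition labels:", conditions)
print(len(headers), "header labels:", headers)
```

A lookup like this can help when filtering `ner_chunk` output for conditions only or headers only, since the filter has to match the emitted tag rather than the display name.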
How to use
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer, WordEmbeddingsModel
from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal

# Assumes `spark` is an active session started with the Healthcare NLP library.
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentenceDetector = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

clinical_ner = MedicalNerModel.pretrained("ner_section_header_diagnosis", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")\
    .setLabelCasing("upper")  # choose whether tags are returned in upper or lower case

# Merge token-level tags into entity chunks
ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    word_embeddings,
    clinical_ner,
    ner_converter])
text = '''
Patient Name: Samantha Johnson
Age: 52
Gender: Female
Patient Info:
Name: Samantha Johnson
Age: 52
Gender: Female
Medical History:
Patient has a history of Chronic respiratory disease.
Clinical History:
Patient presented with shortness of breath and chest pain.
Chief Complaint:
Patient complained of chest pain and difficulty breathing.
History of Present Illness:
Patient has been experiencing chest pain and shortness of breath for the past week. Symptoms were relieved by medication at first but became worse over time.
Past Medical History:
Patient has a history of Asthma and was previously diagnosed with Bronchitis.
Medications:
Patient is currently taking Albuterol, Singulair, and Advair for respiratory issues.
Allergies:
Patient has a documented allergy to Penicillin.
Physical Examination:
Patient had diffuse wheezing and decreased breath sounds on lung auscultation. Heart rate and rhythm were regular.
Laboratory Results:
Pulmonary function test results showed a decrease in Forced Expiratory Volume in one second (FEV1).
Imaging Studies:
Chest x-ray showed bilateral infiltrates consistent with Chronic obstructive pulmonary disease (COPD).
Diagnosis:
The patient was diagnosed with COPD exacerbation.
Treatment Plan:
The patient was managed with nebulized bronchodilators, steroid therapy, and oxygen as needed. The patient was discharged with instructions to continue bronchodilator and steroid therapy and to follow up with primary care physician in two weeks.
'''
data = spark.createDataFrame([[text]]).toDF("text")
result = nlpPipeline.fit(data).transform(data)
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.DataFrame
import spark.implicits._ // needed for Seq(...).toDS; assumes an active `spark` session
// MedicalNerModel and NerConverterInternal come from the licensed Healthcare NLP library.

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDetector = new SentenceDetector()
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols(Array("sentence"))
  .setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")

val clinical_ner = MedicalNerModel.pretrained("ner_section_header_diagnosis", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")
  .setLabelCasing("upper") // choose whether tags are returned in upper or lower case

// Merge token-level tags into entity chunks
val ner_converter = new NerConverterInternal()
  .setInputCols(Array("sentence", "token", "ner"))
  .setOutputCol("ner_chunk")

val nlpPipeline = new Pipeline()
  .setStages(Array(documentAssembler, sentenceDetector, tokenizer, word_embeddings, clinical_ner, ner_converter))
val text = """
Patient Name: Samantha Johnson
Age: 52
Gender: Female
Patient Info:
Name: Samantha Johnson
Age: 52
Gender: Female
Medical History:
Patient has a history of Chronic respiratory disease.
Clinical History:
Patient presented with shortness of breath and chest pain.
Chief Complaint:
Patient complained of chest pain and difficulty breathing.
History of Present Illness:
Patient has been experiencing chest pain and shortness of breath for the past week. Symptoms were relieved by medication at first but became worse over time.
Past Medical History:
Patient has a history of Asthma and was previously diagnosed with Bronchitis.
Medications:
Patient is currently taking Albuterol, Singulair, and Advair for respiratory issues.
Allergies:
Patient has a documented allergy to Penicillin.
Physical Examination:
Patient had diffuse wheezing and decreased breath sounds on lung auscultation. Heart rate and rhythm were regular.
Laboratory Results:
Pulmonary function test results showed a decrease in Forced Expiratory Volume in one second (FEV1).
Imaging Studies:
Chest x-ray showed bilateral infiltrates consistent with Chronic obstructive pulmonary disease (COPD).
Diagnosis:
The patient was diagnosed with COPD exacerbation.
Treatment Plan:
The patient was managed with nebulized bronchodilators, steroid therapy, and oxygen as needed. The patient was discharged with instructions to continue bronchodilator and steroid therapy and to follow up with primary care physician in two weeks.
"""
val data: DataFrame = Seq(text).toDS.toDF("text")
val result = nlpPipeline.fit(data).transform(data)
Results
|index|chunks|begin|end|sentence_id|entities|confidence|
|---|---|---|---|---|---|---|
|0|Patient Info|55|66|0|PATIENT_INFO_HEADER|0.91190004|
|1|Medical History|115|129|0|MEDICAL_HISTORY_HEADER|0.8115|
|2|Chronic respiratory disease|157|183|0|RESPIRATORY_DISEASE|0.7356667|
|3|Clinical History|186|201|1|CLINICAL_HISTORY_HEADER|0.76595|
|4|Chief Complaint|263|277|2|CHIEF_COMPLAINT_HEADER|0.8484|
|5|History of Present Illness|339|364|3|HISTORY_PRES_ILNESS_HEADER|0.9933|
|6|Past Medical History|525|544|5|MEDICAL_HISTORY_HEADER|0.7084667|
|7|Asthma|572|577|5|RESPIRATORY_DISEASE|0.9994|
|8|Bronchitis|613|622|5|RESPIRATORY_DISEASE|0.8429|
|9|Medications|625|635|6|MEDICATIONS_HEADER|0.9991|
|10|Allergies|723|731|7|ALLERGIES_HEADER|0.9999|
|11|Laboratory Results|919|936|10|LAB_RESULTS_HEADER|0.95780003|
|12|Imaging Studies|1039|1053|11|IMAGING_HEADER|0.93614995|
|13|Chronic obstructive pulmonary disease|1113|1149|11|RESPIRATORY_DISEASE|0.816625|
|14|COPD|1152|1155|11|RESPIRATORY_DISEASE|0.9985|
|15|Diagnosis|1159|1167|12|DIAGNOSIS_HEADER|0.9993|
|16|COPD exacerbation|1201|1217|12|RESPIRATORY_DISEASE|0.87365|
|17|Treatment Plan|1220|1233|13|TREATMENT_PLAN_HEADER|0.95054996|
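The `begin` and `end` columns appear to be inclusive character offsets into the input text, and a common next step is to attach each condition to the section it was found in. The sketch below checks the offsets against the opening lines of the sample text and groups a few rows copied from the table under their closest preceding header. `TEXT_START`, `ROWS`, `chunk_at`, and `group_by_section` are hypothetical names, and treating every label ending in `_HEADER` as a section boundary is an assumption.

```python
# Opening lines of the sample text, up to the second detected header.
TEXT_START = (
    "\nPatient Name: Samantha Johnson\nAge: 52\nGender: Female\n"
    "Patient Info:\nName: Samantha Johnson\nAge: 52\nGender: Female\n"
    "Medical History:\n"
)


def chunk_at(text, begin, end):
    """Slice a chunk out of the text; `end` is treated as inclusive."""
    return text[begin:end + 1]


# (chunk, begin, end, entity) rows copied from the results table.
ROWS = [
    ("Patient Info", 55, 66, "PATIENT_INFO_HEADER"),
    ("Medical History", 115, 129, "MEDICAL_HISTORY_HEADER"),
    ("Chronic respiratory disease", 157, 183, "RESPIRATORY_DISEASE"),
    ("Past Medical History", 525, 544, "MEDICAL_HISTORY_HEADER"),
    ("Asthma", 572, 577, "RESPIRATORY_DISEASE"),
    ("Bronchitis", 613, 622, "RESPIRATORY_DISEASE"),
    ("Diagnosis", 1159, 1167, "DIAGNOSIS_HEADER"),
    ("COPD exacerbation", 1201, 1217, "RESPIRATORY_DISEASE"),
]


def group_by_section(rows):
    """Attach each condition chunk to the closest preceding header chunk."""
    sections = []  # list of (header_chunk, [condition chunks])
    for chunk, _begin, _end, entity in sorted(rows, key=lambda r: r[1]):
        if entity.endswith("_HEADER"):
            sections.append((chunk, []))
        elif sections:
            sections[-1][1].append(chunk)
    return sections


print(chunk_at(TEXT_START, 55, 66))
print(chunk_at(TEXT_START, 115, 129))
for header, found in group_by_section(ROWS):
    print(header, "->", found)
```

Grouping by offset rather than by `sentence_id` keeps a condition tied to its section even when the header and the condition fall in different sentences.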
Model Information
|Field|Value|
|---|---|
|Model Name|ner_section_header_diagnosis|
|Compatibility|Healthcare NLP 5.0.0+|
|License|Licensed|
|Edition|Official|
|Input Labels|[sentence, token, embeddings]|
|Output Labels|[ner]|
|Language|en|
|Size|3.0 MB|
References
Trained on in-house datasets.
Benchmarking
|label|tp|fp|fn|total|precision|recall|f1|
|---|---|---|---|---|---|---|---|
|B-Allergies_header|165|2|3|168|0.988024|0.982143|0.985075|
|I-Allergies_header|72|4|2|74|0.947368|0.972973|0.960000|
|B-Alzheimer|372|5|2|374|0.986737|0.994652|0.990679|
|I-Alzheimer|253|3|2|255|0.988281|0.992157|0.990215|
|B-Cerebrovascular_disease|298|23|23|321|0.928349|0.928349|0.928349|
|I-Cerebrovascular_disease|304|15|11|315|0.952978|0.965079|0.958991|
|B-Chief_complaint_header|216|5|22|238|0.977376|0.907563|0.941176|
|I-Chief_complaint_header|204|5|27|231|0.976077|0.883117|0.927273|
|B-Clinical_history_header|165|40|4|169|0.804878|0.976331|0.882353|
|I-Clinical_history_header|165|40|5|170|0.804878|0.970588|0.880000|
|B-Diabetes|806|19|14|820|0.976970|0.982927|0.979939|
|I-Diabetes|742|16|5|747|0.978892|0.993307|0.986047|
|B-Diagnosis_header|251|8|12|263|0.969112|0.954373|0.961686|
|I-Diagnosis_header|13|5|6|19|0.722222|0.684211|0.702703|
|B-Heart_disease|846|24|31|877|0.972414|0.964652|0.968517|
|I-Heart_disease|666|25|25|691|0.963821|0.963821|0.963821|
|B-History_pres_ilness_header|217|2|1|218|0.990868|0.995413|0.993135|
|I-History_pres_ilness_header|729|12|5|734|0.983806|0.993188|0.988475|
|B-Imaging_header|181|6|5|186|0.967914|0.973118|0.970509|
|I-Imaging_header|203|6|3|206|0.971292|0.985437|0.978313|
|B-Infectious_disease|279|41|19|298|0.871875|0.936242|0.902913|
|I-Infectious_disease|278|22|14|292|0.926667|0.952055|0.939189|
|B-Kidney_disease|520|1|5|525|0.998081|0.990476|0.994264|
|I-Kidney_disease|915|0|6|921|1.000000|0.993485|0.996732|
|B-Lab_results_header|213|0|12|225|1.000000|0.946667|0.972603|
|I-Lab_results_header|259|1|18|277|0.996154|0.935018|0.964618|
|B-Medical_history_header|443|23|21|464|0.950644|0.954741|0.952688|
|I-Medical_history_header|751|40|28|779|0.949431|0.964056|0.956688|
|B-Medications_header|226|7|9|235|0.969957|0.961702|0.965812|
|I-Medications_header|70|8|8|78|0.897436|0.897436|0.897436|
|B-Mental_disorder|467|26|28|495|0.947262|0.943434|0.945344|
|I-Mental_disorder|320|18|19|339|0.946746|0.943953|0.945347|
|B-Obesity|605|0|8|613|1.000000|0.986949|0.993432|
|B-Oncological_disease|398|23|8|406|0.945368|0.980296|0.962515|
|I-Oncological_disease|411|12|5|416|0.971631|0.987981|0.979738|
|B-Patient_info_header|285|8|5|290|0.972696|0.982759|0.977702|
|I-Patient_info_header|291|8|4|295|0.973244|0.986441|0.979798|
|B-Respiratory_disease|1077|18|14|1091|0.983562|0.987168|0.985361|
|I-Respiratory_disease|585|11|14|599|0.981544|0.976628|0.979079|
|B-Treatment_plan_header|263|15|9|272|0.946043|0.966912|0.956364|
|I-Treatment_plan_header|263|15|1|264|0.946043|0.996212|0.970480|
|Macro-average|15787|562|463|-|0.951869|0.95936|0.95560|
|Micro-average|15787|562|463|-|0.965624|0.97150|0.96855|
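The precision, recall, and F1 columns can be reproduced from the tp, fp, and fn counts, which is a quick way to sanity-check a row. The following is a minimal sketch using the standard definitions; the function name `prf` is hypothetical, and the counts are copied from two rows of the table.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, f1) computed from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# B-Allergies_header row: tp=165, fp=2, fn=3
print([round(x, 6) for x in prf(165, 2, 3)])

# Micro-average row: counts are summed over all labels before dividing.
print([round(x, 5) for x in prf(15787, 562, 463)])
```

The micro-average pools the counts across labels, so frequent labels weigh more. The macro-average row, by contrast, appears to be an unweighted mean of the per-label precision and recall, which is why it is pulled down by weaker labels such as `I-Diagnosis_header`.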