Description
This SDOH NER model detects and labels social determinants of health (SDOH) entities related to health behaviors and health problems in text data. Social determinants of health are factors that influence individuals’ health outcomes, encompassing social, economic, and environmental elements. The model was trained on a diverse range of text sources, and its accuracy and precision were validated against expert-labeled data. The labels of the SDOH NER model and their descriptions are:
Communicable_Disease: Includes all communicable diseases. “HIV, hepatitis, tuberculosis, sexually transmitted diseases, etc.”
Diet: Information regarding the patient’s dietary habits. “vegetarian, vegan, healthy foods, low-calorie diet, etc.”
Disability: Mentions related to disability.
Eating_Disorder: Used to extract eating disorders. “anorexia, bulimia, pica, etc.”
Exercise: Mentions of the patient’s exercise habits. “exercise, physical activity, play football, go to the gym, etc.”
Hyperlipidemia: Terms that indicate hyperlipidemia and relevant subtypes. “hyperlipidemia, hypercholesterolemia, elevated cholesterol, etc.”
Hypertension: Terms related to hypertension. “hypertension, high blood pressure, etc.”
Mental_Health: Includes all mental, neurodegenerative, and neurodevelopmental diagnoses, disorders, conditions, or syndromes mentioned. “depression, anxiety, bipolar disorder, psychosis, etc.”
Obesity: Terms related to the patient being obese. “obesity, overweight, etc.”
Other_Disease: Includes all the diseases mentioned. “psoriasis, thromboembolism, etc.”
Quality_Of_Life: Mentions of how an individual feels about their current station in life. “lower quality of life, profoundly impact his quality of life, etc.”
Sexual_Activity: Mentions of the patient’s sexual behaviors. “monogamous, sexual activity, inconsistent condom use, etc.”
Predicted Entities
Communicable_Disease, Diet, Disability, Eating_Disorder, Exercise, Hyperlipidemia, Hypertension, Mental_Health, Obesity, Other_Disease, Quality_Of_Life, Sexual_Activity
How to use
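# Python
# Assumes an active Spark session (`spark`) with Healthcare NLP loaded.
from pyspark.ml import Pipeline
from pyspark.sql.types import StringType
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel
from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal

# Wrap the raw text column into Spark NLP documents
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split each document into sentences
sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

# Split each sentence into tokens
tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Clinical word embeddings required by the NER model
clinical_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

# Token-level SDOH tagging
ner_model = MedicalNerModel.pretrained("ner_sdoh_health_behaviours_problems", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge token-level tags into labeled chunks
ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    clinical_embeddings,
    ner_model,
    ner_converter
])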
sample_texts = ["She has not been getting regular exercise and not followed the diet for approximately two years due to chronic sciatic pain.", "Medical History: The patient is a 32-year-old female who presents with a history of anxiety, depression, bulimia nervosa, elevated cholesterol, and substance abuse. She used to play basketball and tennis.", "Pt was intubated at the scene & currently sedated due to high BP. Also, he is currently on social security disability.", "A 28-year-old single female teacher presented with concerns about her overall health and well-being. She had a history of hypertension and hyperlipidemia. Her sedentary lifestyle and poor diet contributed to obesity, negatively impacting her quality of life and self-esteem. She expressed a desire to improve her lifestyle, lose weight, and address her mental well-being and sexual satisfaction. She is also advised to go to the gym."]
data = spark.createDataFrame(sample_texts, StringType()).toDF("text")
result = pipeline.fit(data).transform(data)
// Scala
// Assumes an active Spark session (`spark`) with Healthcare NLP loaded.
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLModel
import com.johnsnowlabs.nlp.embeddings.WordEmbeddingsModel
import com.johnsnowlabs.nlp.annotators.ner.{MedicalNerModel, NerConverterInternal}
import org.apache.spark.ml.Pipeline
import spark.implicits._

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val clinical_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")

val ner_model = MedicalNerModel.pretrained("ner_sdoh_health_behaviours_problems", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")

val ner_converter = new NerConverterInternal()
  .setInputCols(Array("sentence", "token", "ner"))
  .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(
  document_assembler,
  sentence_detector,
  tokenizer,
  clinical_embeddings,
  ner_model,
  ner_converter
))

// One row per sample text (a Seq of strings, not a Seq wrapping a single Array)
val data = Seq(
  "She has not been getting regular exercise and not followed the diet for approximately two years due to chronic sciatic pain.",
  "Medical History: The patient is a 32-year-old female who presents with a history of anxiety, depression, bulimia nervosa, elevated cholesterol, and substance abuse. She used to play basketball and tennis.",
  "Pt was intubated at the scene & currently sedated due to high BP. Also, he is currently on social security disability.",
  "A 28-year-old single female teacher presented with concerns about her overall health and well-being. She had a history of hypertension and hyperlipidemia. Her sedentary lifestyle and poor diet contributed to obesity, negatively impacting her quality of life and self-esteem. She expressed a desire to improve her lifestyle, lose weight, and address her mental well-being and sexual satisfaction. She is also advised to go to the gym."
).toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
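The last pipeline stage turns token-level tags into labeled chunks. As a rough illustration of that idea only, here is a pure-Python sketch that merges IOB-style tags into chunks. It assumes the `ner` column carries tags such as `B-Exercise` and `I-Exercise`; the function `bio_to_chunks` and the sample tags are hypothetical and are not part of the library.

```python
def bio_to_chunks(tokens, tags):
    """Merge (token, IOB tag) pairs into (chunk_text, label) tuples.

    Hypothetical helper for illustration; not the library's implementation.
    """
    chunks, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if starts_new:
            # Close any open chunk, then start a new one
            if current:
                chunks.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-"):
            # Continue the open chunk
            current.append(token)
        else:
            # "O" tag: close any open chunk
            if current:
                chunks.append((" ".join(current), label))
            current, label = [], None
    if current:
        chunks.append((" ".join(current), label))
    return chunks


# Assumed tags for the start of the first sample text
tokens = ["She", "has", "not", "been", "getting", "regular", "exercise"]
tags = ["O", "O", "O", "O", "O", "B-Exercise", "I-Exercise"]
chunks = bio_to_chunks(tokens, tags)
print(chunks)
```

The sketch joins tokens with single spaces, which is a simplification; it does not track the character offsets that the real output reports.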
Results
+--------------------+-----+---+---------------+
|chunk |begin|end|ner_label |
+--------------------+-----+---+---------------+
|regular exercise |25 |40 |Exercise |
|diet |63 |66 |Diet |
|chronic sciatic pain|103 |122|Other_Disease |
|anxiety |84 |90 |Mental_Health |
|depression |93 |102|Mental_Health |
|bulimia nervosa |105 |119|Eating_Disorder|
|basketball |182 |191|Exercise |
|tennis |197 |202|Exercise |
|high BP |57 |63 |Hypertension |
|disability |107 |116|Disability |
|overall health |70 |83 |Quality_Of_Life|
|well-being |89 |98 |Quality_Of_Life|
|hypertension |122 |133|Hypertension |
|hyperlipidemia |139 |152|Hyperlipidemia |
|sedentary lifestyle |159 |177|Exercise |
|poor diet |183 |191|Diet |
|obesity |208 |214|Obesity |
|quality of life |242 |256|Quality_Of_Life|
|self-esteem |262 |272|Quality_Of_Life|
|mental well-being |353 |369|Mental_Health |
|sexual satisfaction |375 |393|Sexual_Activity|
|gym |429 |431|Exercise |
+--------------------+-----+---+---------------+
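Judging from the rows above, `begin` and `end` appear to be inclusive character offsets into each input text rather than into each sentence. The following minimal sketch checks this against the first sample text, using three rows copied from the table; the helper name `chunk_at` is hypothetical.

```python
# First sample text from the usage example
text = (
    "She has not been getting regular exercise and not followed the diet "
    "for approximately two years due to chronic sciatic pain."
)

# (chunk, begin, end, ner_label) rows copied from the results table
rows = [
    ("regular exercise", 25, 40, "Exercise"),
    ("diet", 63, 66, "Diet"),
    ("chronic sciatic pain", 103, 122, "Other_Disease"),
]


def chunk_at(text, begin, end):
    """Return the substring for an inclusive [begin, end] character span."""
    return text[begin:end + 1]


recovered = [chunk_at(text, begin, end) for _, begin, end, _ in rows]
for (chunk, begin, end, label), span in zip(rows, recovered):
    print(f"{label}: {span!r} matches table: {span == chunk}")
```

Because `end` is inclusive, the Python slice needs `end + 1`.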
Model Information
| Model Name:    | ner_sdoh_health_behaviours_problems |
| Compatibility: | Healthcare NLP 4.4.4+               |
| License:       | Licensed                            |
| Edition:       | Official                            |
| Input Labels:  | [sentence, token, embeddings]       |
| Output Labels: | [ner]                               |
| Language:      | en                                  |
| Size:          | 3.0 MB                              |
| Dependencies:  | embeddings_clinical                 |
References
Internal SDOH Project
Benchmarking
label                 precision  recall  f1-score  support
Communicable_Disease       0.77    1.00      0.87       17
Diet                       0.93    0.95      0.94       44
Disability                 0.98    1.00      0.99       53
Eating_Disorder            0.89    0.94      0.91       33
Exercise                   0.86    0.98      0.92       52
Hyperlipidemia             1.00    0.85      0.92       13
Hypertension               0.95    1.00      0.98       21
Mental_Health              0.92    0.92      0.92      476
Obesity                    1.00    0.71      0.83       14
Other_Disease              0.89    0.92      0.91      628
Quality_Of_Life            0.79    0.96      0.87       47
Sexual_Activity            0.86    0.86      0.86       29
micro-avg                  0.90    0.93      0.91     1427
macro-avg                  0.90    0.92      0.91     1427
weighted-avg               0.90    0.93      0.91     1427
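As a sanity check on how the summary rows relate to the per-label rows, the sketch below recomputes F1, macro averages, and support-weighted averages from the rounded values in the table. It assumes the standard definitions (F1 as the harmonic mean of precision and recall, macro average as the unweighted mean, weighted average as the support-weighted mean); the helper names are hypothetical.

```python
# Per-label (precision, recall, support) copied from the benchmarking table
report = {
    "Communicable_Disease": (0.77, 1.00, 17),
    "Diet": (0.93, 0.95, 44),
    "Disability": (0.98, 1.00, 53),
    "Eating_Disorder": (0.89, 0.94, 33),
    "Exercise": (0.86, 0.98, 52),
    "Hyperlipidemia": (1.00, 0.85, 13),
    "Hypertension": (0.95, 1.00, 21),
    "Mental_Health": (0.92, 0.92, 476),
    "Obesity": (1.00, 0.71, 14),
    "Other_Disease": (0.89, 0.92, 628),
    "Quality_Of_Life": (0.79, 0.96, 47),
    "Sexual_Activity": (0.86, 0.86, 29),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def macro_avg(values):
    """Unweighted mean over labels."""
    return sum(values) / len(values)


def weighted_avg(values, weights):
    """Mean over labels, weighted by support."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


precisions = [p for p, _, _ in report.values()]
recalls = [r for _, r, _ in report.values()]
supports = [s for _, _, s in report.values()]

total_support = sum(supports)
macro_precision = macro_avg(precisions)
macro_recall = macro_avg(recalls)
weighted_precision = weighted_avg(precisions, supports)
weighted_recall = weighted_avg(recalls, supports)

print(f"total support:      {total_support}")
print(f"macro precision:    {macro_precision:.2f}")
print(f"macro recall:       {macro_recall:.2f}")
print(f"weighted precision: {weighted_precision:.2f}")
print(f"weighted recall:    {weighted_recall:.2f}")
```

Since the inputs are already rounded to two decimals, a recomputed per-label F1 can differ from the table by about 0.01 (for example, Hypertension).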