Description
This SDOH NER model detects and labels entities related to community conditions, one group of the social determinants of health (SDOH), in text data. Social determinants of health are the social, economic, and environmental factors that influence individuals’ health outcomes. The model was trained on a diverse range of text sources, and its accuracy and precision were validated against expert-labeled data. The labels of the SDOH NER model and their descriptions are:
Community_Safety: Safety of the neighborhood or of places of study or work. “dangerous neighborhood, safe area, etc.”
Environmental_Condition: Conditions of the environment where people live. “pollution, air quality, noisy environment, etc.”
Food_Insecurity: Food insecurity is defined as a lack of consistent access to enough food for every person in a household to live an active, healthy life. “food insecurity, scarcity of protein, lack of food, etc.”
Housing: Conditions of the patient’s living spaces. “homeless, housing, small apartment, etc.”
Transportation: Mentions of accessibility to means of transportation. “car, bus, train, etc.”
Predicted Entities
Community_Safety, Environmental_Condition, Food_Insecurity, Housing, Transportation
How to use
# Python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel
from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal

# Assumes `spark` is an active Spark session with Healthcare NLP loaded.

# Wrap the raw "text" column as document annotations
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into sentences, then sentences into tokens
sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Clinical word embeddings required by the NER model (see Dependencies)
clinical_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

# Tag each token with an SDOH label
ner_model = MedicalNerModel.pretrained("ner_sdoh_community_condition", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge token-level tags into labeled chunks
ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    clinical_embeddings,
    ner_model,
    ner_converter
])

# One sample text per row
sample_texts = [
    ["He is currently experiencing financial stress due to job insecurity, and he lives in a small apartment in a low-income neighbourhood with limited access to green spaces and outdoor recreational activities. There is air pollution in the living area."],
    ["Patient reports difficulty for affording healthy food and relies on cheaper, processed options. He lives in an unsafe neighborhood. He lives alone"],
    ["She reports her husband and sons provide transportation to medical appts and do her grocery shopping."],
    ["He lives with his family, in his own house in a remote town, with a monthly income of $1200 per month. Due to lack of transportation, he is unable to access healthcare. "]
]
data = spark.createDataFrame(sample_texts).toDF("text")
result = pipeline.fit(data).transform(data)
// Scala
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.annotators.ner.{MedicalNerModel, NerConverterInternal}
import spark.implicits._ // needed for .toDS; assumes `spark` is an active session

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val clinical_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")

val ner_model = MedicalNerModel.pretrained("ner_sdoh_community_condition", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")

val ner_converter = new NerConverterInternal()
  .setInputCols(Array("sentence", "token", "ner"))
  .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(
  document_assembler,
  sentence_detector,
  tokenizer,
  clinical_embeddings,
  ner_model,
  ner_converter
))

// One sample text per row; a plain Seq of strings yields a string-typed "text" column
val data = Seq(
  "He is currently experiencing financial stress due to job insecurity, and he lives in a small apartment in a low-income neighbourhood with limited access to green spaces and outdoor recreational activities. There is air pollution in the living area.",
  "Patient reports difficulty for affording healthy food and relies on cheaper, processed options. He lives in an unsafe neighborhood. He lives alone",
  "She reports her husband and sons provide transportation to medical appts and do her grocery shopping.",
  "He lives with his family, in his own house in a remote town, with a monthly income of $1200 per month. Due to lack of transportation, he is unable to access healthcare. "
).toDS.toDF("text")
val result = pipeline.fit(data).transform(data)
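The last two pipeline stages split the work: `ner_model` tags individual tokens, and `ner_converter` merges those tags into the labeled chunks listed under Results. The following is a minimal pure-Python sketch of that merging idea, assuming the token tags follow the common IOB convention (`B-Label`, `I-Label`, `O`). The function `merge_iob` and the sample tokens and tags are hypothetical illustrations, not the library's implementation or real model output.

```python
from typing import List, Optional, Tuple


def merge_iob(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group consecutive B-/I- tagged tokens into (chunk_text, label) pairs."""
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    for token, tag in zip(tokens, tags):
        # "B-Housing" -> ("B", "Housing"); "O" -> ("O", "")
        prefix, _, label = tag.partition("-")

        # An I- tag with the same label extends the open chunk
        if prefix == "I" and label == current_label:
            current_tokens.append(token)
            continue

        # Any other tag closes the open chunk, if there is one
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))

        if prefix in ("B", "I"):
            # Start a new chunk (a stray I- tag is treated like B-)
            current_tokens, current_label = [token], label
        else:
            current_tokens, current_label = [], None

    # Flush a chunk that runs to the end of the sentence
    if current_tokens:
        chunks.append((" ".join(current_tokens), current_label))
    return chunks


# Hypothetical token-level tags for one sentence of the second sample text
tokens = ["He", "lives", "in", "an", "unsafe", "neighborhood", "."]
tags = ["O", "O", "O", "O", "B-Community_Safety", "I-Community_Safety", "O"]
chunks = merge_iob(tokens, tags)
print(chunks)
```

Joining tokens with a single space is a simplification; the pipeline itself reports character offsets into the original text rather than rebuilding the chunk string.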
Results
+-------------------------------+-----+---+-----------------------+
|chunk                          |begin|end|ner_label              |
+-------------------------------+-----+---+-----------------------+
|small apartment                |87   |101|Housing                |
|low-income neighbourhood       |108  |131|Community_Safety       |
|green spaces                   |156  |167|Environmental_Condition|
|outdoor recreational activities|173  |203|Environmental_Condition|
|pollution                      |219  |227|Environmental_Condition|
|healthy food                   |41   |52 |Food_Insecurity        |
|unsafe neighborhood            |111  |129|Community_Safety       |
|lives alone                    |135  |145|Housing                |
|transportation                 |41   |54 |Transportation         |
|own house                      |33   |41 |Housing                |
|transportation                 |118  |131|Transportation         |
+-------------------------------+-----+---+-----------------------+
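The `begin` and `end` columns read as character offsets into each input text, with `end` inclusive, and they restart for every input row. A short check against the first sample text, using only values from the table above (the helper `chunk_at` is a hypothetical name introduced for illustration):

```python
# First sample text from "How to use", split across lines for readability
text = (
    "He is currently experiencing financial stress due to job insecurity, "
    "and he lives in a small apartment in a low-income neighbourhood with "
    "limited access to green spaces and outdoor recreational activities. "
    "There is air pollution in the living area."
)

# (chunk, begin, end) rows copied from the Results table
rows = [
    ("small apartment", 87, 101),
    ("low-income neighbourhood", 108, 131),
    ("green spaces", 156, 167),
    ("outdoor recreational activities", 173, 203),
    ("pollution", 219, 227),
]


def chunk_at(source: str, begin: int, end: int) -> str:
    """Return the substring for an inclusive [begin, end] offset pair."""
    return source[begin:end + 1]  # Python slices exclude the stop index


recovered = [chunk_at(text, begin, end) for _, begin, end in rows]
for (chunk, begin, end), found in zip(rows, recovered):
    print(f"{begin:>3}-{end:<3} {found!r} matches table: {found == chunk}")
```

Because `end` is inclusive, slicing with `text[begin:end]` would drop the last character of every chunk.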
Model Information
Model Name: ner_sdoh_community_condition
Compatibility: Healthcare NLP 4.4.4+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 850.3 KB
Dependencies: embeddings_clinical
References
Internal SDOH Project
Benchmarking
label                    precision  recall  f1-score  support
Community_Safety         0.93       0.95    0.94      56
Environmental_Condition  1.00       0.75    0.86      4
Food_Insecurity          0.79       1.00    0.88      34
Housing                  0.97       0.90    0.94      410
Transportation           0.92       0.82    0.87      44
micro-avg                0.95       0.91    0.93      548
macro-avg                0.92       0.88    0.90      548
weighted-avg             0.95       0.91    0.93      548
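The macro and weighted averages can be approximately recomputed from the per-label rows: the macro average is the plain mean over the five labels, and the weighted average weights each label by its support. The sketch below uses only the rounded figures from the table, so a recomputed value may differ from the published one by about 0.01. The micro average depends on raw true-positive and false-positive counts, which the table does not give, so it is not recomputed here.

```python
# label -> (precision, recall, f1-score, support), copied from the table
rows = {
    "Community_Safety": (0.93, 0.95, 0.94, 56),
    "Environmental_Condition": (1.00, 0.75, 0.86, 4),
    "Food_Insecurity": (0.79, 1.00, 0.88, 34),
    "Housing": (0.97, 0.90, 0.94, 410),
    "Transportation": (0.92, 0.82, 0.87, 44),
}

total_support = sum(r[3] for r in rows.values())


def macro(metric_index: int) -> float:
    """Unweighted mean of one metric over all labels."""
    return sum(r[metric_index] for r in rows.values()) / len(rows)


def weighted(metric_index: int) -> float:
    """Mean of one metric over all labels, weighted by support."""
    return sum(r[metric_index] * r[3] for r in rows.values()) / total_support


# Index 0 = precision, 1 = recall, 2 = f1-score
macro_avg = [macro(i) for i in range(3)]
weighted_avg = [weighted(i) for i in range(3)]

print("total support:", total_support)
print("macro-avg:   ", [f"{v:.3f}" for v in macro_avg])
print("weighted-avg:", [f"{v:.3f}" for v in weighted_avg])
```

Housing accounts for 410 of the 548 test entities, so the weighted average tracks the Housing row closely, while Environmental_Condition, with a support of 4, moves the macro average more than its size would suggest.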