Extract Community Condition Entities from Social Determinants of Health Texts

Description

This SDOH NER model detects and labels community condition entities related to social determinants of health (SDOH) in text. Social determinants of health are the social, economic, and environmental factors that influence individuals’ health outcomes. The model was trained on a diverse range of text sources and validated against expert-labeled data (see Benchmarking). The labels of the model and their descriptions are:

  • Community_Safety: Safety of the neighborhood or of places of study or work. “dangerous neighborhood, safe area, etc.”
  • Environmental_Condition: Conditions of the environment where people live. “pollution, air quality, noisy environment, etc.”
  • Food_Insecurity: Lack of consistent access to enough food for every person in a household to live an active, healthy life. “food insecurity, scarcity of protein, lack of food, etc.”
  • Housing: Conditions of the patient’s living spaces. “homeless, housing, small apartment, etc.”
  • Transportation: Mentions of access to means of transportation. “car, bus, train, etc.”

Predicted Entities

Community_Safety, Environmental_Condition, Food_Insecurity, Housing, Transportation

How to use

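# Imports; assumes Spark NLP and the licensed Spark NLP for Healthcare library
# are installed and that an active session is available as `spark`.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel
from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal

document_assembler = DocumentAssembler()\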
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

clinical_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

ner_model = MedicalNerModel.pretrained("ner_sdoh_community_condition", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    clinical_embeddings,
    ner_model,
    ner_converter
])

sample_texts = [
    ["He is currently experiencing financial stress due to job insecurity, and he lives in a small apartment in a low-income neighbourhood with limited access to green spaces and outdoor recreational activities. There is air pollution in the living area."],
    ["Patient reports difficulty for affording healthy food and relies on cheaper, processed options. He lives in an unsafe neighborhood. He lives alone"],
    ["She reports her husband and sons provide transportation to medical appts and do her grocery shopping."],
    ["He lives with his family, in his own house in a remote town, with a monthly income of $1200 per month. Due to lack of transportation, he is unable to access healthcare. "],
]

data = spark.createDataFrame(sample_texts).toDF("text")

result = pipeline.fit(data).transform(data)

// Scala version of the same pipeline
val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")
    .setInputCols("document")
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
    .setInputCols("sentence")
    .setOutputCol("token")

val clinical_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")

val ner_model = MedicalNerModel.pretrained("ner_sdoh_community_condition", "en", "clinical/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")

val ner_converter = new NerConverterInternal()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(
    document_assembler,
    sentence_detector,
    tokenizer,
    clinical_embeddings,
    ner_model,
    ner_converter
))

// Needed for .toDF on a Seq; assumes an active SparkSession named `spark`.
import spark.implicits._

// One row per text, so each document is annotated separately.
val data = Seq(
    "He is currently experiencing financial stress due to job insecurity, and he lives in a small apartment in a low-income neighbourhood with limited access to green spaces and outdoor recreational activities. There is air pollution in the living area.",
    "Patient reports difficulty for affording healthy food and relies on cheaper, processed options. He lives in an unsafe neighborhood. He lives alone",
    "She reports her husband and sons provide transportation to medical appts and do her grocery shopping.",
    "He lives with his family, in his own house in a remote town, with a monthly income of $1200 per month. Due to lack of transportation, he is unable to access healthcare. "
).toDF("text")

val result = pipeline.fit(data).transform(data)

Results

+-------------------------------+-----+---+-----------------------+
|chunk                          |begin|end|ner_label              |
+-------------------------------+-----+---+-----------------------+
|small apartment                |87   |101|Housing                |
|low-income neighbourhood       |108  |131|Community_Safety       |
|green spaces                   |156  |167|Environmental_Condition|
|outdoor recreational activities|173  |203|Environmental_Condition|
|pollution                      |219  |227|Environmental_Condition|
|healthy food                   |41   |52 |Food_Insecurity        |
|unsafe neighborhood            |111  |129|Community_Safety       |
|lives alone                    |135  |145|Housing                |
|transportation                 |41   |54 |Transportation         |
|own house                      |33   |41 |Housing                |
|transportation                 |118  |131|Transportation         |
+-------------------------------+-----+---+-----------------------+
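
The begin and end columns appear to be character offsets into each input text, with the end offset inclusive, and they restart for every document (which is why 41 occurs twice). A minimal sketch in plain Python that checks this reading against the first sample text and its rows in the table above; the helper name `chunk_at` is hypothetical and not part of Spark NLP:

```python
# First sample text from the "How to use" section, copied verbatim.
text = (
    "He is currently experiencing financial stress due to job insecurity, "
    "and he lives in a small apartment in a low-income neighbourhood with "
    "limited access to green spaces and outdoor recreational activities. "
    "There is air pollution in the living area."
)

# (chunk, begin, end) rows for that text, taken from the results table.
rows = [
    ("small apartment", 87, 101),
    ("low-income neighbourhood", 108, 131),
    ("green spaces", 156, 167),
    ("outdoor recreational activities", 173, 203),
    ("pollution", 219, 227),
]

def chunk_at(text, begin, end):
    """Return the substring for an inclusive [begin, end] character span."""
    return text[begin:end + 1]

# Collect any row whose span does not reproduce the reported chunk.
mismatches = [(c, b, e) for c, b, e in rows if chunk_at(text, b, e) != c]
print(mismatches)  # prints [] because every span matches its chunk
```

The same slicing can be used to highlight entities in the original text, as long as each row is paired with the document it came from.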

Model Information

Model Name: ner_sdoh_community_condition
Compatibility: Healthcare NLP 4.4.4+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 850.3 KB
Dependencies: embeddings_clinical

References

Internal SDOH Project

Benchmarking

                  label  precision    recall  f1-score   support
       Community_Safety       0.93      0.95      0.94        56
Environmental_Condition       1.00      0.75      0.86         4
        Food_Insecurity       0.79      1.00      0.88        34
                Housing       0.97      0.90      0.94       410
         Transportation       0.92      0.82      0.87        44
              micro-avg       0.95      0.91      0.93       548
              macro-avg       0.92      0.88      0.90       548
           weighted-avg       0.95      0.91      0.93       548
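
The summary rows can be related back to the per-label rows. A minimal sketch in plain Python, assuming the macro average is the unweighted mean of the per-label scores and that F1 is the harmonic mean of precision and recall. Because the table values are rounded to two decimals, recomputed figures can differ from the printed ones by about 0.01. The names `rows`, `f1_score`, and `macro_avg` are illustrative only:

```python
# Per-label rows from the benchmark table: (precision, recall, f1, support).
rows = {
    "Community_Safety":        (0.93, 0.95, 0.94, 56),
    "Environmental_Condition": (1.00, 0.75, 0.86, 4),
    "Food_Insecurity":         (0.79, 1.00, 0.88, 34),
    "Housing":                 (0.97, 0.90, 0.94, 410),
    "Transportation":          (0.92, 0.82, 0.87, 44),
}

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_avg(column):
    """Unweighted mean of one metric column across all labels."""
    values = [row[column] for row in rows.values()]
    return sum(values) / len(values)

total_support = sum(row[3] for row in rows.values())
macro = tuple(round(macro_avg(i), 2) for i in range(3))

print(total_support)  # 548
print(macro)          # (0.92, 0.88, 0.9)

# Compare F1 recomputed from rounded precision/recall with the reported F1.
for label, (p, r, f1, _) in rows.items():
    print(label, round(f1_score(p, r), 2), f1)
```

Note that Environmental_Condition has a support of only 4, so its scores rest on very few examples, while Housing (support 410) dominates the micro and weighted averages.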