Description
This SDOH NER model detects and labels entities related to access to healthcare, one category of social determinants of health (SDOH), in text data. Social determinants of health are factors that influence individuals’ health outcomes, encompassing social, economic, and environmental elements. The model was trained with machine learning techniques on a diverse range of text sources, and its accuracy and precision were validated against expert-labeled data. The labels of the SDOH NER model and their descriptions are:
Access_To_Care: The patient’s ability to access the care needed, or barriers to it. “long distances, access to health care, rehab program, etc.”
Healthcare_Institution: Any place, institution, building, or agency that provides health care. “hospital, clinic, trauma centers, etc.”
Insurance_Status: Information regarding the patient’s insurance status. “uninsured, insured, Medicare, Medicaid, etc.”
Predicted Entities
Access_To_Care, Healthcare_Institution, Insurance_Status
How to use
from pyspark.ml import Pipeline
from pyspark.sql.types import StringType
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel
from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal

# Assumes `spark` is an active, licensed Healthcare NLP session.
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

clinical_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

ner_model = MedicalNerModel.pretrained("ner_sdoh_access_to_healthcare", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge token-level tags into entity chunks.
ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    clinical_embeddings,
    ner_model,
    ner_converter
])

sample_texts = [
    "She has a pension and private health insurance, she reports feeling lonely and isolated.",
    "pt has a Medicare insurance and he visited oncology clinic last week.",
    "He also reported food insecurity during his childhood and lack of access to adequate healthcare. He is uninsured.",
    "She used to work as a unit clerk at XYZ Medical Center.",
    "Smith works as a cleaning assistant at ABC Clinic and has access to health insurance. She is aware she needs rehab.",
    "he has a limited insurance."
]

# One row per sample text, in a single string column named "text".
data = spark.createDataFrame(sample_texts, StringType()).toDF("text")
result = pipeline.fit(data).transform(data)
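The `ner_chunk` column holds an array of annotations for each input row. As a minimal sketch of flattening it into (chunk, begin, end, label) records, the helper below works on plain dictionaries so it can run without Spark. `flatten_chunks` is a hypothetical name, not part of the library, and the sketch assumes each annotation exposes `result`, `begin`, `end`, and a `metadata` map with an `entity` key.

```python
def flatten_chunks(rows):
    """Flatten rows that carry an `ner_chunk` annotation list into flat records.

    `rows` is any iterable of dicts shaped like
    {"ner_chunk": [{"result": str, "begin": int, "end": int,
                    "metadata": {"entity": str}}]}.
    """
    records = []
    for row in rows:
        for chunk in row["ner_chunk"]:
            records.append({
                "chunk": chunk["result"],
                "begin": chunk["begin"],
                "end": chunk["end"],
                # Assumption: the predicted label lives under metadata["entity"].
                "ner_label": chunk["metadata"].get("entity"),
            })
    return records


# Hand-written stand-in for one collected row, so the sketch runs without Spark.
example_rows = [{
    "ner_chunk": [
        {"result": "Medicare insurance", "begin": 9, "end": 26,
         "metadata": {"entity": "Insurance_Status"}},
        {"result": "oncology clinic", "begin": 43, "end": 57,
         "metadata": {"entity": "Healthcare_Institution"}},
    ]
}]

for record in flatten_chunks(example_rows):
    print(record)
```

With the pipeline output, the input rows could be built with `[r.asDict(recursive=True) for r in result.select("ner_chunk").collect()]`. Collecting to the driver like this is only sensible for small samples.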
// Scala: the same pipeline. Assumes `spark` is an active, licensed session.
import spark.implicits._

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en")
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val clinical_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")

val ner_model = MedicalNerModel.pretrained("ner_sdoh_access_to_healthcare", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")

val ner_converter = new NerConverterInternal()
  .setInputCols(Array("sentence", "token", "ner"))
  .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(
  document_assembler,
  sentence_detector,
  tokenizer,
  clinical_embeddings,
  ner_model,
  ner_converter
))

// One row per sample text, so that "text" is a string column rather than a single array value.
val data = Seq(
  "She has a pension and private health insurance, she reports feeling lonely and isolated.",
  "pt has a Medicare insurance and he visited oncology clinic last week.",
  "He also reported food insecurity during his childhood and lack of access to adequate healthcare. He is uninsured.",
  "She used to work as a unit clerk at XYZ Medical Center.",
  "Smith works as a cleaning assistant at ABC Clinic and has access to health insurance. She is aware she needs rehab.",
  "he has a limited insurance."
).toDF("text")

val result = pipeline.fit(data).transform(data)
Results
+-----------------------------+-----+---+----------------------+
|chunk                        |begin|end|ner_label             |
+-----------------------------+-----+---+----------------------+
|private health insurance     |22   |45 |Insurance_Status      |
|Medicare insurance           |9    |26 |Insurance_Status      |
|oncology clinic              |43   |57 |Healthcare_Institution|
|access to adequate healthcare|66   |94 |Access_To_Care        |
|uninsured                    |103  |111|Insurance_Status      |
|XYZ Medical Center           |36   |53 |Healthcare_Institution|
|ABC Clinic                   |39   |48 |Healthcare_Institution|
|health insurance             |68   |83 |Insurance_Status      |
|rehab                        |109  |113|Access_To_Care        |
|limited insurance            |9    |25 |Insurance_Status      |
+-----------------------------+-----+---+----------------------+
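Checking the rows above against the sample texts suggests that `begin` and `end` are zero-based character offsets into each input text, with `end` inclusive. The sketch below recovers chunks from their offsets using three rows copied from the table. `chunk_at` is a hypothetical helper name used only for illustration.

```python
def chunk_at(text, begin, end):
    """Return the substring covered by inclusive, zero-based character offsets."""
    return text[begin:end + 1]


long_text = ("He also reported food insecurity during his childhood and lack of "
             "access to adequate healthcare. He is uninsured.")
short_text = "he has a limited insurance."

# (text, begin, end, expected chunk) taken from the results table.
checks = [
    (long_text, 66, 94, "access to adequate healthcare"),
    (long_text, 103, 111, "uninsured"),
    (short_text, 9, 25, "limited insurance"),
]

for text, begin, end, expected in checks:
    print(repr(chunk_at(text, begin, end)), chunk_at(text, begin, end) == expected)
```

Because the end offset is inclusive, a plain Python slice needs `end + 1`. Note also that the second chunk sits in the second sentence of its text, so the offsets count from the start of the whole text, not from the start of the sentence.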
Model Information
Model Name: ner_sdoh_access_to_healthcare
Compatibility: Healthcare NLP 4.4.4+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 850.9 KB
Dependencies: embeddings_clinical
References
Internal SDOH Project
Benchmarking
label                   precision  recall  f1-score  support
Access_To_Care               0.90    0.92      0.91      483
Healthcare_Institution       0.99    0.95      0.97      726
Insurance_Status             0.93    0.83      0.88       90
micro-avg                    0.95    0.93      0.94     1299
macro-avg                    0.94    0.90      0.92     1299
weighted-avg                 0.95    0.93      0.94     1299
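The per-label F1 scores and the macro and weighted averages in this table can be recomputed from the precision, recall, and support columns. The sketch below does that arithmetic with the rounded values shown here, so it can only be expected to agree with the table to two decimals. The micro average is left out because it needs raw true-positive counts, which are not listed.

```python
# Per-label (precision, recall, support) copied from the benchmarking table.
scores = {
    "Access_To_Care": (0.90, 0.92, 483),
    "Healthcare_Institution": (0.99, 0.95, 726),
    "Insurance_Status": (0.93, 0.83, 90),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


per_label_f1 = {label: f1(p, r) for label, (p, r, _) in scores.items()}
total_support = sum(support for _, _, support in scores.values())

# Macro average: unweighted mean over labels.
macro_f1 = sum(per_label_f1.values()) / len(per_label_f1)

# Weighted average: mean over labels, weighted by each label's support.
weighted_f1 = sum(
    per_label_f1[label] * support for label, (_, _, support) in scores.items()
) / total_support

for label, value in per_label_f1.items():
    print(f"{label}: {value:.2f}")
print(f"macro-avg f1: {macro_f1:.2f}")
print(f"weighted-avg f1: {weighted_f1:.2f}")
print(f"total support: {total_support}")
```

The macro average treats all three labels equally, so the weaker Insurance_Status score (only 90 examples) pulls it below the weighted average, which is dominated by the two larger classes.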