Detect Social Determinants of Health Mentions

Description

This Named Entity Recognition (NER) model detects Social Determinants of Health (SDOH) mentions in clinical notes. It was trained with the MedicalNerApproach annotator, which trains generic NER models based on neural networks.

Entity names, descriptions, sample texts, and chunks + labels:

sdoh_community
Description: The patient’s social and community networks, including family members, friends, and other social connections.
Sample texts:
- He has a 27 yo son.
- The patient lives with mother.
- She is a widow.
- Married and has two children.
Chunks + labels:
- (son),(sdoh_community)
- (mother),(sdoh_community)
- (widow),(sdoh_community)
- (Married, children),(sdoh_community, sdoh_community)

sdoh_economics
Description: The patient’s economic status and financial resources, including their occupation, income, and employment status.
Sample texts:
- The patient worked as an accountant.
- He is a retired history professor.
- She is a lawyer.
- Worked in insurance, currently unemployed.
Chunks + labels:
- (worked),(sdoh_economics)
- (retired),(sdoh_economics)
- (lawyer),(sdoh_economics)
- (worked, unemployed),(sdoh_economics, sdoh_economics)

sdoh_education
Description: Passages related to the patient’s education, such as schooling, college, or degrees attained.
Sample texts:
- She graduated from high school.
- He is a fourth grade teacher in inner city.
- He completed some college.
Chunks + labels:
- (graduated from high school),(sdoh_education)
- (teacher),(sdoh_education)
- (college),(sdoh_education)

sdoh_environment
Description: The patient’s living environment and access to housing.
Sample texts:
- He lives at home.
- Her other daughter lives in the apartment below.
- The patient lives with her husband in a retirement community.
Chunks + labels:
- (home),(sdoh_environment)
- (apartment),(sdoh_environment)
- (retirement community),(sdoh_environment)

behavior_tobacco
Description: Any indication of the patient’s current or past tobacco use and smoking history.
Sample texts:
- She smoked one pack a day for forty years.
- The patient denies tobacco use.
- The patient smokes an occasional cigar.
Chunks + labels:
- (smoked one pack),(behavior_tobacco)
- (tobacco),(behavior_tobacco)
- (smokes an occasional cigar),(behavior_tobacco)

behavior_alcohol
Description: Any indication of the patient’s alcohol consumption.
Sample texts:
- She drinks alcohol.
- The patient denies ethanol.
- He denies ETOH.
Chunks + labels:
- (alcohol),(behavior_alcohol)
- (ethanol),(behavior_alcohol)
- (ETOH),(behavior_alcohol)

behavior_drug
Description: Any indication of the patient’s current or past drug use.
Sample texts:
- She denies any intravenous drug abuse.
- No illicit drug use including IV per family.
- The patient denies using recreational drugs.
Chunks + labels:
- (intravenous drug),(behavior_drug)
- (illicit drug, IV),(behavior_drug, behavior_drug)
- (recreational drugs),(behavior_drug)
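
The chunk and label pairs listed for each entity can be read as spans inside the sample text. The following is a minimal sketch of that reading in plain Python, not part of the model or the library: `locate_chunks` is a hypothetical helper, matching is case-insensitive, and the offsets are end-exclusive Python slice positions chosen for this illustration.

```python
from typing import List, Tuple

def locate_chunks(text: str, chunks: List[Tuple[str, str]]) -> List[Tuple[str, str, int, int]]:
    """Return (chunk, label, begin, end) for each chunk, searching left to right."""
    spans = []
    cursor = 0
    lowered = text.lower()
    for chunk, label in chunks:
        # Case-insensitive search starting after the previous match
        begin = lowered.find(chunk.lower(), cursor)
        if begin == -1:
            raise ValueError(f"chunk {chunk!r} not found in text")
        end = begin + len(chunk)
        spans.append((text[begin:end], label, begin, end))
        cursor = end
    return spans

# Sample texts and chunk/label pairs taken from the entity descriptions
samples = [
    ("He has a 27 yo son.", [("son", "sdoh_community")]),
    ("Married and has two children.",
     [("Married", "sdoh_community"), ("children", "sdoh_community")]),
    ("Worked in insurance, currently unemployed.",
     [("worked", "sdoh_economics"), ("unemployed", "sdoh_economics")]),
]

for text, chunks in samples:
    for chunk, label, begin, end in locate_chunks(text, chunks):
        print(f"{label:<16} [{begin}:{end}] {chunk}")
```

A sentence can carry several chunks with the same label, as in the "Married and has two children." sample, so the helper returns one span per chunk rather than one per sentence.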

Predicted Entities

sdoh_community, sdoh_economics, sdoh_education, sdoh_environment, behavior_tobacco, behavior_alcohol, behavior_drug

How to use

Python

# Assumes an active Spark session `spark` with Spark NLP and Healthcare NLP loaded.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel
from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal

# Wrap the raw text column into Spark NLP documents
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into sentences
sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Clinical word embeddings expected by the NER model
embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

ner_model = MedicalNerModel.pretrained("ner_sdoh_mentions", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge token-level tags into entity chunks
ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = Pipeline(stages=[
    document_assembler,
    sentenceDetector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter])

data = spark.createDataFrame([["Mr. Known lastname 9880 is a pleasant, cooperative gentleman with a long standing history (20 years) diverticulitis. He is married and has 3 children. He works in a bank. He denies any alcohol or intravenous drug use. He has been smoking for many years."]]).toDF("text")

result = nlpPipeline.fit(data).transform(data)

Scala

// Assumes an active SparkSession `spark`; MedicalNerModel and NerConverterInternal
// come from the licensed Healthcare NLP library and are assumed to be in scope.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols("document")
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
    .setInputCols("sentence")
    .setOutputCol("token")

val embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")

val ner_model = MedicalNerModel.pretrained("ner_sdoh_mentions", "en", "clinical/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")

val ner_converter = new NerConverterInternal()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")

// Use Pipeline (an estimator), not PipelineModel, so that fit() is available
val nlpPipeline = new Pipeline().setStages(Array(
    document_assembler,
    sentenceDetector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter))

val data = Seq("Mr. Known lastname 9880 is a pleasant, cooperative gentleman with a long standing history (20 years) diverticulitis. He is married and has 3 children. He works in a bank. He denies any alcohol or intravenous drug use. He has been smoking for many years.").toDS.toDF("text")

val result = nlpPipeline.fit(data).transform(data)

NLU

import nlu
nlu.load("en.med_ner.sdoh_mentions").predict("""Mr. Known lastname 9880 is a pleasant, cooperative gentleman with a long standing history (20 years) diverticulitis. He is married and has 3 children. He works in a bank. He denies any alcohol or intravenous drug use. He has been smoking for many years.""")

Results

+----------------+----------------+
|chunk           |ner_label       |
+----------------+----------------+
|married         |sdoh_community  |
|children        |sdoh_community  |
|works           |sdoh_economics  |
|alcohol         |behavior_alcohol|
|intravenous drug|behavior_drug   |
|smoking         |behavior_tobacco|
+----------------+----------------+
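
Downstream code often needs the extracted chunks grouped by label rather than as a flat table. The sketch below shows one way to do that in plain Python, using the rows from the results table as literal data. The helper names `group_by_label` and `split_by_family` are hypothetical, and splitting on the `sdoh_` / `behavior_` prefix is a convention assumed here from the label names, not a feature of the model.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (chunk, ner_label) rows copied from the results table
rows: List[Tuple[str, str]] = [
    ("married", "sdoh_community"),
    ("children", "sdoh_community"),
    ("works", "sdoh_economics"),
    ("alcohol", "behavior_alcohol"),
    ("intravenous drug", "behavior_drug"),
    ("smoking", "behavior_tobacco"),
]

def group_by_label(rows: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect chunks under their label, preserving the order of appearance."""
    grouped = defaultdict(list)
    for chunk, label in rows:
        grouped[label].append(chunk)
    return dict(grouped)

def split_by_family(grouped: Dict[str, List[str]]) -> Dict[str, Dict[str, List[str]]]:
    """Split labels into families by the prefix before the first underscore."""
    families: Dict[str, Dict[str, List[str]]] = {}
    for label, chunks in grouped.items():
        family = label.split("_", 1)[0]  # "sdoh" or "behavior"
        families.setdefault(family, {})[label] = chunks
    return families

grouped = group_by_label(rows)
for family, labels in split_by_family(grouped).items():
    print(family)
    for label, chunks in labels.items():
        print(f"  {label}: {', '.join(chunks)}")
```

Note that the model marks mentions only: "alcohol" and "intravenous drug" are extracted even though the sample sentence says the patient denies them, so a grouped summary like this one does not by itself indicate whether a behavior is present.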

Model Information

Model Name: ner_sdoh_mentions
Compatibility: Healthcare NLP 4.2.2+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 15.1 MB

Benchmarking

           label  precision    recall  f1-score   support
behavior_alcohol       0.95      0.94      0.94       798
   behavior_drug       0.93      0.92      0.92       366
behavior_tobacco       0.95      0.95      0.95       936
  sdoh_community       0.97      0.97      0.97       969
  sdoh_economics       0.95      0.91      0.93       363
  sdoh_education       0.69      0.65      0.67        34
sdoh_environment       0.93      0.90      0.92       651
       micro-avg       0.95      0.94      0.94      4117
       macro-avg       0.91      0.89      0.90      4117
    weighted-avg       0.95      0.94      0.94      4117
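
To make the difference between the macro and weighted averages concrete, the following sketch recomputes them from the per-label rows of the table above. It is an illustration in plain Python, with the hypothetical helpers `macro_average` and `weighted_average`. Because the per-label scores are already rounded to two decimals, the recomputed values are only expected to agree with the table after rounding.

```python
from typing import Dict, Tuple

# Per-label scores copied from the benchmark table: (precision, recall, f1, support)
scores: Dict[str, Tuple[float, float, float, int]] = {
    "behavior_alcohol": (0.95, 0.94, 0.94, 798),
    "behavior_drug": (0.93, 0.92, 0.92, 366),
    "behavior_tobacco": (0.95, 0.95, 0.95, 936),
    "sdoh_community": (0.97, 0.97, 0.97, 969),
    "sdoh_economics": (0.95, 0.91, 0.93, 363),
    "sdoh_education": (0.69, 0.65, 0.67, 34),
    "sdoh_environment": (0.93, 0.90, 0.92, 651),
}

def macro_average(scores: Dict[str, Tuple[float, float, float, int]], index: int) -> float:
    """Unweighted mean over labels: every label counts equally."""
    values = [row[index] for row in scores.values()]
    return sum(values) / len(values)

def weighted_average(scores: Dict[str, Tuple[float, float, float, int]], index: int) -> float:
    """Mean over labels weighted by support (number of gold mentions)."""
    total_support = sum(row[3] for row in scores.values())
    return sum(row[index] * row[3] for row in scores.values()) / total_support

for name, index in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    macro = macro_average(scores, index)
    weighted = weighted_average(scores, index)
    print(f"{name:>9}  macro={macro:.2f}  weighted={weighted:.2f}")
```

The macro average sits below the weighted average mainly because sdoh_education, with a support of only 34, scores much lower than the other labels and counts as much as any of them in the unweighted mean.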