Detect Social Determinants of Health Mentions

Description

This Named Entity Recognition (NER) model detects mentions of Social Determinants of Health (SDOH) in clinical notes. It was trained with the MedicalNerApproach annotator, which trains generic NER models based on neural networks.

Entity Name	Description	Sample Texts	Chunks and Labels
sdoh_community	The patient’s social and community networks, including family members, friends, and other social connections.	- He has a 27 yo son. - The patient lives with mother. - She is a widow. - Married and has two children.	- (son),(sdoh_community) - (mother),(sdoh_community) - (widow),(sdoh_community) - (Married, children),(sdoh_community,sdoh_community)
sdoh_economics	The patient’s economic status and financial resources, including their occupation, income, and employment status.	- The patient worked as an accountant. - He is a retired history professor. - She is a lawyer. - Worked in insurance, currently unemployed.	- (worked),(sdoh_economics) - (retired),(sdoh_economics) - (lawyer),(sdoh_economics) - (worked, unemployed),(sdoh_economics, sdoh_economics)
sdoh_education	The patient’s education-related passages such as schooling, college, or degrees attained.	- She graduated from high school. - He is a fourth grade teacher in inner city. - He completed some college.	- (graduated from high school),(sdoh_education) - (teacher),(sdoh_education) - (college),(sdoh_education)
sdoh_environment	The patient’s living environment and access to housing.	- He lives at home. - Her other daughter lives in the apartment below. - The patient lives with her husband in a retirement community.	- (home),(sdoh_environment) - (apartment),(sdoh_environment) - (retirement community),(sdoh_environment)
behavior_tobacco	This entity is used to label any indication of the patient’s current or past tobacco use and smoking history.	- She smoked one pack a day for forty years. - The patient denies tobacco use. - The patient smokes an occasional cigar.	- (smoked one pack),(behavior_tobacco) - (tobacco),(behavior_tobacco) - (smokes an occasional cigar),(behavior_tobacco)
behavior_alcohol	This entity is used to label indications of the patient’s alcohol consumption.	- She drinks alcohol. - The patient denies ethanol. - He denies ETOH.	- (alcohol),(behavior_alcohol) - (ethanol),(behavior_alcohol) - (ETOH),(behavior_alcohol)
behavior_drug	This entity is used to label any indications of the patient’s current or past drug use.	- She denies any intravenous drug abuse. - No illicit drug use including IV per family. - The patient denies using recreational drugs.	- (intravenous drug),(behavior_drug) - (illicit drug, IV),(behavior_drug, behavior_drug) - (recreational drugs),(behavior_drug)

Predicted Entities

sdoh_community, sdoh_economics, sdoh_education, sdoh_environment, behavior_tobacco, behavior_alcohol, behavior_drug


How to use

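# Assumes an active Spark session named `spark` with Healthcare NLP loaded.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel
from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")
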
sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl","xx")\
    .setInputCols("document")\
    .setOutputCol("sentence")
    
tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")
    
embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols("sentence", "token")\
    .setOutputCol("embeddings")
    
ner_model = MedicalNerModel.pretrained("ner_sdoh_mentions", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")
    
ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")
    
nlpPipeline = Pipeline(stages=[
    document_assembler,
    sentenceDetector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter])

data = spark.createDataFrame([["Mr. Known lastname 9880 is a pleasant, cooperative gentleman with a long standing history (20 years) diverticulitis. He is married and has 3 children. He works in a bank. He denies any alcohol or intravenous drug use. He has been smoking for many years."]]).toDF("text")

result = nlpPipeline.fit(data).transform(data)

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")
    
val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
    .setInputCols("document")
    .setOutputCol("sentence")
    
val tokenizer = new Tokenizer()
    .setInputCols("sentence")
    .setOutputCol("token")
    
val embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")
    
val ner_model = MedicalNerModel.pretrained("ner_sdoh_mentions", "en", "clinical/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")
    
val ner_converter = new NerConverterInternal()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")
    
val nlpPipeline = new Pipeline().setStages(Array(
    document_assembler, 
    sentenceDetector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter))

val data = Seq("Mr. Known lastname 9880 is a pleasant, cooperative gentleman with a long standing history (20 years) diverticulitis. He is married and has 3 children. He works in a bank. He denies any alcohol or intravenous drug use. He has been smoking for many years.").toDS.toDF("text")

val result = nlpPipeline.fit(data).transform(data)

import nlu
nlu.load("en.med_ner.sdoh_mentions").predict("""Mr. Known lastname 9880 is a pleasant, cooperative gentleman with a long standing history (20 years) diverticulitis. He is married and has 3 children. He works in a bank. He denies any alcohol or intravenous drug use. He has been smoking for many years.""")

Results

+----------------+----------------+
|chunk           |ner_label       |
+----------------+----------------+
|married         |sdoh_community  |
|children        |sdoh_community  |
|works           |sdoh_economics  |
|alcohol         |behavior_alcohol|
|intravenous drug|behavior_drug   |
|smoking         |behavior_tobacco|
+----------------+----------------+
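
As one way to work with output like the table above, the following sketch groups the detected chunks by label in plain Python. The `rows` list is copied by hand from the results table, and `group_by_label` is a hypothetical helper written for this illustration; it is not part of Spark NLP or Healthcare NLP.

```python
from collections import defaultdict

# (chunk, ner_label) pairs copied from the results table.
rows = [
    ("married", "sdoh_community"),
    ("children", "sdoh_community"),
    ("works", "sdoh_economics"),
    ("alcohol", "behavior_alcohol"),
    ("intravenous drug", "behavior_drug"),
    ("smoking", "behavior_tobacco"),
]


def group_by_label(pairs):
    """Hypothetical helper: collect chunks under their label, keeping input order."""
    grouped = defaultdict(list)
    for chunk, label in pairs:
        grouped[label].append(chunk)
    return dict(grouped)


grouped = group_by_label(rows)
for label, chunks in grouped.items():
    print(f"{label}: {', '.join(chunks)}")
```

A chunk marks a mention only. In the sample text the patient denies alcohol use, yet `alcohol` is still tagged as behavior_alcohol, so any downstream grouping like this one should not be read as confirming the behavior.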

Model Information

Model Name:	ner_sdoh_mentions
Compatibility:	Healthcare NLP 4.2.2+
License:	Licensed
Edition:	Official
Input Labels:	[sentence, token, embeddings]
Output Labels:	[ner]
Language:	en
Size:	15.1 MB

Benchmarking

           label  precision    recall  f1-score   support
behavior_alcohol       0.95      0.94      0.94       798
   behavior_drug       0.93      0.92      0.92       366
behavior_tobacco       0.95      0.95      0.95       936
  sdoh_community       0.97      0.97      0.97       969
  sdoh_economics       0.95      0.91      0.93       363
  sdoh_education       0.69      0.65      0.67        34
sdoh_environment       0.93      0.90      0.92       651
       micro-avg       0.95      0.94      0.94      4117
       macro-avg       0.91      0.89      0.90      4117
    weighted-avg       0.95      0.94      0.94      4117
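
To help interpret the averaged rows, the sketch below recomputes the macro and weighted averages from the per-label rows above. It assumes the usual definitions (macro: unweighted mean over labels; weighted: mean weighted by support), which the source does not state explicitly. The inputs are the two-decimal values copied from the table, so the results are only expected to agree after rounding. The helper names are illustrative.

```python
# (label, precision, recall, f1, support) copied from the benchmark table.
benchmark = [
    ("behavior_alcohol", 0.95, 0.94, 0.94, 798),
    ("behavior_drug", 0.93, 0.92, 0.92, 366),
    ("behavior_tobacco", 0.95, 0.95, 0.95, 936),
    ("sdoh_community", 0.97, 0.97, 0.97, 969),
    ("sdoh_economics", 0.95, 0.91, 0.93, 363),
    ("sdoh_education", 0.69, 0.65, 0.67, 34),
    ("sdoh_environment", 0.93, 0.90, 0.92, 651),
]


def macro_avg(values):
    """Unweighted mean: every label counts equally."""
    return sum(values) / len(values)


def weighted_avg(values, weights):
    """Mean weighted by support: frequent labels count more."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


supports = [row[4] for row in benchmark]
total_support = sum(supports)

# Columns 1..3 hold precision, recall, and f1-score.
macro = tuple(round(macro_avg([row[i] for row in benchmark]), 2) for i in (1, 2, 3))
weighted = tuple(
    round(weighted_avg([row[i] for row in benchmark], supports), 2) for i in (1, 2, 3)
)

print("total support:", total_support)
print("macro-avg (precision, recall, f1):", macro)
print("weighted-avg (precision, recall, f1):", weighted)
```

Under these assumptions, the macro average comes out lower than the weighted average because sdoh_education, with only 34 supporting examples and an f1-score of 0.67, counts as much as any other label in the macro mean but contributes very little to the weighted one.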
