Description
This Named Entity Recognition (NER) model detects mentions of Social Determinants of Health (SDOH) in clinical notes. It was trained with the MedicalNerApproach annotator, which trains generic NER models based on neural networks.
Entity Name | Description | Sample Texts | Chunks + Labels |
---|---|---|---|
sdoh_community | The patient’s social and community networks, including family members, friends, and other social connections. | - He has a 27 yo son. - The patient lives with mother. - She is a widow. - Married and has two children. | - (son),(sdoh_community) - (mother),(sdoh_community) - (widow),(sdoh_community) - (Married, children),(sdoh_community, sdoh_community) |
sdoh_economics | The patient’s economic status and financial resources, including their occupation, income, and employment status. | - The patient worked as an accountant. - He is a retired history professor. - She is a lawyer. - Worked in insurance, currently unemployed. | - (worked),(sdoh_economics) - (retired),(sdoh_economics) - (lawyer),(sdoh_economics) - (worked, unemployed),(sdoh_economics, sdoh_economics) |
sdoh_education | Passages about the patient’s education, such as schooling, college, or degrees attained. | - She graduated from high school. - He is a fourth grade teacher in inner city. - He completed some college. | - (graduated from high school),(sdoh_education) - (teacher),(sdoh_education) - (college),(sdoh_education) |
sdoh_environment | The patient’s living environment and access to housing. | - He lives at home. - Her other daughter lives in the apartment below. - The patient lives with her husband in a retirement community. | - (home),(sdoh_environment) - (apartment),(sdoh_environment) - (retirement community),(sdoh_environment) |
behavior_tobacco | Any indication of the patient’s current or past tobacco use and smoking history. | - She smoked one pack a day for forty years. - The patient denies tobacco use. - The patient smokes an occasional cigar. | - (smoked one pack),(behavior_tobacco) - (tobacco),(behavior_tobacco) - (smokes an occasional cigar),(behavior_tobacco) |
behavior_alcohol | Any indication of the patient’s alcohol consumption. | - She drinks alcohol. - The patient denies ethanol. - He denies ETOH. | - (alcohol),(behavior_alcohol) - (ethanol),(behavior_alcohol) - (ETOH),(behavior_alcohol) |
behavior_drug | Any indication of the patient’s current or past drug use. | - She denies any intravenous drug abuse. - No illicit drug use including IV per family. - The patient denies using recreational drugs. | - (intravenous drug),(behavior_drug) - (illicit drug, IV),(behavior_drug, behavior_drug) - (recreational drugs),(behavior_drug) |
Predicted Entities
sdoh_community, sdoh_economics, sdoh_education, sdoh_environment, behavior_tobacco, behavior_alcohol, behavior_drug
How to use
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, WordEmbeddingsModel
from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal

# Wrap the raw text column as a document annotation
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into sentences, then sentences into tokens
sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Clinical word embeddings consumed by the NER model
embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

ner_model = MedicalNerModel.pretrained("ner_sdoh_mentions", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge token-level NER tags into labeled chunks
ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = Pipeline(stages=[
    document_assembler,
    sentenceDetector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter])

# Assumes an active, licensed Spark session named `spark`
data = spark.createDataFrame([["Mr. Known lastname 9880 is a pleasant, cooperative gentleman with a long standing history (20 years) diverticulitis. He is married and has 3 children. He works in a bank. He denies any alcohol or intravenous drug use. He has been smoking for many years."]]).toDF("text")

result = nlpPipeline.fit(data).transform(data)
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for .toDS; assumes an active SparkSession named `spark`
// The Spark NLP and Healthcare NLP annotator classes are assumed to be imported already.

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols(Array("sentence"))
  .setOutputCol("token")

val embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")

val ner_model = MedicalNerModel.pretrained("ner_sdoh_mentions", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")

val ner_converter = new NerConverterInternal()
  .setInputCols(Array("sentence", "token", "ner"))
  .setOutputCol("ner_chunk")

// Use Pipeline (an estimator), not PipelineModel, so that fit() can be called below
val nlpPipeline = new Pipeline().setStages(Array(
  document_assembler,
  sentenceDetector,
  tokenizer,
  embeddings,
  ner_model,
  ner_converter))

val data = Seq("Mr. Known lastname 9880 is a pleasant, cooperative gentleman with a long standing history (20 years) diverticulitis. He is married and has 3 children. He works in a bank. He denies any alcohol or intravenous drug use. He has been smoking for many years.").toDS.toDF("text")

val result = nlpPipeline.fit(data).transform(data)
import nlu
nlu.load("en.med_ner.sdoh_mentions").predict("""Mr. Known lastname 9880 is a pleasant, cooperative gentleman with a long standing history (20 years) diverticulitis. He is married and has 3 children. He works in a bank. He denies any alcohol or intravenous drug use. He has been smoking for many years.""")
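The last pipeline stage, ner_converter, turns token-level tags into labeled chunks. As a rough illustration of that step only, and not the library's implementation, the pure-Python sketch below merges IOB-style tags into (chunk, label) pairs. The function name `iob_to_chunks`, the `B-`/`I-`/`O` tag scheme, and the sample tokens and tags are assumptions made for this example.

```python
def iob_to_chunks(tokens, tags):
    """Merge IOB-tagged tokens into (chunk_text, label) pairs (illustrative helper)."""
    chunks = []
    words, label = [], None
    for token, tag in zip(tokens, tags):
        # A chunk ends at an "O" tag, at a "B-" tag, or at an "I-" tag whose label changes
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if tag == "O" or starts_new:
            if words:
                chunks.append((" ".join(words), label))
            words, label = [], None
        if tag != "O":
            if not words:
                label = tag[2:]  # strip the "B-" / "I-" prefix
            words.append(token)
    if words:  # flush a chunk that runs to the end of the sentence
        chunks.append((" ".join(words), label))
    return chunks


# Hypothetical token-level tags for one sentence of the sample text
tokens = ["He", "denies", "any", "alcohol", "or", "intravenous", "drug", "use", "."]
tags = ["O", "O", "O", "B-behavior_alcohol", "O",
        "B-behavior_drug", "I-behavior_drug", "O", "O"]

chunks = iob_to_chunks(tokens, tags)
print(chunks)
```

Adjacent tokens are joined with a single space here, which is a simplification; the real annotator works from character offsets in the original text.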
Results
+----------------+----------------+
|chunk           |ner_label       |
+----------------+----------------+
|married         |sdoh_community  |
|children        |sdoh_community  |
|works           |sdoh_economics  |
|alcohol         |behavior_alcohol|
|intravenous drug|behavior_drug   |
|smoking         |behavior_tobacco|
+----------------+----------------+
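Once the chunks are available as plain (chunk, label) pairs, they can be grouped for downstream use. The sketch below is a minimal pure-Python example that takes the rows of the table above as literal input; the helper name `group_by_label` and the split into `sdoh_` and `behavior_` groups are choices made for this illustration, not part of the model's output.

```python
from collections import defaultdict

# Rows copied from the results table above: (chunk, ner_label)
rows = [
    ("married", "sdoh_community"),
    ("children", "sdoh_community"),
    ("works", "sdoh_economics"),
    ("alcohol", "behavior_alcohol"),
    ("intravenous drug", "behavior_drug"),
    ("smoking", "behavior_tobacco"),
]


def group_by_label(rows):
    """Collect chunk texts under their predicted label, preserving order."""
    grouped = defaultdict(list)
    for chunk, label in rows:
        grouped[label].append(chunk)
    return dict(grouped)


grouped = group_by_label(rows)

# Split by label prefix into social-determinant and behavior mentions
sdoh = {k: v for k, v in grouped.items() if k.startswith("sdoh_")}
behavior = {k: v for k, v in grouped.items() if k.startswith("behavior_")}

print(sdoh)
print(behavior)
```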
Model Information
Model Name: | ner_sdoh_mentions |
Compatibility: | Healthcare NLP 4.2.2+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [sentence, token, embeddings] |
Output Labels: | [ner] |
Language: | en |
Size: | 15.1 MB |
Benchmarking
label             precision  recall  f1-score  support
behavior_alcohol  0.95       0.94    0.94      798
behavior_drug     0.93       0.92    0.92      366
behavior_tobacco  0.95       0.95    0.95      936
sdoh_community    0.97       0.97    0.97      969
sdoh_economics    0.95       0.91    0.93      363
sdoh_education    0.69       0.65    0.67      34
sdoh_environment  0.93       0.90    0.92      651
micro-avg         0.95       0.94    0.94      4117
macro-avg         0.91       0.89    0.90      4117
weighted-avg      0.95       0.94    0.94      4117
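The macro-avg and weighted-avg rows can be recomputed from the per-label rows. The sketch below assumes the usual definitions: macro-avg is the unweighted mean over labels, and weighted-avg is the mean weighted by support. It uses the rounded values from the table as input, so results agree only to two decimals. The names `report`, `macro_avg`, and `weighted_avg` are illustrative. The micro-avg row cannot be recovered this way, because it needs the raw true-positive, false-positive, and false-negative counts.

```python
# Per-label rows from the table: (precision, recall, f1, support)
report = {
    "behavior_alcohol": (0.95, 0.94, 0.94, 798),
    "behavior_drug": (0.93, 0.92, 0.92, 366),
    "behavior_tobacco": (0.95, 0.95, 0.95, 936),
    "sdoh_community": (0.97, 0.97, 0.97, 969),
    "sdoh_economics": (0.95, 0.91, 0.93, 363),
    "sdoh_education": (0.69, 0.65, 0.67, 34),
    "sdoh_environment": (0.93, 0.90, 0.92, 651),
}


def macro_avg(report, idx):
    """Unweighted mean of one metric column across labels."""
    return sum(row[idx] for row in report.values()) / len(report)


def weighted_avg(report, idx):
    """Mean of one metric column, weighted by each label's support."""
    total = sum(row[3] for row in report.values())
    return sum(row[idx] * row[3] for row in report.values()) / total


total_support = sum(row[3] for row in report.values())
for name, idx in [("precision", 0), ("recall", 1), ("f1-score", 2)]:
    print(f"{name}: macro={macro_avg(report, idx):.2f} "
          f"weighted={weighted_avg(report, idx):.2f}")
print("support:", total_support)
```

The gap between the macro and weighted averages comes mostly from sdoh_education, which has the lowest scores but only 34 supporting examples, so it pulls the macro average down far more than the weighted one.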