Description
Zero-shot Named Entity Recognition (NER) identifies entities in text without additional training on the target labels. By leveraging pre-trained language models and contextual understanding, it extends entity recognition to new domains and languages. The model card lists default labels as examples, but users are not limited to them.
The model is designed to accept any set of entity labels, so it can be adapted to specific use cases. For best results, use labels that are conceptually similar to the provided defaults.
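As a small illustration of mixing default and custom labels, the sketch below separates a requested label list into labels that appear among the defaults and labels that are new, so the new ones can be spot-checked on sample text first. It is plain Python and does not call the library; `split_labels` is a hypothetical helper, and `Food_Insecurity` is an illustrative label that this card does not document or evaluate.

```python
# A few of the default labels listed on this card (not the full list).
DEFAULT_LABELS = {"Housing", "Employment", "Insurance_Status", "Diet"}

# Hypothetical custom label set: two defaults plus one new, related label.
# "Food_Insecurity" is an illustrative assumption, not a label documented here.
custom_labels = ["Housing", "Employment", "Food_Insecurity"]

def split_labels(labels, defaults):
    """Separate labels into those among the defaults and those that are new."""
    known = [label for label in labels if label in defaults]
    new = [label for label in labels if label not in defaults]
    return known, new

known, new = split_labels(custom_labels, DEFAULT_LABELS)
print("default labels:", known)
print("new labels to spot-check on sample text:", new)
```

A list such as `custom_labels` would then be passed to `setLabels` in place of the full default list.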
Predicted Entities
Access_To_Care, Age, Alcohol, Childhood_Development, Diet, Disability, Eating_Disorder, Education, Employment, Environmental_Condition, Exercise, Family_Member, Financial_Status, Gender, Geographic_Entity, Healthcare_Institution, Housing, Hypertension, Income, Insurance_Status, Language, Legal_Issues, Marital_Status, Mental_Health, Obesity, Other_Disease, Other_SDoH_Keywords, Quality_Of_Life, Race_Ethnicity, Sexual_Activity, Sexual_Orientation, Smoking, Social_Exclusion, Social_Support, Spiritual_Beliefs, Transportation, Violence_Or_Abuse
How to use
# Assumes an active Spark session `spark` with Spark NLP and Healthcare NLP loaded.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer
from sparknlp_jsl.annotator import PretrainedZeroShotNER, NerConverterInternal

# Wrap the raw text column as a document annotation.
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Entity labels to look for; any label set can be supplied here.
labels = [
    'Access_To_Care', 'Age', 'Alcohol', 'Childhood_Development', 'Diet', 'Disability',
    'Eating_Disorder', 'Education', 'Employment', 'Environmental_Condition', 'Exercise',
    'Family_Member', 'Financial_Status', 'Gender', 'Geographic_Entity', 'Healthcare_Institution',
    'Housing', 'Hypertension', 'Income', 'Insurance_Status', 'Language', 'Legal_Issues',
    'Marital_Status', 'Mental_Health', 'Obesity', 'Other_Disease', 'Other_SDoH_Keywords',
    'Quality_Of_Life', 'Race_Ethnicity', 'Sexual_Activity', 'Sexual_Orientation', 'Smoking',
    'Social_Exclusion', 'Social_Support', 'Spiritual_Beliefs', 'Transportation', 'Violence_Or_Abuse'
]

# Load the pretrained zero-shot model and set the minimum confidence for a prediction.
pretrained_zero_shot_ner = PretrainedZeroShotNER.pretrained("zeroshot_ner_sdoh_medium", "en", "clinical/models")\
    .setInputCols("sentence", "token")\
    .setOutputCol("ner")\
    .setPredictionThreshold(0.5)\
    .setLabels(labels)

# Merge token-level tags into entity chunks.
ner_converter = NerConverterInternal()\
    .setInputCols("sentence", "token", "ner")\
    .setOutputCol("ner_chunk")

pipeline = Pipeline().setStages([
    document_assembler,
    sentence_detector,
    tokenizer,
    pretrained_zero_shot_ner,
    ner_converter
])

data = spark.createDataFrame([["""Smith is 55 years old, living in New York, a divorced Mexcian American woman with financial problems. She lives in an apartment. She has been struggling with diabetes for the past 10 years and has recently been experiencing frequent hospitalizations due to uncontrolled blood sugar levels. Smith works as a cleaning assistant and cannot access health insurance or paid sick leave. She has a son, a student at college. Pt with likely long-standing depression. She is aware she needs rehab. Pt reports having her catholic faith as a means of support as well. She has a long history of etoh abuse, beginning in her teens. She reports she has been a daily drinker for 30 years, most recently drinking beer daily. She smokes a pack of cigarettes a day. She had DUI in April and was due to court this week."""]]).toDF("text")
result = pipeline.fit(data).transform(data)

# Same pipeline written against the johnsnowlabs package namespaces.
# Assumes an active Spark session `spark` with the licensed libraries loaded.
from johnsnowlabs import nlp, medical

document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Entity labels to look for; any label set can be supplied here.
labels = [
    'Access_To_Care', 'Age', 'Alcohol', 'Childhood_Development', 'Diet', 'Disability',
    'Eating_Disorder', 'Education', 'Employment', 'Environmental_Condition', 'Exercise',
    'Family_Member', 'Financial_Status', 'Gender', 'Geographic_Entity', 'Healthcare_Institution',
    'Housing', 'Hypertension', 'Income', 'Insurance_Status', 'Language', 'Legal_Issues',
    'Marital_Status', 'Mental_Health', 'Obesity', 'Other_Disease', 'Other_SDoH_Keywords',
    'Quality_Of_Life', 'Race_Ethnicity', 'Sexual_Activity', 'Sexual_Orientation', 'Smoking',
    'Social_Exclusion', 'Social_Support', 'Spiritual_Beliefs', 'Transportation', 'Violence_Or_Abuse'
]

# Load the pretrained zero-shot model and set the minimum confidence for a prediction.
pretrained_zero_shot_ner = medical.PretrainedZeroShotNER.pretrained("zeroshot_ner_sdoh_medium", "en", "clinical/models")\
    .setInputCols("sentence", "token")\
    .setOutputCol("ner")\
    .setPredictionThreshold(0.5)\
    .setLabels(labels)

# Merge token-level tags into entity chunks.
ner_converter = medical.NerConverterInternal()\
    .setInputCols("sentence", "token", "ner")\
    .setOutputCol("ner_chunk")

pipeline = nlp.Pipeline().setStages([
    document_assembler,
    sentence_detector,
    tokenizer,
    pretrained_zero_shot_ner,
    ner_converter
])

data = spark.createDataFrame([["""Smith is 55 years old, living in New York, a divorced Mexcian American woman with financial problems. She lives in an apartment. She has been struggling with diabetes for the past 10 years and has recently been experiencing frequent hospitalizations due to uncontrolled blood sugar levels. Smith works as a cleaning assistant and cannot access health insurance or paid sick leave. She has a son, a student at college. Pt with likely long-standing depression. She is aware she needs rehab. Pt reports having her catholic faith as a means of support as well. She has a long history of etoh abuse, beginning in her teens. She reports she has been a daily drinker for 30 years, most recently drinking beer daily. She smokes a pack of cigarettes a day. She had DUI in April and was due to court this week."""]]).toDF("text")
result = pipeline.fit(data).transform(data)

// Scala version of the same pipeline.
// Assumes the Spark NLP and Healthcare NLP classes and `spark.implicits._` are already in scope.
val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentence_detector = new SentenceDetector()
    .setInputCols("document")
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
    .setInputCols("sentence")
    .setOutputCol("token")

// Entity labels to look for; any label set can be supplied here.
val labels = Array("Access_To_Care", "Age", "Alcohol", "Childhood_Development", "Diet", "Disability",
    "Eating_Disorder", "Education", "Employment", "Environmental_Condition", "Exercise",
    "Family_Member", "Financial_Status", "Gender", "Geographic_Entity", "Healthcare_Institution",
    "Housing", "Hypertension", "Income", "Insurance_Status", "Language", "Legal_Issues",
    "Marital_Status", "Mental_Health", "Obesity", "Other_Disease", "Other_SDoH_Keywords",
    "Quality_Of_Life", "Race_Ethnicity", "Sexual_Activity", "Sexual_Orientation", "Smoking",
    "Social_Exclusion", "Social_Support", "Spiritual_Beliefs", "Transportation", "Violence_Or_Abuse")

// Load the pretrained zero-shot model and set the minimum confidence for a prediction.
val pretrained_zero_shot_ner = PretrainedZeroShotNER.pretrained("zeroshot_ner_sdoh_medium", "en", "clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("ner")
    .setPredictionThreshold(0.5)
    .setLabels(labels)

// Merge token-level tags into entity chunks.
val ner_converter = new NerConverterInternal()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(
    document_assembler,
    sentence_detector,
    tokenizer,
    pretrained_zero_shot_ner,
    ner_converter
))

val data = Seq("""Smith is 55 years old, living in New York, a divorced Mexcian American woman with financial problems. She lives in an apartment. She has been struggling with diabetes for the past 10 years and has recently been experiencing frequent hospitalizations due to uncontrolled blood sugar levels. Smith works as a cleaning assistant and cannot access health insurance or paid sick leave. She has a son, a student at college. Pt with likely long-standing depression. She is aware she needs rehab. Pt reports having her catholic faith as a means of support as well. She has a long history of etoh abuse, beginning in her teens. She reports she has been a daily drinker for 30 years, most recently drinking beer daily. She smokes a pack of cigarettes a day. She had DUI in April and was due to court this week.""").toDF("text")
val result = pipeline.fit(data).transform(data)
Results
+------------------+-----+---+-----------------+----------+
|chunk             |begin|end|ner_label        |confidence|
+------------------+-----+---+-----------------+----------+
|55 years old      |9    |20 |Age              |0.98757917|
|New York          |33   |40 |Geographic_Entity|0.96678746|
|divorced          |45   |52 |Marital_Status   |0.9670806 |
|Mexcian American  |54   |69 |Race_Ethnicity   |0.6176904 |
|woman             |71   |75 |Gender           |0.9907975 |
|financial problems|82   |99 |Financial_Status |0.81308645|
|She               |102  |104|Gender           |0.9993352 |
|apartment         |118  |126|Housing          |0.615918  |
|She               |129  |131|Gender           |0.9991904 |
|diabetes          |158  |165|Other_Disease    |0.8660285 |
|cleaning assistant|307  |324|Employment       |0.5079581 |
|She               |381  |383|Gender           |0.99975115|
|son               |391  |393|Family_Member    |0.99613535|
|student           |398  |404|Education        |0.8893471 |
|college           |409  |415|Education        |0.95348775|
|depression        |447  |456|Mental_Health    |0.9034998 |
|She               |459  |461|Gender           |0.99977094|
|she               |472  |474|Gender           |0.9998542 |
|rehab             |482  |486|Access_To_Care   |0.7825011 |
|her               |507  |509|Gender           |0.99785334|
|catholic faith    |511  |524|Spiritual_Beliefs|0.95611227|
|support           |540  |546|Social_Support   |0.9163973 |
|She               |558  |560|Gender           |0.9997602 |
|her               |609  |611|Gender           |0.9995541 |
|teens             |613  |617|Age              |0.94968903|
|She               |620  |622|Gender           |0.99972373|
|she               |632  |634|Gender           |0.99988306|
|drinker           |653  |659|Alcohol          |0.9500115 |
|beer              |698  |701|Alcohol          |0.7686361 |
|She               |710  |712|Gender           |0.9997906 |
|smokes            |714  |719|Smoking          |0.92108554|
|cigarettes        |731  |740|Smoking          |0.93036956|
|She               |749  |751|Gender           |0.9998215 |
|DUI               |757  |759|Legal_Issues     |0.754453  |
+------------------+-----+---+-----------------+----------+
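The `begin` and `end` columns read as zero-based character offsets into the input text, with `end` inclusive, which the first rows of the table bear out. The sketch below is plain Python and independent of Spark. It re-checks a few rows against the opening of the sample text and shows how the output could be filtered afterwards with a stricter confidence cutoff. The helper names `span_text` and `filter_by_confidence` and the 0.9 cutoff are illustrative assumptions, not part of the library.

```python
# Opening of the sample text (only the part the rows below refer to).
text = ("Smith is 55 years old, living in New York, a divorced Mexcian American "
        "woman with financial problems.")

# (chunk, begin, end, ner_label, confidence) copied from the results table.
rows = [
    ("55 years old", 9, 20, "Age", 0.98757917),
    ("New York", 33, 40, "Geographic_Entity", 0.96678746),
    ("divorced", 45, 52, "Marital_Status", 0.9670806),
    ("Mexcian American", 54, 69, "Race_Ethnicity", 0.6176904),
    ("woman", 71, 75, "Gender", 0.9907975),
    ("financial problems", 82, 99, "Financial_Status", 0.81308645),
]

def span_text(text, begin, end):
    """Return the substring for an inclusive [begin, end] character span."""
    return text[begin:end + 1]

def filter_by_confidence(rows, min_confidence):
    """Keep only rows whose confidence is at least min_confidence."""
    return [row for row in rows if row[4] >= min_confidence]

# Check that every chunk matches its span when `end` is treated as inclusive.
all_match = all(span_text(text, b, e) == chunk for chunk, b, e, _, _ in rows)

# Hypothetical stricter cutoff applied after prediction.
confident = filter_by_confidence(rows, 0.9)
print(all_match, [row[0] for row in confident])
```

Filtering after the fact like this only narrows output that the pipeline already produced at its configured threshold of 0.5; it cannot recover entities below that value.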
Model Information
| Model Name:    | zeroshot_ner_sdoh_medium |
| Compatibility: | Healthcare NLP 5.5.1+    |
| License:       | Licensed                 |
| Edition:       | Official                 |
| Language:      | en                       |
| Size:          | 711.2 MB                 |
Benchmarking
label precision recall f1-score support
Access_To_Care 0.6721 0.8094 0.7344 1018
Age 0.8487 0.8243 0.8363 905
Alcohol 0.8774 0.9513 0.9128 677
Childhood_Development 0.5789 0.8462 0.6875 26
Diet 0.4451 0.8021 0.5725 96
Disability 0.8681 0.9294 0.8977 85
Eating_Disorder 0.6190 0.7800 0.6903 50
Education 0.6690 0.7951 0.7266 122
Employment 0.8914 0.9264 0.9086 4269
Environmental_Condition 0.3739 0.5244 0.4365 82
Exercise 0.4790 0.7619 0.5882 105
Family_Member 0.9598 0.9755 0.9676 4042
Financial_Status 0.4944 0.8224 0.6175 214
Gender 0.9920 0.9792 0.9856 10248
Geographic_Entity 0.6810 0.6930 0.6870 228
Healthcare_Institution 0.8948 0.3609 0.5143 1391
Housing 0.7553 0.8753 0.8109 850
Hypertension 0.5513 0.7049 0.6187 61
Income 0.6744 0.6744 0.6744 86
Insurance_Status 0.6951 0.7170 0.7059 159
Language 0.4923 0.8421 0.6214 38
Legal_Issues 0.2261 0.5246 0.3160 122
Marital_Status 0.9632 0.9458 0.9544 166
Mental_Health 0.6846 0.8265 0.7489 1003
O 0.9840 0.9782 0.9811 169565
Obesity 0.4667 0.7500 0.5753 28
Other_Disease 0.7056 0.9214 0.7992 1285
Other_SDoH_Keywords 0.7738 0.7156 0.7436 545
Quality_Of_Life 0.4959 0.5021 0.4990 241
Race_Ethnicity 0.8214 0.8214 0.8214 56
Sexual_Activity 0.6329 0.9434 0.7576 53
Sexual_Orientation 0.7778 0.8974 0.8333 39
Smoking 0.8683 0.9797 0.9206 148
Social_Exclusion 0.7455 0.8367 0.7885 49
Social_Support 0.8910 0.8610 0.8757 1367
Spiritual_Beliefs 0.6149 0.8349 0.7082 109
Transportation 0.3725 0.8120 0.5108 117
Violence_Or_Abuse 0.6623 0.6667 0.6645 153
accuracy - - 0.9651 199798
macro-avg 0.6895 0.8003 0.7288 199798
weighted-avg 0.9684 0.9651 0.9655 199798
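The f1-score column is the harmonic mean of precision and recall. The macro average is the unweighted mean over classes, while the weighted average weights each class by its support, which is why the dominant `O` and `Gender` classes pull the weighted figures well above the macro ones. A minimal sketch using three rows copied from the table follows. The function name is illustrative, and the averages it prints cover only these three rows, so they are not expected to match the summary rows above.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (label, precision, recall, support) copied from the benchmark table.
rows = [
    ("Healthcare_Institution", 0.8948, 0.3609, 1391),
    ("Smoking", 0.8683, 0.9797, 148),
    ("Gender", 0.9920, 0.9792, 10248),
]

f1s = [f1_score(p, r) for _, p, r, _ in rows]
supports = [s for _, _, _, s in rows]

# Macro: plain mean of per-class F1. Weighted: mean weighted by class support.
macro_f1 = sum(f1s) / len(f1s)
weighted_f1 = sum(f * s for f, s in zip(f1s, supports)) / sum(supports)

for (label, _, _, _), f1 in zip(rows, f1s):
    print(f"{label}: {f1:.4f}")
print(f"macro over these rows: {macro_f1:.4f}")
print(f"weighted over these rows: {weighted_f1:.4f}")
```

Recomputed values can differ from the table in the last digit because the published precision and recall are already rounded.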