Description
Zero-shot Named Entity Recognition (NER) enables the identification of entities in text with minimal effort. By leveraging pre-trained language models and contextual understanding, zero-shot NER extends entity recognition capabilities to new domains and languages. While the model card includes default labels as examples, it is important to highlight that users are not limited to these labels. The model is designed to support any set of entity labels, allowing users to adapt it to their specific use cases. For best results, it is recommended to use labels that are conceptually similar to the provided defaults.
Predicted Entities
DRUG
, ADE
,PROBLEM
How to use
document_assembler = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentence_detector = SentenceDetector()\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")
labels = ['DRUG', 'ADE','PROBLEM'] # You can change the entities
pretrained_zero_shot_ner = PretrainedZeroShotNER().pretrained("zeroshot_ner_ade_clinical_large", "en", "clinical/models")\
.setInputCols("sentence", "token")\
.setOutputCol("ner")\
.setPredictionThreshold(0.5)\
.setLabels(labels)
ner_converter = NerConverterInternal()\
.setInputCols("sentence", "token", "ner")\
.setOutputCol("ner_chunk")
pipeline = Pipeline().setStages([
document_assembler,
sentence_detector,
tokenizer,
pretrained_zero_shot_ner,
ner_converter
])
data = spark.createDataFrame([["""To alleviate severe seasonal allergies that included symptoms such as sneezing, watery eyes, and nasal congestion, the doctor recommended a combination of antihistamines and nasal corticosteroids, which collectively provided the patient with substantial symptomatic relief and improved quality of life. However, the patient reported experiencing side effects such as drowsiness from the antihistamines and occasional nosebleeds due to the nasal corticosteroids."""]]).toDF("text")
result = pipeline.fit(data).transform(data)
document_assembler = nlp.DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentence_detector = nlp.SentenceDetector()\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = nlp.Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")
labels = ['DRUG', 'ADE','PROBLEM']
pretrained_zero_shot_ner = medical.PretrainedZeroShotNER().pretrained("zeroshot_ner_ade_clinical_large", "en", "clinical/models")\
.setInputCols("sentence", "token")\
.setOutputCol("ner")\
.setPredictionThreshold(0.5)\
.setLabels(labels)
ner_converter = medical.NerConverterInternal()\
.setInputCols("sentence", "token", "ner")\
.setOutputCol("ner_chunk")
pipeline = nlp.Pipeline().setStages([
document_assembler,
sentence_detector,
tokenizer,
pretrained_zero_shot_ner,
ner_converter
])
data = spark.createDataFrame([["""To alleviate severe seasonal allergies that included symptoms such as sneezing, watery eyes, and nasal congestion, the doctor recommended a combination of antihistamines and nasal corticosteroids, which collectively provided the patient with substantial symptomatic relief and improved quality of life. However, the patient reported experiencing side effects such as drowsiness from the antihistamines and occasional nosebleeds due to the nasal corticosteroids."""]]).toDF("text")
result = pipeline.fit(data).transform(data)
val document_assembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
val sentence_detector = new SentenceDetector()
.setInputCols("document")
.setOutputCol("sentence")
val tokenizer = new Tokenizer()
.setInputCols("sentence")
.setOutputCol("token")
labels = Array("DRUG", "ADE","PROBLEM") // You can change the entities
val pretrained_zero_shot_ner = PretrainedZeroShotNER().pretrained("zeroshot_ner_ade_clinical_large", "en", "clinical/models")
.setInputCols(Array("sentence", "token"))
.setOutputCol("ner")
.setPredictionThreshold(0.5)
.setLabels(labels)
val ner_converter = new NerConverterInternal()
.setInputCols(Array("sentence", "token", "ner"))
.setOutputCol("ner_chunk")
val pipeline = new Pipeline().setStages(Array(
document_assembler,
sentence_detector,
tokenizer,
pretrained_zero_shot_ner,
ner_converter
))
val data = Seq(("""To alleviate severe seasonal allergies that included symptoms such as sneezing, watery eyes, and nasal congestion, the doctor recommended a combination of antihistamines and nasal corticosteroids, which collectively provided the patient with substantial symptomatic relief and improved quality of life. However, the patient reported experiencing side effects such as drowsiness from the antihistamines and occasional nosebleeds due to the nasal corticosteroids.""")).toDF("text")
val result = pipeline.fit(data).transform(data)
Results
+----------------------+-----+---+---------+----------+
|chunk |begin|end|ner_label|confidence|
+----------------------+-----+---+---------+----------+
|seasonal allergies |21 |38 |PROBLEM |0.7983188 |
|symptoms |54 |61 |PROBLEM |0.72892165|
|sneezing |71 |78 |PROBLEM |0.99215865|
|watery eyes |81 |91 |PROBLEM |0.98562455|
|nasal congestion |98 |113|PROBLEM |0.9831299 |
|antihistamines |156 |169|DRUG |0.99360174|
|nasal corticosteroids |175 |195|DRUG |0.81538504|
|side effects |347 |358|PROBLEM |0.77464384|
|drowsiness |368 |377|ADE |0.99594826|
|antihistamines |388 |401|DRUG |0.9916972 |
|nosebleeds |418 |427|ADE |0.98120177|
|nasal corticosteroids.|440 |461|DRUG |0.52606887|
+----------------------+-----+---+---------+----------+
Model Information
Model Name: | zeroshot_ner_ade_clinical_large |
Compatibility: | Healthcare NLP 5.5.1+ |
License: | Licensed |
Edition: | Official |
Language: | en |
Size: | 1.6 GB |
Benchmarking
label precision recall f1-score support
B-ADE 0.817 0.717 0.764 3551
B-DRUG 0.878 0.894 0.886 7551
B-PROBLEM 0.868 0.787 0.826 18238
I-ADE 0.818 0.648 0.723 4296
I-DRUG 0.883 0.757 0.815 10488
I-PROBLEM 0.862 0.636 0.732 15925
accuracy - - 0.942 347624
macro-avg 0.869 0.775 0.816 347624
weighted-avg 0.940 0.942 0.940 347624