4.1.0
Highlights
- Zero-Shot NER model to extract entities with no training dataset
- 7 new clinical NER models in Spanish
- 8 new clinical classification models in English and German related to public health topics (depression, covid sentiment, health mentions)
- New pretrained chunk mapper model (`drug_ade_mapper`) to map drugs with their corresponding adverse drug events
- A new pretrained resolver pipeline (`medication_resolver_pipeline`) to extract medications and resolve their adverse reactions (ADE), RxNorm, UMLS, NDC, SNOMED CT codes and action/treatments in clinical text with a single line of code
- Updated NER profiling pretrained pipelines with new NER models to allow running 64 clinical NER models at once
- Core improvements and bug fixes
- New and updated notebooks
- 20+ new clinical models and pipelines added & updated in total
Zero-Shot NER model to Extract Entities With No Training Dataset
We are releasing a first-of-its-kind Zero-Shot NER model that can detect named entities without requiring any annotated dataset to train a model. It extracts entities by crafting appropriate prompts to query any RoBERTa Question Answering model.
See Models Hub Page for more details.
Example :
...
zero_shot_ner = ZeroShotNerModel.pretrained("zero_shot_ner_roberta", "en", "clinical/models")\
.setInputCols(["sentence", "token"])\
.setOutputCol("zero_shot_ner")\
.setEntityDefinitions(
{
"PROBLEM": ["What is the disease?", "What is his symptom?", "What is her disease?", "What is his disease?",
"What is the problem?" ,"What does a patient suffer", 'What was the reason that the patient is admitted to the clinic?'],
"DRUG": ["Which drug?", "Which is the drug?", "What is the drug?", "Which drug does he use?", "Which drug does she use?", "Which drug do I use?", "Which drug is prescribed for a symptom?"],
"ADMISSION_DATE": ["When did patient admitted to a clinic?"],
"PATIENT_AGE": ["How old is the patient?",'What is the age of the patient?']
})\
...
sample_text = ["The doctor pescribed Majezik for my severe headache.",
"The patient was admitted to the hospital for his colon cancer.",
"27 years old patient was admitted to clinic on Sep 1st by Dr. X for a right-sided pleural effusion for thoracentesis."]
Results :
+------------------------------------------------+--------------+----------+
| chunk| ner_label|confidence|
+------------------------------------------------+--------------+----------+
| Majezik| DRUG|0.64671576|
| severe headache| PROBLEM| 0.5526346|
| colon cancer| PROBLEM| 0.8898498|
| 27 years old| PATIENT_AGE| 0.6943085|
| Sep 1st|ADMISSION_DATE|0.95646095|
|a right-sided pleural effusion for thoracentesis| PROBLEM|0.50026613|
+------------------------------------------------+--------------+----------+
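For reference, here is a minimal end-to-end sketch around the snippet above. The surrounding stages (document assembler, sentence detector, tokenizer, NER converter) and the active SparkSession `spark` are assumptions based on standard Spark NLP for Healthcare usage, not part of the release notes themselves.

```python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer
from sparknlp_jsl.annotator import ZeroShotNerModel, NerConverterInternal

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Prompts define the entities; a subset of the definitions shown above
zero_shot_ner = ZeroShotNerModel.pretrained("zero_shot_ner_roberta", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("zero_shot_ner")\
    .setEntityDefinitions({
        "PROBLEM": ["What is the disease?", "What is the problem?"],
        "DRUG": ["Which drug?", "What is the drug?"],
        "ADMISSION_DATE": ["When did patient admitted to a clinic?"],
        "PATIENT_AGE": ["How old is the patient?"]
    })

ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "zero_shot_ner"])\
    .setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer,
                            zero_shot_ner, ner_converter])

data = spark.createDataFrame([[t] for t in sample_text]).toDF("text")
result = pipeline.fit(data).transform(data)
```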
7 New Clinical NER Models in Spanish
- We are releasing 4 new `MedicalNerModel` and 3 new `MedicalBertForTokenClassifier` NER models in Spanish.
model name | description | predicted entities |
---|---|---|
`ner_negation_uncertainty` | This model detects relevant entities from Spanish medical texts | NEG, UNC, USCO, NSCO |
`disease_mentions_tweet` | This model detects disease mentions in Spanish tweets | ENFERMEDAD |
`ner_clinical_trials_abstracts` | This model detects relevant entities from Spanish clinical trial abstracts | CHEM, DISO, PROC |
`ner_pharmacology` | This model detects pharmacological entities from Spanish medical texts | PROTEINAS, NORMALIZABLES |
`bert_token_classifier_ner_clinical_trials_abstracts` | This model detects relevant entities from Spanish clinical trial abstracts | CHEM, DISO, PROC |
`bert_token_classifier_negation_uncertainty` | This model detects relevant entities from Spanish medical texts | NEG, NSCO, UNC, USCO |
`bert_token_classifier_pharmacology` | This model detects pharmacological entities from Spanish medical texts | PROTEINAS, NORMALIZABLES |
Example :
...
ner = MedicalNerModel.pretrained('ner_clinical_trials_abstracts', "es", "clinical/models") \
.setInputCols(["sentence", "token", "embeddings"]) \
.setOutputCol("ner")
example_text = """Efecto de la suplementación con ácido fólico sobre los niveles de homocisteína total en pacientes en hemodiálisis. La hiperhomocisteinemia es un marcador de riesgo independiente de morbimortalidad cardiovascular. Hemos prospectivamente reducir los niveles de homocisteína total (tHcy) mediante suplemento con ácido fólico y vitamina B6 (pp), valorando su posible correlación con dosis de diálisis, función residual y parámetros nutricionales."""
Results :
+-----------------------------+---------+
|chunk |ner_label|
+-----------------------------+---------+
|suplementación |PROC |
|ácido fólico |CHEM |
|niveles de homocisteína |PROC |
|hemodiálisis |PROC |
|hiperhomocisteinemia |DISO |
|niveles de homocisteína total|PROC |
|tHcy |PROC |
|ácido fólico |CHEM |
|vitamina B6 |CHEM |
|pp |CHEM |
|diálisis |PROC |
|función residual |PROC |
+-----------------------------+---------+
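The three `MedicalBertForTokenClassifier` models listed above can be plugged in the same way, but they do not need a separate word embeddings stage. Below is a minimal sketch; the converter stage is an assumption based on common usage.

```python
from sparknlp_jsl.annotator import MedicalBertForTokenClassifier, NerConverterInternal

# The BERT token classifier consumes sentences and tokens directly; no embeddings stage needed
tokenClassifier = MedicalBertForTokenClassifier.pretrained(
        "bert_token_classifier_ner_clinical_trials_abstracts", "es", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("ner")

ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")
```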
8 New Clinical Classification Models in English and German Related to Public Health Topics (Depression, Covid Sentiment, Health Mentions)
- We are releasing 8 new `MedicalBertForSequenceClassification` models to classify social media text in English and German related to public health topics (depression, covid sentiment, health mentions).
model name | description | predicted entities |
---|---|---|
`bert_sequence_classifier_depression_binary` | This model classifies whether a social media text expresses depression or not. | no-depression, depression |
`bert_sequence_classifier_health_mentions_gbert_large` | This GBERT-large based model classifies public health mentions in German social media text. | non-health, health-related |
`bert_sequence_classifier_health_mentions_medbert` | This German-MedBERT based model classifies public health mentions in German social media text. | non-health, health-related |
`bert_sequence_classifier_health_mentions_gbert` | This GBERT based model classifies public health mentions in German social media text. | non-health, health-related |
`bert_sequence_classifier_health_mentions_bert` | This bert-base-german based model classifies public health mentions in German social media text. | non-health, health-related |
`bert_sequence_classifier_depression_twitter` | This PHS-BERT based model classifies whether tweets contain depressive text or not. | depression, no-depression |
`bert_sequence_classifier_depression` | This PHS-BERT based model classifies the depression level of social media text into three levels. | no-depression, minimum, high-depression |
`bert_sequence_classifier_covid_sentiment` | This BioBERT based sentiment analysis model classifies whether a tweet contains positive, negative, or neutral sentiment about the COVID-19 pandemic. | neutral, positive, negative |
Example :
...
sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_depression_twitter", "en", "clinical/models")\
.setInputCols(["document","token"])\
.setOutputCol("class")
example_text = ["Do what makes you happy, be with who makes you smile, laugh as much as you breathe, and love as long as you live!",
"Everything is a lie, everyone is fake, I'm so tired of living"]
Results :
+-----------------------------------------------------------------------------------------------------------------+---------------+
|text |result |
+-----------------------------------------------------------------------------------------------------------------+---------------+
|Do what makes you happy, be with who makes you smile, laugh as much as you breathe, and love as long as you live!|[no-depression]|
|Everything is a lie, everyone is fake, I am so tired of living. |[depression] |
+-----------------------------------------------------------------------------------------------------------------+---------------+
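A minimal way to reproduce the example above, assuming a standard document assembler and tokenizer in front of the classifier and an active SparkSession `spark` (these surrounding pieces are assumptions, not part of the release notes):

```python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import Tokenizer
from sparknlp_jsl.annotator import MedicalBertForSequenceClassification

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")

sequenceClassifier = MedicalBertForSequenceClassification.pretrained(
        "bert_sequence_classifier_depression_twitter", "en", "clinical/models")\
    .setInputCols(["document", "token"])\
    .setOutputCol("class")

pipeline = Pipeline(stages=[document_assembler, tokenizer, sequenceClassifier])

# Fit on an empty data frame, then use LightPipeline to annotate raw strings directly
empty_df = spark.createDataFrame([[""]]).toDF("text")
light_model = LightPipeline(pipeline.fit(empty_df))
annotations = light_model.annotate(example_text)
```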
New Pretrained Chunk Mapper Model (`drug_ade_mapper`) to Map Drugs With Their Corresponding Adverse Drug Events
We are releasing the new `drug_ade_mapper` pretrained chunk mapper model to map drugs with their corresponding adverse drug events.
See Models Hub Page for more details.
Example :
...
chunkMapper = ChunkMapperModel.pretrained("drug_ade_mapper", "en", "clinical/models")\
.setInputCols(["ner_chunk"])\
.setOutputCol("mappings")\
.setRels(["ADE"])
...
sample_text = "The patient was prescribed 1000 mg fish oil and multivitamins. She was discharged on zopiclone and ambrisentan."
Results :
+----------------+------------+-------------------------------------------------------------------------------------------+
|ner_chunk |ade_mappings|all_relations |
+----------------+------------+-------------------------------------------------------------------------------------------+
|1000 mg fish oil|Dizziness |Myocardial infarction:::Nausea |
|multivitamins |Erythema |Acne:::Dry skin:::Skin burning sensation:::Inappropriate schedule of product administration|
|zopiclone |Vomiting |Malaise:::Drug interaction:::Asthenia:::Hyponatraemia |
|ambrisentan |Dyspnoea |Therapy interrupted:::Death:::Dizziness:::Drug ineffective |
+----------------+------------+-------------------------------------------------------------------------------------------+
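The mapper expects drug chunks as input, so an NER stage is needed upstream. Below is a minimal sketch; `embeddings_clinical` and `ner_posology` are shown as one possible (assumed) upstream combination, and the active SparkSession `spark` is assumed as well.

```python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer, WordEmbeddingsModel
from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal, ChunkMapperModel

document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
sentence_detector = SentenceDetector().setInputCols(["document"]).setOutputCol("sentence")
tokenizer = Tokenizer().setInputCols(["sentence"]).setOutputCol("token")

embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

# Any NER model that produces DRUG chunks can feed the mapper; ner_posology is one option
ner = MedicalNerModel.pretrained("ner_posology", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")\
    .setWhiteList(["DRUG"])

chunkMapper = ChunkMapperModel.pretrained("drug_ade_mapper", "en", "clinical/models")\
    .setInputCols(["ner_chunk"])\
    .setOutputCol("mappings")\
    .setRels(["ADE"])

pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer,
                            embeddings, ner, ner_converter, chunkMapper])

data = spark.createDataFrame([[sample_text]]).toDF("text")
result = pipeline.fit(data).transform(data)
```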
A New Pretrained Resolver Pipeline (`medication_resolver_pipeline`) to Extract Medications and Resolve Their Adverse Reactions (ADE), RxNorm, UMLS, NDC, SNOMED CT Codes and Action/Treatments in Clinical Text
We are releasing the `medication_resolver_pipeline` pretrained pipeline to extract medications and resolve their adverse reactions (ADE), RxNorm, UMLS, NDC, SNOMED CT codes and action/treatments in clinical text with a single line of code.
You can also use `medication_resolver_transform_pipeline` to process data frames with Spark's transform method.
See Models Hub Page for more details.
Example :
from sparknlp.pretrained import PretrainedPipeline
sample_text = """The patient was prescribed Amlodopine Vallarta 10-320mg, Eviplera.
The other patient is given Lescol 40 MG and Everolimus 1.5 mg tablet."""
med_pipeline = PretrainedPipeline("medication_resolver_pipeline", "en", "clinical/models")
med_pipeline.annotate(sample_text)
med_transform_pipeline = PretrainedPipeline("medication_resolver_transform_pipeline", "en", "clinical/models")
med_transform_pipeline.transform(spark.createDataFrame([[sample_text]]).toDF("text"))
Results :
| chunk | ner_label | ADE | RxNorm | Action | Treatment | UMLS | SNOMED_CT | NDC_Product | NDC_Package |
|:-----------------------------|:------------|:----------------------------|---------:|:---------------------------|:-------------------------------------------|:---------|:------------|:--------------|:--------------|
| Amlodopine Vallarta 10-320mg | DRUG | Gynaecomastia | 722131 | NONE | NONE | C1949334 | 425838008 | 00093-7693 | 00093-7693-56 |
| Eviplera | DRUG | Anxiety | 217010 | Inhibitory Bone Resorption | Osteoporosis | C0720318 | NONE | NONE | NONE |
| Lescol 40 MG | DRUG | NONE | 103919 | Hypocholesterolemic | Heterozygous Familial Hypercholesterolemia | C0353573 | NONE | 00078-0234 | 00078-0234-05 |
| Everolimus 1.5 mg tablet | DRUG | Acute myocardial infarction | 2056895 | NONE | NONE | C4723581 | NONE | 00054-0604 | 00054-0604-21 |
Updated NER Profiling Pretrained Pipelines With New NER Models to Allow Running 64 Clinical NER Models at Once
We have updated the `ner_profiling_clinical` and `ner_profiling_biobert` pretrained pipelines with the new NER models. When you run these pipelines over your text, you will now get predictions from 64 clinical NER models in `ner_profiling_clinical` and 22 clinical NER models in `ner_profiling_biobert`.
You can check the ner_profiling_clinical and ner_profiling_biobert Models Hub pages for more details and the lists of NER models that these pipelines include.
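A minimal usage sketch (the sample sentence is only illustrative, and an active Spark NLP for Healthcare session is assumed):

```python
from sparknlp.pretrained import PretrainedPipeline

# Downloads the profiling pipeline and runs all of its bundled clinical NER models at once
ner_profiling = PretrainedPipeline("ner_profiling_clinical", "en", "clinical/models")

results = ner_profiling.annotate(
    "A 28-year-old female with a history of gestational diabetes mellitus was admitted.")

# Each bundled NER model writes to its own output key; the exact key names
# depend on the pipeline's stages and are listed on the Models Hub page.
```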
Core Improvements and Bug Fixes
- Updated HCC module (`from sparknlp_jsl.functions import profile`) with the new changes in HCC score calculation functions.
- `AnnotationToolJsonReader`, `NerDLMetrics` and `StructuredDeidentification`: These annotators can now be used on Spark 3.0.
- `NerDLMetrics`:
  - Added `case_sensitive` parameter; the case sensitivity issue in tokens is solved.
  - Added `drop_o` parameter to the `computeMetricsFromDF` method; the `dropO` parameter in the `NerDLMetrics` class is deprecated.
- `MedicalNerModel`: The issue of inconsistent NER model results between different versions is solved.
- `AssertionDLModel`: Unindexed chunks will be ignored by the `AssertionDLModel` instead of raising an exception.
- `ContextualParserApproach`: These two issues are solved when using the `ruleScope: "document"` configuration:
  - Wrong index computations of chunks after matching sub-tokens.
  - Including sub-token matches even though `completeMatchRegex: "true"`.
New and Updated Notebooks
- We have a new Zero-Shot Clinical NER Notebook to show how to use the zero-shot NER model.
- We have updated the Medicare Risk Adjustment Score Calculation Notebook with the new changes in HCC score calculation functions.
- We have updated the related notebooks with the new changes in the NER profiling pretrained pipelines.
- We have updated the Clinical Assertion Model Notebook according to the bug fix in the training section.
- We moved all Azure/AWS/Databricks notebooks to the `products` folder in the spark-nlp-workshop repo.
20+ New Clinical Models and Pipelines Added & Updated in Total
- `zero_shot_ner_roberta`
- `medication_resolver_pipeline`
- `medication_resolver_transform_pipeline`
- `ner_profiling_clinical`
- `ner_profiling_biobert`
- `drug_ade_mapper`
- `ner_negation_uncertainty`
- `disease_mentions_tweet`
- `ner_clinical_trials_abstracts`
- `ner_pharmacology`
- `bert_token_classifier_ner_clinical_trials_abstracts`
- `bert_token_classifier_negation_uncertainty`
- `bert_token_classifier_pharmacology`
- `bert_sequence_classifier_depression_binary`
- `bert_sequence_classifier_health_mentions_gbert_large`
- `bert_sequence_classifier_health_mentions_medbert`
- `bert_sequence_classifier_health_mentions_gbert`
- `bert_sequence_classifier_health_mentions_bert`
- `bert_sequence_classifier_depression_twitter`
- `bert_sequence_classifier_depression`
- `bert_sequence_classifier_covid_sentiment`