Healthcare NLP v4.1.0 Release Notes

4.1.0

Highlights

Zero-Shot NER model to extract entities with no training dataset
7 new clinical NER models in Spanish
8 new clinical classification models in English and German related to public health topics (depression, covid sentiment, health mentions)
New pretrained chunk mapper model (drug_ade_mapper) to map drugs with their corresponding adverse drug events
A new pretrained resolver pipeline (medication_resolver_pipeline) to extract medications and resolve their adverse reactions (ADE), RxNorm, UMLS, NDC, SNOMED CT codes and action/treatments in clinical text with a single line of code.
Updated NER profiling pretrained pipelines with new NER models to allow running 64 clinical NER models at once
Core improvements and bug fixes
New and updated notebooks
20+ new clinical models and pipelines added & updated in total

Zero-Shot NER model to Extract Entities With No Training Dataset

We are releasing the first of its kind Zero-Shot NER model that can detect any named entities without using any annotated dataset to train a model. It allows extracting entities by crafting appropriate prompts to query any RoBERTa Question Answering model.

See Models Hub Page for more details.

Example :

...
zero_shot_ner = ZeroShotNerModel.pretrained("zero_shot_ner_roberta", "en", "clincial/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("zero_shot_ner")\
    .setEntityDefinitions(
        {
            "PROBLEM": ["What is the disease?", "What is his symptom?", "What is her disease?", "What is his disease?",
                        "What is the problem?" ,"What does a patient suffer", 'What was the reason that the patient is admitted to the clinic?'],
            "DRUG": ["Which drug?", "Which is the drug?", "What is the drug?", "Which drug does he use?", "Which drug does she use?", "Which drug do I use?", "Which drug is prescribed for a symptom?"],
            "ADMISSION_DATE": ["When did patient admitted to a clinic?"],
            "PATIENT_AGE": ["How old is the patient?",'What is the age of the patient?']
        })\
...

sample_text = ["The doctor pescribed Majezik for my severe headache.",
               "The patient was admitted to the hospital for his colon cancer.",
               "27 years old patient was admitted to clinic on Sep 1st by Dr. X for a right-sided pleural effusion for thoracentesis."]

Results :

+------------------------------------------------+--------------+----------+
|                                           chunk|     ner_label|confidence|
+------------------------------------------------+--------------+----------+
|                                         Majezik|          DRUG|0.64671576|
|                                 severe headache|       PROBLEM| 0.5526346|
|                                    colon cancer|       PROBLEM| 0.8898498|
|                                    27 years old|   PATIENT_AGE| 0.6943085|
|                                         Sep 1st|ADMISSION_DATE|0.95646095|
|a right-sided pleural effusion for thoracentesis|       PROBLEM|0.50026613|
+------------------------------------------------+--------------+----------+

7 New Clinical NER Models in Spanish

We are releasing 4 new MedicalNerModel and 3 new MedicalBertForTokenClassifier NER models in Spanish.

model name	description	predicted entities
ner_negation_uncertainty	This model detects relevant entities from Spanish medical texts	`NEG` `UNC` `USCO` `NSCO`
disease_mentions_tweet	This model detects disease mentions in Spanish tweets	`ENFERMEDAD`
ner_clinical_trials_abstracts	This model detects relevant entities from Spanish clinical trial abstracts	`CHEM` `DISO` `PROC`
ner_pharmacology	This model detects pharmacological entities from Spanish medical texts	`PROTEINAS` `NORMALIZABLES`
bert_token_classifier_ner_clinical_trials_abstracts	This model detects relevant entities from Spanish clinical trial abstracts	`CHEM` `DISO` `PROC`
bert_token_classifier_negation_uncertainty	This model detects relevant entities from Spanish medical texts	`NEG` `NSCO` `UNC` `USCO`
bert_token_classifier_pharmacology	This model detects pharmacological entities from Spanish medical texts	`PROTEINAS` `NORMALIZABLES`

Example :

...
ner = MedicalNerModel.pretrained('ner_clinical_trials_abstracts', "es", "clinical/models") \
	.setInputCols(["sentence", "token", "embeddings"]) \
	.setOutputCol("ner")

example_text=  """"Efecto de la suplementación con ácido fólico sobre los niveles de homocisteína total en pacientes en hemodiálisis. La hiperhomocisteinemia es un marcador de riesgo independiente de morbimortalidad cardiovascular. Hemos prospectivamente reducir los niveles de homocisteína total (tHcy) mediante suplemento con ácido fólico y vitamina B6 (pp), valorando su posible correlación con dosis de diálisis, función  residual y parámetros nutricionales.""""

Results :

+-----------------------------+---------+
|chunk                        |ner_label|
+-----------------------------+---------+
|suplementación               |PROC     |
|ácido fólico                 |CHEM     |
|niveles de homocisteína      |PROC     |
|hemodiálisis                 |PROC     |
|hiperhomocisteinemia         |DISO     |
|niveles de homocisteína total|PROC     |
|tHcy                         |PROC     |
|ácido fólico                 |CHEM     |
|vitamina B6                  |CHEM     |
|pp                           |CHEM     |
|diálisis                     |PROC     |
|función  residual            |PROC     |
+-----------------------------+---------+

We are releasing 8 new MedicalBertForSequenceClassification models to classify text from social media data in English and German related to public health topics (depression, covid sentiment, health mentions)

model name	description	predicted entities
bert_sequence_classifier_depression_binary	This model classifies whether a social media text expresses depression or not.	`no-depression` `depression`
bert_sequence_classifier_health_mentions_gbert_large	This GBERT-large based model classifies public health mentions in German social media text.	`non-health` `health-related`
bert_sequence_classifier_health_mentions_medbert	This German-MedBERT based model classifies public health mentions in German social media text.	`non-health` `health-related`
bert_sequence_classifier_health_mentions_gbert	This GBERT-large based model classifies public health mentions in German social media text.	`non-health` `health-related`
bert_sequence_classifier_health_mentions_bert	This bert-base-german based model classifies public health mentions in German social media text.	`non-health` `health-related`
bert_sequence_classifier_depression_twitter	This PHS-BERT based model classifies whether tweets contain depressive text or not.	`depression` `no-depression`
bert_sequence_classifier_depression	This PHS-BERT based model classifies depression level of social media text into three levels.	`no-depression` `minimum` `high-depression`
bert_sequence_classifier_covid_sentiment	This BioBERT based sentiment analysis model classifies whether a tweet contains positive, negative, or neutral sentiments about COVID-19 pandemic.	`neutral` `positive` `negative`

Example :

...
sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_depression_twitter", "en", "clinical/models")\
     .setInputCols(["document","token"])\
     .setOutputCol("class")

example_text = ["Do what makes you happy, be with who makes you smile, laugh as much as you breathe, and love as long as you live!",
                "Everything is a lie, everyone is fake, I'm so tired of living"]

Results :

+------------------------------------------------------------------------------------------------------------------+---------------+
 |text                                                                                                             |result         |
 +-----------------------------------------------------------------------------------------------------------------+---------------+
 |Do what makes you happy, be with who makes you smile, laugh as much as you breathe, and love as long as you live!|[no-depression]|
 |Everything is a lie, everyone is fake, I am so tired of living.                                                  |[depression]   |
 +-----------------------------------------------------------------------------------------------------------------+---------------+

New Pretrained Chunk Mapper Model (`drug_ade_mapper`) to Map Drugs With Their Corresponding Adverse Drug Events

We are releasing new drug_ade_mapper pretrained chunk mapper model to map drugs with their corresponding adverse drug events.

See Models Hub Page for more details.

Example :

...
chunkMapper = ChunkMapperModel.pretrained("drug_ade_mapper", "en", "clinical/models")\
      .setInputCols(["ner_chunk"])\
      .setOutputCol("mappings")\
      .setRels(["ADE"])
...

sample_text = "The patient was prescribed 1000 mg fish oil and multivitamins. She was discharged on zopiclone and ambrisentan."

Results :

+----------------+------------+-------------------------------------------------------------------------------------------+
|ner_chunk       |ade_mappings|all_relations                                                                              |
+----------------+------------+-------------------------------------------------------------------------------------------+
|1000 mg fish oil|Dizziness   |Myocardial infarction:::Nausea                                                             |
|multivitamins   |Erythema    |Acne:::Dry skin:::Skin burning sensation:::Inappropriate schedule of product administration|
|zopiclone       |Vomiting    |Malaise:::Drug interaction:::Asthenia:::Hyponatraemia                                      |
|ambrisentan     |Dyspnoea    |Therapy interrupted:::Death:::Dizziness:::Drug ineffective                                 |
+----------------+------------+-------------------------------------------------------------------------------------------+

A New Pretrained Resolver Pipeline (`medication_resolver_pipeline`) to Extract Medications and Resolve Their Adverse Reactions (ADE), RxNorm, UMLS, NDC, SNOMED CT Codes and Action/Treatments in Clinical Text.

We are releasing the medication_resolver_pipeline pretrained pipeline to extract medications and resolve their adverse reactions (ADE), RxNorm, UMLS, NDC, SNOMED CT codes and action/treatments in clinical text with a single line of code.

Also, you can use medication_resolver_transform_pipeline to use transform method of Spark.

See Models Hub Page for more details.

Example :

from sparknlp.pretrained import PretrainedPipeline

sample_text = """The patient was prescribed Amlodopine Vallarta 10-320mg, Eviplera.
                 The other patient is given Lescol 40 MG and Everolimus 1.5 mg tablet."""

med_pipeline = PretrainedPipeline("medication_resolver_pipeline", "en", "clinical/models")
med_pipeline.annotate(sample_text)

med_transform_pipeline = PretrainedPipeline("medication_resolver_transform_pipeline", "en", "clinical/models")
med_transform_pipeline.transform(spark.createDataFrame([[sample_text]]).toDF("text"))

Results :

| chunk                        | ner_label   | ADE                         |   RxNorm | Action                     | Treatment                                  | UMLS     | SNOMED_CT   | NDC_Product   | NDC_Package   |
|:-----------------------------|:------------|:----------------------------|---------:|:---------------------------|:-------------------------------------------|:---------|:------------|:--------------|:--------------|
| Amlodopine Vallarta 10-320mg | DRUG        | Gynaecomastia               |   722131 | NONE                       | NONE                                       | C1949334 | 425838008   | 00093-7693    | 00093-7693-56 |
| Eviplera                     | DRUG        | Anxiety                     |   217010 | Inhibitory Bone Resorption | Osteoporosis                               | C0720318 | NONE        | NONE          | NONE          |
| Lescol 40 MG                 | DRUG        | NONE                        |   103919 | Hypocholesterolemic        | Heterozygous Familial Hypercholesterolemia | C0353573 | NONE        | 00078-0234    | 00078-0234-05 |
| Everolimus 1.5 mg tablet     | DRUG        | Acute myocardial infarction |  2056895 | NONE                       | NONE                                       | C4723581 | NONE        | 00054-0604    | 00054-0604-21 |

Updated NER Profiling Pretrained Pipelines With New NER Models to Allow Running 64 Clinical NER Models at Once

We have upadated ner_profiling_clinical and ner_profiling_biobert pretrained pipelines with the new NER models. When you run these pipelines over your text, now you will end up with the predictions coming out of 64 clinical NER models in ner_profiling_clinical and 22 clinical NER models in ner_profiling_biobert results.

You can check ner_profiling_clinical and ner_profiling_biobert Models Hub pages for more details and the NER model lists that these pipelines include.

Core Improvements and Bug Fixes

Updated HCC module (from sparknlp_jsl.functions import profile) with the new changes in HCC score calculation functions.
AnnotationToolJsonReader, NerDLMetrics and StructuredDeidentification: These annotators can be used on Spark 3.0 now.
NerDLMetrics:
- Added case_sensitive parameter and case sensitivity issue in tokens is solved.
- Added drop_o parameter to computeMetricsFromDF method and dropO parameter in NerDLMetrics class is deprecated.
MedicalNerModel: Inconsistent NER model results between different versions issue is solved.
AssertionDLModel: Unindexed chunks will be ignored by the AssertionDLModel instead of raising an exception.
ContextualParserApproach: These two issues are solved when using ruleScope: "document" configuration:
- Wrong index computations of chunks after matching sub-tokens.
- Including sub-token matches even though completeMatchRegex: "true".

New and Updated Notebooks

We have a new Zero-Shot Clinical NER Notebook to show how to use zero-shot NER model.
We have updated Medicare Risk Adjustment Score Calculation Notebook with the new changes in HCC score calculation functions.
We have updated these notebooks with the new updates in NER profiling pretrained pipelines:
We have updated Clinical Assertion Model Notebook according to the bug fix in the training section.
We moved all Azure/AWS/Databricks notebooks to products folder in spark-nlp-worksop repo.

20+ New Clinical Models and Pipelines Added & Updated in Total

zero_shot_ner_roberta
medication_resolver_pipeline
medication_resolver_transform_pipeline
ner_profiling_clinical
ner_profiling_biobert
drug_ade_mapper
ner_negation_uncertainty
disease_mentions_tweet
ner_clinical_trials_abstracts
ner_pharmacology
bert_token_classifier_ner_clinical_trials_abstracts
bert_token_classifier_negation_uncertainty
bert_token_classifier_pharmacology
bert_sequence_classifier_depression_binary
bert_sequence_classifier_health_mentions_gbert_large
bert_sequence_classifier_health_mentions_medbert
bert_sequence_classifier_health_mentions_gbert
bert_sequence_classifier_health_mentions_bert
bert_sequence_classifier_depression_twitter
bert_sequence_classifier_depression
bert_sequence_classifier_covid_sentiment

Versions

Version
Version
Version

PREVIOUSVersion Compatibility

4.1.0

Highlights

Zero-Shot NER model to Extract Entities With No Training Dataset

7 New Clinical NER Models in Spanish

8 New Clinical Classification Models in English and German Related to Public Health Topics (Depression, Covid Sentiment, Health Mentions)

New Pretrained Chunk Mapper Model (drug_ade_mapper) to Map Drugs With Their Corresponding Adverse Drug Events

A New Pretrained Resolver Pipeline (medication_resolver_pipeline) to Extract Medications and Resolve Their Adverse Reactions (ADE), RxNorm, UMLS, NDC, SNOMED CT Codes and Action/Treatments in Clinical Text.

Updated NER Profiling Pretrained Pipelines With New NER Models to Allow Running 64 Clinical NER Models at Once

Core Improvements and Bug Fixes

New and Updated Notebooks

20+ New Clinical Models and Pipelines Added & Updated in Total

Versions

New Pretrained Chunk Mapper Model (`drug_ade_mapper`) to Map Drugs With Their Corresponding Adverse Drug Events

A New Pretrained Resolver Pipeline (`medication_resolver_pipeline`) to Extract Medications and Resolve Their Adverse Reactions (ADE), RxNorm, UMLS, NDC, SNOMED CT Codes and Action/Treatments in Clinical Text.