Healthcare NLP v4.4.4 Release Notes

4.4.4

Highlights

We are delighted to announce a suite of remarkable enhancements and updates in our latest release of Spark NLP for Healthcare. This release comes with 40+ new clinical pretrained models and pipelines, and is a testament to our commitment to continuously innovate and improve, furnishing you with a more sophisticated and powerful toolkit for healthcare natural language processing.

Enhanced PySpark v3.4.X support for advanced Natural Language Processing
New module focused on extracting the most relevant information with Extractive Summarization
Customized prompts in TextGenerator Annotator
Arabic language obfuscation support in Deidentification
One liner Arabic language Clinical Deidentification Pipeline
36 New Voice of Patient (VOP) NER models and pipelines for entity extraction from patient’s own words (usually in non-medical jargon)
New BioBERT-based VOP Classification Models for classifying certain tones (if HCP consult, medically sound, mention of ADE, self-reported etc.) in patient’s own words
Enhanced entity detection accuracy with the official version of Social Determinants of Health (SDOH) model
New NER model for precise detection of demographic characteristics in clinical notes
Updated Medicare Risk Adjustment score calculation module incorporating CMS’s latest proposed updates including the Version 28 support
New Resources Downloader Notebook that includes comprehensive guideline for model downloading
Pretrained pipelines now compatible with all PySpark versions
Various core improvements; bug fixes, enhanced overall robustness and reliability of Spark NLP for Healthcare
Updated notebooks and demonstrations for making Spark NLP for Healthcare easier to navigate and understand
The addition and update of numerous new clinical models and pipelines continue to reinforce our offering in the healthcare domain

We believe that these enhancements will elevate your experience with Spark NLP for Healthcare, enabling more efficient, accurate, and streamlined analysis of healthcare-related natural language data.

Enhanced PySpark v3.4.X Support for Advanced Natural Language Processing

SparkNLP, now offers enhanced support for PySpark v3.4, enabling data scientists and NLP practitioners to leverage the latest features and capabilities of Apache Spark while working with text data.

New Module Focused On Extracting The Most Relevant Information With Extractive Summarization

Extractive summarization focuses on extracting the most relevant information rather than generating new content. The process typically includes preprocessing the text, identifying important sentences using various criteria, ranking them based on their importance, and selecting the top-ranked sentences for the final summary. Extractive summarization is favored for its objectivity, preserving the factual accuracy of the original text.

Parameters:

similarityThreshold: Sets the minimal cosine similarity between sentences to consider them similar.
summarySize: Sets the number of sentences to summarize the text

Example:

sentence_embeddings = BertSentenceEmbeddings()\
    .pretrained("sent_small_bert_L2_128")\
    .setInputCols(["sentences"])\
    .setOutputCol("sentence_embeddings")

summarizer = ExtractiveSummarization()\
    .setInputCols(["sentences", "sentence_embeddings"])\
    .setOutputCol("summaries")\
    .setSummarySize(2)\
    .setSimilarityThreshold(0)

text = """Residual disease after initial surgery for ovarian cancer is the strongest prognostic factor for survival. However, the extent of surgical resection required to achieve optimal cytoreduction is controversial. Our goal was to estimate the effect of aggressive surgical resection on ovarian cancer patient survival.
A retrospective cohort study of consecutive patients with International Federation of Gynecology and Obstetrics stage IIIC ovarian cancer undergoing primary surgery was conducted between January 1, 1994, and December 31, 1998. The main outcome measures were residual disease after cytoreduction, frequency of radical surgical resection, and 5-year disease-specific survival.
The study comprised 194 patients, including 144 with carcinomatosis. The mean patient age and follow-up time were 64.4 and 3.5 years, respectively. After surgery, 131 (67.5%) of the 194 patients had less than 1 cm of residual disease (definition of optimal cytoreduction). Considering all patients, residual disease was the only independent predictor of survival; the need to perform radical procedures to achieve optimal cytoreduction was not associated with a decrease in survival. For the subgroup of patients with carcinomatosis, residual disease and the performance of radical surgical procedures were the only independent predictors. Disease-specific survival was markedly improved for patients with carcinomatosis operated on by surgeons who most frequently used radical procedures compared with those least likely to use radical procedures (44% versus 17%, P < .001).
Overall, residual disease was the only independent predictor of survival. Minimizing residual disease through aggressive surgical resection was beneficial, especially in patients with carcinomatosis."""

Result:

'The main outcome measures were residual disease after cytoreduction, frequency of radical surgical resection, and 5-year disease-specific survival.\nThe study comprised 194 patients, including 144 with carcinomatosis.',
'Considering all patients, residual disease was the only independent predictor of survival; the need to perform radical procedures to achieve optimal cytoreduction was not associated with a decrease in survival. For the subgroup of patients with carcinomatosis, residual disease and the performance of radical surgical procedures were the only independent predictors.'

See Extractive Summarization Notebook for examples.

Customized Prompts in `TextGenerator` Annotator

The MedicalTextGenerator() function incorporates a powerful feature called SetCustomPrompt, designed to enhance text generation capabilities. By utilizing this feature, users can input a custom prompt, typically in the format of “question: {DOCUMENT} answer:”. This structure allows for the generation of informative and contextually relevant medical text responses. This feature enhances the flexibility and usability of the MedicalTextGenerator() function, empowering users to generate accurate and contextually appropriate medical text with ease.

Example:

gpt_qa = MedicalTextGenerator().pretrained("biogpt_chat_jsl", "en", "clinical/models")\
    .setInputCols("documents")\
    .setOutputCol("answer")\
    .setMaxNewTokens(299)\
    .setStopAtEos(True)\
    .setDoSample(False)\
    .setTopK(3)\
    .setRandomSeed(42)\
    .setCustomPrompt("question: {DOCUMENT} answer:")

text = "What medications are commonly used to treat emphysema?"

Result:

question: What medications are commonly used to treat emphysema ? 
answer: Hello, There are two types of medications to treat emphysema: 1. Alpha agonists ( like albuterol or albuterol / levosalbutamol ) are used to treat symptoms of shortness of breath ( SOB ) and tightness in the chest ( tightness in chest ). These meds cause a mild to moderate increase in heart rate ( tachycardia ). 2. Beta blockers ( like propranolol or metoprolol ) are used to treat or to prevent symptoms of heart failure ( ejection fraction is 20 % or less ). These medications cause or worsen shortness of breath, tightness in chest, heart rate. You can take a combination of these medications. The combination that you will work best is a two - pill combination of albuterol and propranolol ( half tablet twice a day ). This will reduce or eliminate the need for albuterol and albuterol / levosalbutamol in your case. The other medications are used in consultation with your physician.

See Medical Text Generation Notebook for examples.

One Liner Arabic Language Clinical Deidentification Pipeline

We’re thrilled to announce that Spark NLP for Healthcare now supports obfuscation in Arabic De-identification (Deid) models. This feature enhances data privacy by substituting sensitive Protected Health Information (PHI) with corresponding synthetic data, while preserving data integrity through observance of certain consistency rules.

Example:

deid_masked_entity = DeIdentification()\
    .setInputCols(["sentence", "token", "ner_chunk"])\
    .setOutputCol("masked_with_entity")\
    .setMode("mask")\
    .setLanguage('ar')\
    .setMaskingPolicy("entity_labels")

deid_masked_char = DeIdentification()\
    .setInputCols(["sentence", "token", "ner_chunk"])\
    .setOutputCol("masked_with_chars")\
    .setMode("mask")\
    .setLanguage('ar')\
    .setMaskingPolicy("same_length_chars")

deid_masked_fixed_char = DeIdentification()\
    .setInputCols(["sentence", "token", "ner_chunk"])\
    .setOutputCol("masked_fixed_length_chars")\
    .setMode("mask")\
    .setLanguage('ar')\
    .setMaskingPolicy("fixed_length_chars")\
    .setFixedMaskLength(4)

obfuscation = DeIdentification()\
    .setInputCols(["sentence", "token", "ner_chunk"]) \
    .setOutputCol("obfuscated") \
    .setMode("obfuscate")\
    .setLanguage('ar')\
    .setObfuscateDate(True)\
    .setObfuscateRefSource("faker") \
    .setRegexOverride(True)

text = '''
الملاحظات السريرية - مريض السكري
التاريخ: 11 مايو 1999
اسم المريض: فاطمة علي
العنوان: شارع الحرية ، حي السلام ، القاهرة
دولة: مصر
اسم المستشفى: مستشفى الشفاء
اسم الطبيب: د.محمد صلاح
'''

Result:

	original sentence	Masked	Masked with Chars	Masked with Fixed Chars	Obfuscated
0	الملاحظات السريرية - مريض السكري	الملاحظات السريرية - مريض السكري	الملاحظات السريرية - مريض السكري	الملاحظات السريرية - مريض السكري	الملاحظات السريرية - مريض السكري
1	التاريخ: 11 مايو 1999	التاريخ: [تاريخ]	التاريخ: [٭٭٭٭٭٭٭٭٭٭]	التاريخ: ٭٭٭٭	التاريخ: 11 يوليو 1999
2	اسم المريض: فاطمة علي	اسم المريض: [الاسم]	اسم المريض: [٭٭٭٭٭٭٭]	اسم المريض: ٭٭٭٭	اسم المريض: رياض محروق
3	العنوان: شارع الحرية ، حي السلام ، القاهرة	العنوان: شارع الحرية ، حي [الموقع] ، [الموقع]	العنوان: شارع الحرية ، حي [٭٭٭٭] ، [٭٭٭٭٭]	العنوان: شارع الحرية ، حي ٭٭٭٭ ، ٭٭٭٭	العنوان: شارع الحرية ، حي شارع الجنوب أفريقي ، شارع التقدم
4	دولة: مصر	دولة: [الموقع]	دولة: [٭]	دولة: ٭٭٭٭	دولة: الشارع الأسفلتي
5	اسم المستشفى: مستشفى الشفاء	اسم المستشفى: مستشفى الشفاء	اسم المستشفى: مستشفى الشفاء	اسم المستشفى: مستشفى الشفاء	اسم المستشفى: مستشفى الشفاء
6	اسم الطبيب: د.محمد صلاح	اسم الطبيب: [الاسم]	اسم الطبيب: [٭٭٭٭٭٭٭٭٭]	اسم الطبيب: ٭٭٭٭	اسم الطبيب: ليث مصري

See Clinincal Multi Language Deidentification Notebook for examples.

New Arabic Clinical Deidentification Pipeline

This pipeline can be used to deidentify Arabic PHI information from medical texts in one one liner to ease the process of building the entire pipeline one by one. The PHI information will be masked and obfuscated in the resulting text. The pipeline can mask and obfuscate CONTACT, NAME, DATE, ID, LOCATION, AGE, PATIENT, HOSPITAL, ORGANIZATION, CITY, STREET, USERNAME, SEX, IDNUM, EMAIL, ZIP, MEDICALRECORD, PROFESSION, PHONE, COUNTRY, DOCTOR, SSN, ACCOUNT, LICENSE, DLN and VIN.

Example:

from sparknlp.pretrained import PretrainedPipeline
deid_pipeline_ar = PretrainedPipeline("clinical_deidentification", "ar", "clinical/models")

text = """
ملاحظات سريرية - مريض الربو:
التاريخ: 30 مايو 2023
اسم المريضة: ليلى حسن
تم تسجيل المريض في النظام باستخدام رقم الضمان الاجتماعي 123456789012.
العنوان: شارع المعرفة، مبنى رقم 789، حي الأمانة، جدة
الرمز البريدي: 54321
البلد: المملكة العربية السعودية
اسم المستشفى: مستشفى النور
اسم الطبيب: د. أميرة أحمد
"""

Result:

	Sentence	masked_with_entity	Masked with Chars	Masked with Fixed Chars	Obfuscated
0	ملاحظات سريرية - مريض الربو:\nالتاريخ: 30 مايو 2023	ملاحظات سريرية - مريض الربو:\nالتاريخ: [تاريخ] [تاريخ]	ملاحظات سريرية - مريض الربو:\nالتاريخ: [٭٭٭٭٭] [٭٭]	ملاحظات سريرية - مريض الربو:\nالتاريخ: ٭٭٭٭ ٭٭٭٭	ملاحظات سريرية - مريض الربو:\nالتاريخ: 30 يونيو 2024
1	اسم المريضة: ليلى حسن	اسم المريضة: [المريض]	اسم المريضة: [٭٭٭٭٭٭]	اسم المريضة: ٭٭٭٭	اسم المريضة: أدهم جبالي
2	تم تسجيل المريض في النظام باستخدام رقم الضمان الاجتماعي 123456789012.	تم تسجيل المريض في النظام باستخدام رقم الضمان الاجتماعي [هاتف].	تم تسجيل المريض في النظام باستخدام رقم الضمان الاجتماعي [٭٭٭٭٭٭٭٭٭٭].	تم تسجيل المريض في النظام باستخدام رقم الضمان الاجتماعي ٭٭٭٭.	تم تسجيل المريض في النظام باستخدام رقم الضمان الاجتماعي 963525347201.
3	العنوان: شارع المعرفة، مبنى رقم 789، حي الأمانة، جدة	العنوان: شارع المعرفة، مبنى رقم [الرمز البريدي] [المدينة] [المدينة]	العنوان: شارع المعرفة، مبنى رقم [٭٭] [٭٭٭٭٭٭٭٭٭] [٭]	العنوان: شارع المعرفة، مبنى رقم ٭٭٭٭ ٭٭٭٭ ٭٭٭٭	العنوان: شارع المعرفة، مبنى رقم 160، كلميم سانت كاترين
4	الرمز البريدي: 54321	الرمز البريدي: [الرمز البريدي]	الرمز البريدي: [٭٭٭]	الرمز البريدي: ٭٭٭٭	الرمز البريدي: 79915
5	البلد: المملكة العربية السعودية	البلد: [المدينة] [البلد]	البلد: [٭٭٭٭٭٭٭٭٭٭٭٭٭] [٭٭٭٭٭٭]	البلد: ٭٭٭٭ ٭٭٭٭	البلد: زغوان الغربية أوزبكستان
6	اسم المستشفى: مستشفى النور	اسم المستشفى: [الموقع]	اسم المستشفى: [٭٭٭٭٭٭٭٭٭٭]	اسم المستشفى: ٭٭٭٭	اسم المستشفى: شارع المدارس
7	اسم الطبيب: د. أميرة أحمد	اسم الطبيب: د. [دكتور]	اسم الطبيب: د. [٭٭٭٭٭٭٭٭]	اسم الطبيب: د. ٭٭٭٭	اسم الطبيب: د. هندية قبلاوي

See Clinincal Multi Language Deidentification Notebook for examples.

36 New Voice Of Patient (VOP) NER Models And Pipelines For Entity Extraction From Patient’s Own Words

We are excited to introduce our new NER models which extract clinical entities from the documents that are shared by patients in their own words.

model_name	description	predicted_entity
ner_vop_anatomy_emb ner_vop_anatomy_emb_clinical_medium ner_vop_anatomy_emb_clinical_large ner_vop_anatomy_pipeline	Extracts anatomical terms from the documents transferred from the patient’s own sentences.	`BodyPart`, `Laterality`
ner_vop_clinical_dept ner_vop_clinical_dept_emb_clinical_medium ner_vop_clinical_dept_emb_clinical_large ner_vop_clinical_dept_pipeline	Extracts medical devices and clinical department mentions terms from the documents transferred from the patient’s own sentences.	`AdmissionDischarge`, `ClinicalDept`, `MedicalDevice`
ner_vop_demographic ner_vop_demographic_emb_clinical_medium ner_vop_demographic_emb_clinical_large ner_vop_demographic_pipeline	Extracts demographic terms from the documents transferred from the patient’s own sentences.	`Gender`, `Employment`, `RaceEthnicity`, `Age`, `Substance`, `RelationshipStatus`, `SubstanceQuantity`
ner_vop_problem ner_vop_problem_emb_clinical_medium ner_vop_problem_emb_clinical_large ner_vop_problem_pipeline	Extracts clinical problems from the documents transferred from the patient’s own sentences using a granular taxonomy.	`PsychologicalCondition`, `Disease`, `Symptom`, `HealthStatus`, `Modifier`, `InjuryOrPoisoning`
ner_vop_test ner_vop_test_emb_clinical_medium ner_vop_test_emb_clinical_large ner_vop_test_pipeline	Extracts test mentions from the documents transferred from the patient’s own sentences.	`VitalTest`, `Test`, `Measurements`, `TestResult`
ner_vop_temporal ner_vop_temporal_emb_clinical_medium ner_vop_temporal_emb_clinical_large_final ner_vop_temporal_pipeline	Extracts temporal references from the documents transferred from the patient’s own sentences.	`DateTime`, `Frequency`, `Duration`
ner_vop_treatment ner_vop_treatment_emb_clinical_medium ner_vop_treatment_emb_clinical_large ner_vop_treatment_pipeline	Extracts treatments mentioned in documents transferred from the patient’s own sentences.	`Drug`, `Form`, `Dosage`, `Frequency`, `Route`, `Duration`, `Procedure`, `Treatment`
ner_vop ner_vop_emb_clinical_medium ner_vop_emb_clinical_large ner_vop_pipeline	Extracts healthcare-related terms from the documents transferred from the patient’s own sentences.	`Gender`, `Employment`, `Age`, `BodyPart`, `Substance`, `Form`, `PsychologicalCondition`, `Vaccine`, `Drug`, `DateTime`, `ClinicalDept`, `Laterality`, `Test`, `AdmissionDischarge`, `Disease`, `VitalTest`, `Dosage`, `Duration`, `RelationshipStatus`, `Route`, `Allergen`, `Frequency`, `Symptom`, `Procedure`, `HealthStatus`, `InjuryOrPoisoning`, `Modifier`, `Treatment`, `SubstanceQuantity`, `MedicalDevice`, `TestResult`
ner_vop_problem_reduced ner_vop_problem_reduced_emb_clinical_medium ner_vop_problem_reduced_emb_clinical_large ner_vop_problem_reduced_pipeline	Extracts clinical problems from the documents transferred from the patient’s own sentences. The taxonomy is reduced (one label for all clinical problems).	`Problem`, `HealthStatus`, `Modifier`

Example:

ner = MedicalNerModel.pretrained("ner_vop", "en", "clinical/models") \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")
                             
sample_text = """Hello,I'm 20 year old girl. I'm diagnosed with hyperthyroid 1 month ago. I was feeling weak, light headed,poor digestion, panic attacks, depression, left chest pain, increased heart rate, rapidly weight loss, from 4 months. Because of this, I stayed in the hospital and just discharged from hospital."""

Result:

chunk	ner_label
20 year old	Age
girl	Gender
hyperthyroid	Disease
1 month ago	DateTime
weak	Symptom
light	Symptom
panic attacks	PsychologicalCondition
depression	PsychologicalCondition
left	Laterality
chest	BodyPart
pain	Symptom
increased	TestResult
heart rate	VitalTest
rapidly	Modifier
weight loss	Symptom
4 months	Duration
hospital	ClinicalDept
discharged	AdmissionDischarge

For all Voice of Patient models, please check: Models Hub Page

New BioBERT-Based VOP Classification Models for Biomedical Text Analysis

We are excited to introduce a new Bert-based Voice of Patient classifier models which are a collection of BioBERT-based classifiers designed for various text classification tasks in the biomedical domain. These models leverage the power of BERT, a transformer-based language model, to analyze and classify different types of textual data. They are trained to understand the nuances of medical language and concepts to make accurate predictions.

modelname	description	pred_entity
bert_sequence_classifier_vop_drug_side_effect bert_sequence_classifier_vop_drug_side_effect_pipeline	Classify informal texts (such as tweets or forum posts) according to the presence of drug side effects.	`Drug_AE`, `Other`
bert_sequence_classifier_vop_hcp_consult bert_sequence_classifier_vop_hcp_consult_pipeline	Identify texts that mention a HCP consult.	`Consulted_By_HCP`, `Other`
bert_sequence_classifier_vop_self_report bert_sequence_classifier_vop_self_report_pipeline	Classify texts depending on if they are self-reported or if they refer to another person.	`1st_Person`, `3rd_Person`
bert_sequence_classifier_vop_sound_medical bert_sequence_classifier_vop_sound_medical_pipeline	Identify whether the suggestion that is mentioned in the text is medically sound.	`True`, `False`

Example:

sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_vop_sound_medical", "en", "clinical/models")\
    .setInputCols(["document",'token'])\
    .setOutputCol("prediction")

sample_texts = ["I had a lung surgery for emphyema and after surgery my xray showing some recovery.",
                "I was advised to put honey on a burned skin."]

Result:

text	result
My friend was treated for her skin cancer two years ago.	3rd_Person
I started with dysphagia in 2021, then, a few weeks later, felt weakness in my legs, followed by a severe diarrhea.	1st_Person

We are excited to introduce a new model that specializes in identifying social determinants of health (SDOH) mentions. This model accurately recognizes instances where SDOH characteristics, including Access_To_Care, Community_Safety, Education, Food_Insecurity, Insurance_Status, and more entities are referenced. Its advanced capabilities provide valuable insights for SDOH-related analyses and applications.

Example:

ner_model = MedicalNerModel.pretrained("ner_sdoh", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

sample_texts = """Smith is living in New York, a divorced Mexcian American woman. She has recently been experiencing frequent hospitalizations due to uncontrolled blood sugar levels. Smith works as a cleaning assistant and cannot access health insurance or paid sick leave. Pt with likely long-standing depression.She has a long history of etoh abuse, beginning in her teens. She has been a daily drinker for 30 years. She had DUI in April and was due to court this week.""" 

Result:

	chunks	begin	end	entities
0	New York	20	27	Geographic_Entity
1	divorced	32	39	Marital_Status
2	Mexcian American	41	56	Race_Ethnicity
3	woman	58	62	Gender
4	She	65	67	Gender
5	hospitalizations	109	124	Other_SDoH_Keywords
6	cleaning assistant	183	200	Employment
7	health insurance	220	235	Insurance_Status
8	depression	286	295	Mental_Health
9	She	297	299	Gender
10	etoh abuse	323	332	Alcohol
11	her	348	350	Gender
12	teens	352	356	Age
13	She	359	361	Gender
14	daily	374	378	Substance_Frequency
15	drinker	380	386	Alcohol
16	30 years	392	399	Substance_Duration
17	She	402	404	Gender
18	DUI	410	412	Legal_Issues

See Models Hub Page for more details.

New NER Model For Precise Detection Of Demographic Characteristics In Clinical Notes

This new model identifies healthcare mentions that refer to a situation where a patient’s demographic characteristics, such as race, ethnicity, gender, age, socioeconomic status, or geographic location.

Example:

ner = MedicalNerModel.pretrained("ner_demographic_extended_healthcare","en","clinical/models")\
    .setInputCols(["sentence","token","embeddings"])\
    .setOutputCol("ner")


sample_text = """Patient Information:
Gender: Non-binary
Age: 68 years old
Race: Black
Employment status: Retired
Marital Status: Divorced
Sexual Orientation: Asexual
Religion: Judaism
Body Mass Index: 29.1
Unhealthy Habits: Substance use
Socioeconomic Status: Low Income
Area of Residence: Rural setting
Disability Status: Blindness
Chief Complaint:
The patient presented to the emergency department with complaint of severe chest pain that started suddenly while asleep.
"""

Result:

chunk	ner_label	confidence
Non-binary	Gender	0.9987
68 years old	Age	0.6892667
Black	Race_ethnicity	0.9226
Retired	Employment_status	0.9426
Divorced	Marital_Status	0.9996
Asexual	Sexual_orientation	1.0
Judaism	Religion	0.986
Substance use	Unhealthy_habits	0.48755002

See Models Hub Page for more details.

Updated Medicare Risk Adjustment Score Calculation Module Incorporating CMS’s Latest Proposed Updates Including The Version 28 Support

We are excited to introduce significant updates to the Medical Risk Adjustment module of Spark-NLP for Healthcare!

CMS (Centers for Medicare & Medicaid Services) releases updated versions of risk adjustment models periodically to account for changes in healthcare data, policy requirements, and advancements in statistical modeling techniques.

With this release, Spark-NLP for Healthcare incorporates the proposed updates by CMS for the risk adjustment module, featuring a major update represented as Version 28.

Each version of the risk adjustment module is associated with a specific year, indicating the time period for which the model is designed. For example, ESRDV21Y19 (ESRD version 21 year 19) refers to the 21st version of the risk adjustment model for End-Stage Renal Disease (ESRD) and is designed for the year 2019.

Updates on Risk Adjustment Module:

New modules added to Spark-NLP for Healthcare:

Version	Year	Module Name
28	Combined	`profileV28`
28	2024	`profileV28Y24`
24	Combined	`profileV24`
ESRDV21	2019	`profileESRDV21Y19`

A new feature has been introduced in this release that enhances the risk score calculation process. With this update, the risk score calculation now incorporates the coding pattern (intensity) adjustment and normalization factor for all versions and years.
The default parameters for the profile() method have been modified in this release.
Previous Default: Version 24, Year 19
Updated Default: Version 28, Year Combined
We have included the risk_score_adj and risk_score_age_adj information in the outputs of the profile methods.

Example:

Risk Adjustment Module should be used with the following information in order:

1- A list of ICD10 codes for the measurement year.
2- The age of the patient.
3- The gender of the patient; {“M”, “F”}
4- The eligibility segment of the patient. Allowed values are as follows:

- "CFA": Community Full Benefit Dual Aged
- "CFD": Community Full Benefit Dual Disabled
- "CNA": Community NonDual Aged
- "CND": Community NonDual Disabled
- "CPA": Community Partial Benefit Dual Aged
- "CPD": Community Partial Benefit Dual Disabled
- "INS": Long Term Institutional
- "NE": New Enrollee
- "SNPNE": SNP NE

5- Original reason for entitlement code (orec).

- "0": Old age and survivor's insurance
- "1": Disability insurance benefits
- "2": End-stage renal disease
- "3": Both DIB and ESRD

6- If the patient is in Medicaid or not.

#sample pyspark dataframe called df:
df.show(5)

>>>

ICD10CM_CODES	GENDER	AGE	MEDICAID	ELIGIBILITY
M86622, M0549, I…	M	32	true	CFD
E133311, E200, T…	M	32	true	CFD
C179, I70348, C8…	M	32	true	CFD
S72463A, C37, E1…	M	32	true	CFD
S1224XB, S72115B…	M	32	true	CFD

Applying profileV28Y24 module over the sample data:

df = df.withColumn("hcc_profile", profileV28Y24(df.ICD10CM_CODES, df.AGE, df.GENDER, df.ELIGIBILITY, df.OREC, df.MEDICAID))

Result:

ICD10CM_CODES	GENDER	AGE	MEDICAID	ELIGIBILITY	hcc_profile	risk_score	hcc_lst	risk_score_adj	risk_score_age_adj	hcc_map	parameters	details
M86622, M0549, I…	M	32	true	CFD	10.022, “HCC401…	10.022	“HCC401”,”HCC21”…	9.2913	0.1771	“L97214”:”HCC38…	“elig”:”CFD”,”ag…	“CFD_HCC267”:0.5…
E133311, E200, T…	M	32	true	CFD	9.374, “HCC280”…	9.374	“HCC280”,”CHR_LU…	8.6906	0.1771	“G308”:”HCC127”…	“elig”:”CFD”,”ag…	“CFD_HCC202”:0.2…
C179, I70348, C8…	M	32	true	CFD	12.988, “HCC17”…	12.988	“HCC17”,”HCC401”…	12.0411	0.1771	“C4A20”:”HCC21”…	“elig”:”CFD”,”ag…	“CFD_HCC280”:0.2…
S72463A, C37, E1…	M	32	true	CFD	17.034, “HF_CHR…	17.034	“HF_CHR_LUNG”,”H…	15.7921	0.1771	“C8296”:”HCC21”…	“elig”:”CFD”,”ag…	“CFD_HCC267”:0.5…
S1224XB, S72115B…	M	32	true	CFD	7.99, “HCC401”,…	7.99	“HCC401”,”HCC51”…	7.4075	0.1771	“D57818”:”HCC10…	“elig”:”CFD”,”ag…	“CFD_HCC267”:0.5…

For the detailed usage of the module, please visit Calculate Medicare Risk Adjustment Score notebook.

New Resources Downloader Notebook That Includes Comprehensive Guideline For Model Downloading

Model Download Helpers Notebook includes a comprehensive guide to the various parameters and functionalities of the ResourceDownloader annotator, which facilitates the downloading and management of resources such as pretrained models and pipelines. This annotator is designed to efficiently download and manage various resources, including pretrained models and pipelines without using your sensitive AWS keys in any script/ environment with no addional library (e.g. boto3, aws cli) needed.

Examples:

with S3 URI

s3_uri = "s3://auxdata.johnsnowlabs.com/public/models/nerdl_conll_elmo_en_4.0.0_3.0_1654103884644.zip"

ResourceDownloader.downloadModelDirectly(s3_uri,  "public/models", unzip=True)

with model name ```python model_name = “public/models/nerdl_restaurant_100d_en_3.3.4_3.0_1640949258750.zip”

ResourceDownloader.downloadModelDirectly(model_name, “public/models”, unzip=True)

- updateCacheModels
```python
from sparknlp_jsl.updateModels import UpdateModels
UpdateModels.updateCacheModels()

ls ~/cache_pretrained

# embeddings_clinical_en_2.0.2_2.4_1558454742956/
# ner_clinical_large_en_2.5.0_2.4_1590021302624/ 

Pretrained Pipelines Now Compatible With All PySpark Versions

We are thrilled to announce the release of our new PretrainedPipeline, specifically designed to be compatible with all versions of PySpark. This groundbreaking update ensures seamless integration and effortless deployment of Spark NLP’s powerful pretrained models across different PySpark environments.

Various Core Improvements: Bug Fixes, Enhanced Overall Robustness, And Reliability Of Spark NLP For Healthcare

The save() method on MedicalSummarization has been fixed, ensuring proper functionality.
The bug in the Chunk2Token Python class has been resolved, eliminating any issues related to it.
The Fine-tune issue in SentenceEntityResolver has been fixed, enhancing its performance and accuracy.
The sparknlp_jsl.start()function now works correctly with the apple_silicon and aarch64 parameters.
The default scopeWindow parameter on AssertionDLApproach has been updated to (9,15), enhancing its effectiveness in processing textual assertions.
Compilation issues related to SparkNLP v443 have been resolved, resulting in smoother operation.
We have improved the documentation, providing clearer instructions and explanations for easier usage.

Updated Notebooks And Demonstrations For Making Spark NLP For Healthcare Easier To Navigate And Understand

New Extractive Summarization Notebook
New Model Download Helpers Notebook
Updated Biogpt_Chat_JSL Notebook for latest models
Updated Clinical MultiLanguage Deidentification Notebook for latest models
New All in One Voice Of Patient
New Voice Of Patient CLASSIFICATION_SIDE_EFFECT Demo
Updated SOCIAL DETERMINANT CLASSIFICATION GENERIC Demo
Updated CLINICAL DEIDENTIFICATION MULTI-LANGUAGE Demo

We Have Added And Updated A Substantial Number Of New Clinical Models And Pipelines, Further Solidifying Our Offering In The Healthcare Domain.

clinical_deidentification -> ar
summarizer_clinical_laymen_pipeline
ner_demographic_extended_healthcare
ner_sdoh
ner_vop_anatomy
ner_vop_anatomy_emb_clinical_medium
ner_vop_anatomy_emb_clinical_large
ner_vop_anatomy_pipeline
ner_vop_clinical_dept
ner_vop_clinical_dept_emb_clinical_medium
ner_vop_clinical_dept_emb_clinical_large
ner_vop_clinical_dept_pipeline
ner_vop_demographic
ner_vop_demographic_emb_clinical_medium
ner_vop_demographic_emb_clinical_large
ner_vop_demographic_pipeline
ner_vop_problem
ner_vop_problem_emb_clinical_medium
ner_vop_problem_emb_clinical_large
ner_vop_problem_pipeline
ner_vop_test
ner_vop_test_emb_clinical_medium
ner_vop_test_emb_clinical_large
ner_vop_test_pipeline
ner_vop_temporal
ner_vop_temporal_emb_clinical_medium
ner_vop_temporal_emb_clinical_large_final
ner_vop_temporal_pipeline
ner_vop_treatment
ner_vop_treatment_emb_clinical_medium
ner_vop_treatment_emb_clinical_large
ner_vop_treatment_pipeline
ner_vop
ner_vop_emb_clinical_medium
ner_vop_emb_clinical_large
ner_vop_pipeline
ner_vop_problem_reduced
ner_vop_problem_reduced_emb_clinical_medium
ner_vop_problem_reduced_emb_clinical_large
ner_vop_problem_reduced_pipeline
bert_sequence_classifier_vop_drug_side_effect
bert_sequence_classifier_vop_drug_side_effect_pipeline
bert_sequence_classifier_vop_hcp_consult
bert_sequence_classifier_vop_hcp_consult_pipeline
bert_sequence_classifier_vop_self_report
bert_sequence_classifier_vop_self_report_pipeline
bert_sequence_classifier_vop_sound_medical
bert_sequence_classifier_vop_sound_medical_pipeline

For all Spark NLP for Healthcare models, please check: Models Hub Page

Versions

Version
Version
Version

PREVIOUSVersion Compatibility

4.4.4

Highlights

Enhanced PySpark v3.4.X Support for Advanced Natural Language Processing

New Module Focused On Extracting The Most Relevant Information With Extractive Summarization

Customized Prompts in TextGenerator Annotator

One Liner Arabic Language Clinical Deidentification Pipeline

New Arabic Clinical Deidentification Pipeline

36 New Voice Of Patient (VOP) NER Models And Pipelines For Entity Extraction From Patient’s Own Words

New BioBERT-Based VOP Classification Models for Biomedical Text Analysis

Enhanced Entity Detection Accuracy With The Official Version Of Social Determinants Of Health (SDOH) Model

New NER Model For Precise Detection Of Demographic Characteristics In Clinical Notes

Updated Medicare Risk Adjustment Score Calculation Module Incorporating CMS’s Latest Proposed Updates Including The Version 28 Support

New Resources Downloader Notebook That Includes Comprehensive Guideline For Model Downloading

Pretrained Pipelines Now Compatible With All PySpark Versions

Various Core Improvements: Bug Fixes, Enhanced Overall Robustness, And Reliability Of Spark NLP For Healthcare

Updated Notebooks And Demonstrations For Making Spark NLP For Healthcare Easier To Navigate And Understand

We Have Added And Updated A Substantial Number Of New Clinical Models And Pipelines, Further Solidifying Our Offering In The Healthcare Domain.

Versions

Customized Prompts in `TextGenerator` Annotator