4.4.4
Highlights
We are delighted to announce a suite of remarkable enhancements and updates in our latest release of Spark NLP for Healthcare. This release comes with 40+ new clinical pretrained models and pipelines, and is a testament to our commitment to continuously innovate and improve, furnishing you with a more sophisticated and powerful toolkit for healthcare natural language processing.
- Enhanced PySpark v3.4.X support for advanced Natural Language Processing
- New module focused on extracting the most relevant information with Extractive Summarization
- Customized prompts in TextGenerator Annotator
- Arabic language obfuscation support in Deidentification
- One liner Arabic language Clinical Deidentification Pipeline
- 36 New Voice of Patient (VOP) NER models and pipelines for entity extraction from patient’s own words (usually in non-medical jargon)
- New BioBERT-based VOP Classification Models for classifying certain tones (if HCP consult, medically sound, mention of ADE, self-reported etc.) in patient’s own words
- Enhanced entity detection accuracy with the official version of Social Determinants of Health (SDOH) model
- New NER model for precise detection of demographic characteristics in clinical notes
- Updated Medicare Risk Adjustment score calculation module incorporating CMS’s latest proposed updates including the Version 28 support
- New Resources Downloader Notebook that includes a comprehensive guide to model downloading
- Pretrained pipelines now compatible with all PySpark versions
- Various core improvements: bug fixes and enhanced overall robustness and reliability of Spark NLP for Healthcare
- Updated notebooks and demonstrations for making Spark NLP for Healthcare easier to navigate and understand
- The addition and update of numerous new clinical models and pipelines continue to reinforce our offering in the healthcare domain
We believe that these enhancements will elevate your experience with Spark NLP for Healthcare, enabling more efficient, accurate, and streamlined analysis of healthcare-related natural language data.
Enhanced PySpark v3.4.X Support for Advanced Natural Language Processing
Spark NLP for Healthcare now offers enhanced support for PySpark v3.4, enabling data scientists and NLP practitioners to leverage the latest features and capabilities of Apache Spark while working with text data.
New Module Focused On Extracting The Most Relevant Information With Extractive Summarization
Extractive summarization focuses on extracting the most relevant information rather than generating new content. The process typically includes preprocessing the text, identifying important sentences using various criteria, ranking them based on their importance, and selecting the top-ranked sentences for the final summary. Extractive summarization is favored for its objectivity, preserving the factual accuracy of the original text.
Parameters:
- similarityThreshold: Sets the minimal cosine similarity between sentences to consider them similar.
- summarySize: Sets the number of sentences to summarize the text.
Example:
sentence_embeddings = BertSentenceEmbeddings()\
.pretrained("sent_small_bert_L2_128")\
.setInputCols(["sentences"])\
.setOutputCol("sentence_embeddings")
summarizer = ExtractiveSummarization()\
.setInputCols(["sentences", "sentence_embeddings"])\
.setOutputCol("summaries")\
.setSummarySize(2)\
.setSimilarityThreshold(0)
text = """Residual disease after initial surgery for ovarian cancer is the strongest prognostic factor for survival. However, the extent of surgical resection required to achieve optimal cytoreduction is controversial. Our goal was to estimate the effect of aggressive surgical resection on ovarian cancer patient survival.
A retrospective cohort study of consecutive patients with International Federation of Gynecology and Obstetrics stage IIIC ovarian cancer undergoing primary surgery was conducted between January 1, 1994, and December 31, 1998. The main outcome measures were residual disease after cytoreduction, frequency of radical surgical resection, and 5-year disease-specific survival.
The study comprised 194 patients, including 144 with carcinomatosis. The mean patient age and follow-up time were 64.4 and 3.5 years, respectively. After surgery, 131 (67.5%) of the 194 patients had less than 1 cm of residual disease (definition of optimal cytoreduction). Considering all patients, residual disease was the only independent predictor of survival; the need to perform radical procedures to achieve optimal cytoreduction was not associated with a decrease in survival. For the subgroup of patients with carcinomatosis, residual disease and the performance of radical surgical procedures were the only independent predictors. Disease-specific survival was markedly improved for patients with carcinomatosis operated on by surgeons who most frequently used radical procedures compared with those least likely to use radical procedures (44% versus 17%, P < .001).
Overall, residual disease was the only independent predictor of survival. Minimizing residual disease through aggressive surgical resection was beneficial, especially in patients with carcinomatosis."""
Result:
'The main outcome measures were residual disease after cytoreduction, frequency of radical surgical resection, and 5-year disease-specific survival.\nThe study comprised 194 patients, including 144 with carcinomatosis.',
'Considering all patients, residual disease was the only independent predictor of survival; the need to perform radical procedures to achieve optimal cytoreduction was not associated with a decrease in survival. For the subgroup of patients with carcinomatosis, residual disease and the performance of radical surgical procedures were the only independent predictors.'
See Extractive Summarization Notebook for examples.
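The steps described in this section (embed the sentences, compare them, rank them, keep the top ones) can be illustrated with a small standalone sketch. It is not the ExtractiveSummarization implementation: the toy vectors, the centrality-based ranking, and the helper names `cosine` and `summarize` are assumptions made for illustration. Only the roles of `similarity_threshold` and `summary_size` mirror the two parameters listed above.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def summarize(sentences, embeddings, summary_size=2, similarity_threshold=0.0):
    """Keep the `summary_size` most central sentences, in document order."""
    scores = []
    for i, u in enumerate(embeddings):
        score = 0.0
        for j, v in enumerate(embeddings):
            if i == j:
                continue
            sim = cosine(u, v)
            # Only sentence pairs at or above the threshold count as similar.
            if sim >= similarity_threshold:
                score += sim
        scores.append(score)
    # Rank sentence indices by score, keep the best ones, restore original order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:summary_size])]

sentences = [
    "Residual disease predicts survival.",
    "Aggressive resection reduces residual disease.",
    "The mean follow-up time was 3.5 years.",
]
# Toy 2-d vectors standing in for real sentence embeddings (assumption).
embeddings = [[1.0, 0.1], [0.9, 0.3], [0.1, 1.0]]
summary = summarize(sentences, embeddings, summary_size=2)
```

In this toy input the first two sentences point in nearly the same direction, so they score highest and form the summary, while the unrelated third sentence is dropped.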
Customized Prompts in TextGenerator Annotator
The MedicalTextGenerator annotator now supports custom prompts through setCustomPrompt. Users can supply a prompt template, typically in the format "question: {DOCUMENT} answer:", in which {DOCUMENT} stands for the input text. This structure steers the model toward informative and contextually relevant medical text responses and makes the annotator more flexible to use.
Example:
gpt_qa = MedicalTextGenerator().pretrained("biogpt_chat_jsl", "en", "clinical/models")\
.setInputCols("documents")\
.setOutputCol("answer")\
.setMaxNewTokens(299)\
.setStopAtEos(True)\
.setDoSample(False)\
.setTopK(3)\
.setRandomSeed(42)\
.setCustomPrompt("question: {DOCUMENT} answer:")
text = "What medications are commonly used to treat emphysema?"
Result:
question: What medications are commonly used to treat emphysema ?
answer: Hello, There are two types of medications to treat emphysema: 1. Alpha agonists ( like albuterol or albuterol / levosalbutamol ) are used to treat symptoms of shortness of breath ( SOB ) and tightness in the chest ( tightness in chest ). These meds cause a mild to moderate increase in heart rate ( tachycardia ). 2. Beta blockers ( like propranolol or metoprolol ) are used to treat or to prevent symptoms of heart failure ( ejection fraction is 20 % or less ). These medications cause or worsen shortness of breath, tightness in chest, heart rate. You can take a combination of these medications. The combination that you will work best is a two - pill combination of albuterol and propranolol ( half tablet twice a day ). This will reduce or eliminate the need for albuterol and albuterol / levosalbutamol in your case. The other medications are used in consultation with your physician.
See Medical Text Generation Notebook for examples.
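The effect of the prompt template can be pictured with a minimal sketch of the placeholder substitution. This is an illustration only: `build_prompt` is a hypothetical helper, and both the plain string replacement and the check for the placeholder are assumptions about behavior, not the annotator's internal code.

```python
def build_prompt(template: str, document: str) -> str:
    """Hypothetical helper: fill the {DOCUMENT} placeholder with the input text."""
    if "{DOCUMENT}" not in template:
        # Assumption: a template without the placeholder is treated as an error.
        raise ValueError("template must contain the {DOCUMENT} placeholder")
    return template.replace("{DOCUMENT}", document)

prompt = build_prompt(
    "question: {DOCUMENT} answer:",
    "What medications are commonly used to treat emphysema?",
)
```

The model then continues the text after "answer:", which is why the result above starts with the question followed by the generated answer.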
Arabic Language Obfuscation Support In Deidentification
We’re thrilled to announce that Spark NLP for Healthcare now supports obfuscation in Arabic De-identification (Deid) models. This feature enhances data privacy by substituting sensitive Protected Health Information (PHI) with corresponding synthetic data, while preserving data integrity through observance of certain consistency rules.
Example:
deid_masked_entity = DeIdentification()\
.setInputCols(["sentence", "token", "ner_chunk"])\
.setOutputCol("masked_with_entity")\
.setMode("mask")\
.setLanguage('ar')\
.setMaskingPolicy("entity_labels")
deid_masked_char = DeIdentification()\
.setInputCols(["sentence", "token", "ner_chunk"])\
.setOutputCol("masked_with_chars")\
.setMode("mask")\
.setLanguage('ar')\
.setMaskingPolicy("same_length_chars")
deid_masked_fixed_char = DeIdentification()\
.setInputCols(["sentence", "token", "ner_chunk"])\
.setOutputCol("masked_fixed_length_chars")\
.setMode("mask")\
.setLanguage('ar')\
.setMaskingPolicy("fixed_length_chars")\
.setFixedMaskLength(4)
obfuscation = DeIdentification()\
.setInputCols(["sentence", "token", "ner_chunk"]) \
.setOutputCol("obfuscated") \
.setMode("obfuscate")\
.setLanguage('ar')\
.setObfuscateDate(True)\
.setObfuscateRefSource("faker") \
.setRegexOverride(True)
text = '''
الملاحظات السريرية - مريض السكري
التاريخ: 11 مايو 1999
اسم المريض: فاطمة علي
العنوان: شارع الحرية ، حي السلام ، القاهرة
دولة: مصر
اسم المستشفى: مستشفى الشفاء
اسم الطبيب: د.محمد صلاح
'''
Result:
index | original sentence | Masked | Masked with Chars | Masked with Fixed Chars | Obfuscated |
---|---|---|---|---|---|
0 | الملاحظات السريرية - مريض السكري | الملاحظات السريرية - مريض السكري | الملاحظات السريرية - مريض السكري | الملاحظات السريرية - مريض السكري | الملاحظات السريرية - مريض السكري |
1 | التاريخ: 11 مايو 1999 | التاريخ: [تاريخ] | التاريخ: [٭٭٭٭٭٭٭٭٭٭] | التاريخ: ٭٭٭٭ | التاريخ: 11 يوليو 1999 |
2 | اسم المريض: فاطمة علي | اسم المريض: [الاسم] | اسم المريض: [٭٭٭٭٭٭٭] | اسم المريض: ٭٭٭٭ | اسم المريض: رياض محروق |
3 | العنوان: شارع الحرية ، حي السلام ، القاهرة | العنوان: شارع الحرية ، حي [الموقع] ، [الموقع] | العنوان: شارع الحرية ، حي [٭٭٭٭] ، [٭٭٭٭٭] | العنوان: شارع الحرية ، حي ٭٭٭٭ ، ٭٭٭٭ | العنوان: شارع الحرية ، حي شارع الجنوب أفريقي ، شارع التقدم |
4 | دولة: مصر | دولة: [الموقع] | دولة: [٭] | دولة: ٭٭٭٭ | دولة: الشارع الأسفلتي |
5 | اسم المستشفى: مستشفى الشفاء | اسم المستشفى: مستشفى الشفاء | اسم المستشفى: مستشفى الشفاء | اسم المستشفى: مستشفى الشفاء | اسم المستشفى: مستشفى الشفاء |
6 | اسم الطبيب: د.محمد صلاح | اسم الطبيب: [الاسم] | اسم الطبيب: [٭٭٭٭٭٭٭٭٭] | اسم الطبيب: ٭٭٭٭ | اسم الطبيب: ليث مصري |
See Clinical Multi Language Deidentification Notebook for examples.
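The three masking policies used in the example differ only in what replaces a detected chunk. The sketch below mimics that replacement step for a single chunk. It is a simplified illustration and not the DeIdentification implementation: `mask_chunk` is a hypothetical helper, and the rule that the brackets count toward the length in `same_length_chars` is inferred from the rows of the result table above.

```python
MASK_CHAR = "٭"  # mask character as it appears in the result table

def mask_chunk(chunk: str, label: str, policy: str, fixed_length: int = 4) -> str:
    """Hypothetical helper: replacement text for one PHI chunk under a masking policy."""
    if policy == "entity_labels":
        # Replace the chunk with its entity label in brackets.
        return f"[{label}]"
    if policy == "same_length_chars":
        # Brackets count toward the length, so the mask is as long as the chunk.
        # Assumption: chunks shorter than two characters collapse to "[]".
        return "[" + MASK_CHAR * max(len(chunk) - 2, 0) + "]"
    if policy == "fixed_length_chars":
        # Always the same number of mask characters, whatever the chunk length.
        return MASK_CHAR * fixed_length
    raise ValueError(f"unknown masking policy: {policy}")

chunk = "11 مايو 1999"
masked = {
    policy: mask_chunk(chunk, "تاريخ", policy)
    for policy in ("entity_labels", "same_length_chars", "fixed_length_chars")
}
```

Obfuscation mode is different in kind: instead of hiding the chunk it substitutes a synthetic value of the same entity type, as in the last column of the table.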
New Arabic Clinical Deidentification Pipeline
This pipeline can be used to deidentify Arabic PHI in medical texts with a single line of code, which avoids building the entire pipeline stage by stage. The PHI will be masked and obfuscated in the resulting text. The pipeline can mask and obfuscate CONTACT, NAME, DATE, ID, LOCATION, AGE, PATIENT, HOSPITAL, ORGANIZATION, CITY, STREET, USERNAME, SEX, IDNUM, EMAIL, ZIP, MEDICALRECORD, PROFESSION, PHONE, COUNTRY, DOCTOR, SSN, ACCOUNT, LICENSE, DLN and VIN entities.
Example:
from sparknlp.pretrained import PretrainedPipeline
deid_pipeline_ar = PretrainedPipeline("clinical_deidentification", "ar", "clinical/models")
text = """
ملاحظات سريرية - مريض الربو:
التاريخ: 30 مايو 2023
اسم المريضة: ليلى حسن
تم تسجيل المريض في النظام باستخدام رقم الضمان الاجتماعي 123456789012.
العنوان: شارع المعرفة، مبنى رقم 789، حي الأمانة، جدة
الرمز البريدي: 54321
البلد: المملكة العربية السعودية
اسم المستشفى: مستشفى النور
اسم الطبيب: د. أميرة أحمد
"""
Result:
index | Sentence | masked_with_entity | Masked with Chars | Masked with Fixed Chars | Obfuscated |
---|---|---|---|---|---|
0 | ملاحظات سريرية - مريض الربو:\nالتاريخ: 30 مايو 2023 | ملاحظات سريرية - مريض الربو:\nالتاريخ: [تاريخ] [تاريخ] | ملاحظات سريرية - مريض الربو:\nالتاريخ: [٭٭٭٭٭] [٭٭] | ملاحظات سريرية - مريض الربو:\nالتاريخ: ٭٭٭٭ ٭٭٭٭ | ملاحظات سريرية - مريض الربو:\nالتاريخ: 30 يونيو 2024 |
1 | اسم المريضة: ليلى حسن | اسم المريضة: [المريض] | اسم المريضة: [٭٭٭٭٭٭] | اسم المريضة: ٭٭٭٭ | اسم المريضة: أدهم جبالي |
2 | تم تسجيل المريض في النظام باستخدام رقم الضمان الاجتماعي 123456789012. | تم تسجيل المريض في النظام باستخدام رقم الضمان الاجتماعي [هاتف]. | تم تسجيل المريض في النظام باستخدام رقم الضمان الاجتماعي [٭٭٭٭٭٭٭٭٭٭]. | تم تسجيل المريض في النظام باستخدام رقم الضمان الاجتماعي ٭٭٭٭. | تم تسجيل المريض في النظام باستخدام رقم الضمان الاجتماعي 963525347201. |
3 | العنوان: شارع المعرفة، مبنى رقم 789، حي الأمانة، جدة | العنوان: شارع المعرفة، مبنى رقم [الرمز البريدي] [المدينة] [المدينة] | العنوان: شارع المعرفة، مبنى رقم [٭٭] [٭٭٭٭٭٭٭٭٭] [٭] | العنوان: شارع المعرفة، مبنى رقم ٭٭٭٭ ٭٭٭٭ ٭٭٭٭ | العنوان: شارع المعرفة، مبنى رقم 160، كلميم سانت كاترين |
4 | الرمز البريدي: 54321 | الرمز البريدي: [الرمز البريدي] | الرمز البريدي: [٭٭٭] | الرمز البريدي: ٭٭٭٭ | الرمز البريدي: 79915 |
5 | البلد: المملكة العربية السعودية | البلد: [المدينة] [البلد] | البلد: [٭٭٭٭٭٭٭٭٭٭٭٭٭] [٭٭٭٭٭٭] | البلد: ٭٭٭٭ ٭٭٭٭ | البلد: زغوان الغربية أوزبكستان |
6 | اسم المستشفى: مستشفى النور | اسم المستشفى: [الموقع] | اسم المستشفى: [٭٭٭٭٭٭٭٭٭٭] | اسم المستشفى: ٭٭٭٭ | اسم المستشفى: شارع المدارس |
7 | اسم الطبيب: د. أميرة أحمد | اسم الطبيب: د. [دكتور] | اسم الطبيب: د. [٭٭٭٭٭٭٭٭] | اسم الطبيب: د. ٭٭٭٭ | اسم الطبيب: د. هندية قبلاوي |
See Clinical Multi Language Deidentification Notebook for examples.
36 New Voice Of Patient (VOP) NER Models And Pipelines For Entity Extraction From Patient’s Own Words
We are excited to introduce our new NER models which extract clinical entities from the documents that are shared by patients in their own words.
model_name | description | predicted_entity |
---|---|---|
ner_vop_anatomy, ner_vop_anatomy_emb_clinical_medium, ner_vop_anatomy_emb_clinical_large, ner_vop_anatomy_pipeline | Extracts anatomical terms from the documents transferred from the patient’s own sentences. | BodyPart, Laterality |
ner_vop_clinical_dept, ner_vop_clinical_dept_emb_clinical_medium, ner_vop_clinical_dept_emb_clinical_large, ner_vop_clinical_dept_pipeline | Extracts medical device and clinical department mentions from the documents transferred from the patient’s own sentences. | AdmissionDischarge, ClinicalDept, MedicalDevice |
ner_vop_demographic, ner_vop_demographic_emb_clinical_medium, ner_vop_demographic_emb_clinical_large, ner_vop_demographic_pipeline | Extracts demographic terms from the documents transferred from the patient’s own sentences. | Gender, Employment, RaceEthnicity, Age, Substance, RelationshipStatus, SubstanceQuantity |
ner_vop_problem, ner_vop_problem_emb_clinical_medium, ner_vop_problem_emb_clinical_large, ner_vop_problem_pipeline | Extracts clinical problems from the documents transferred from the patient’s own sentences using a granular taxonomy. | PsychologicalCondition, Disease, Symptom, HealthStatus, Modifier, InjuryOrPoisoning |
ner_vop_test, ner_vop_test_emb_clinical_medium, ner_vop_test_emb_clinical_large, ner_vop_test_pipeline | Extracts test mentions from the documents transferred from the patient’s own sentences. | VitalTest, Test, Measurements, TestResult |
ner_vop_temporal, ner_vop_temporal_emb_clinical_medium, ner_vop_temporal_emb_clinical_large_final, ner_vop_temporal_pipeline | Extracts temporal references from the documents transferred from the patient’s own sentences. | DateTime, Frequency, Duration |
ner_vop_treatment, ner_vop_treatment_emb_clinical_medium, ner_vop_treatment_emb_clinical_large, ner_vop_treatment_pipeline | Extracts treatments mentioned in documents transferred from the patient’s own sentences. | Drug, Form, Dosage, Frequency, Route, Duration, Procedure, Treatment |
ner_vop, ner_vop_emb_clinical_medium, ner_vop_emb_clinical_large, ner_vop_pipeline | Extracts healthcare-related terms from the documents transferred from the patient’s own sentences. | Gender, Employment, Age, BodyPart, Substance, Form, PsychologicalCondition, Vaccine, Drug, DateTime, ClinicalDept, Laterality, Test, AdmissionDischarge, Disease, VitalTest, Dosage, Duration, RelationshipStatus, Route, Allergen, Frequency, Symptom, Procedure, HealthStatus, InjuryOrPoisoning, Modifier, Treatment, SubstanceQuantity, MedicalDevice, TestResult |
ner_vop_problem_reduced, ner_vop_problem_reduced_emb_clinical_medium, ner_vop_problem_reduced_emb_clinical_large, ner_vop_problem_reduced_pipeline | Extracts clinical problems from the documents transferred from the patient’s own sentences. The taxonomy is reduced (one label for all clinical problems). | Problem, HealthStatus, Modifier |
Example:
ner = MedicalNerModel.pretrained("ner_vop", "en", "clinical/models") \
.setInputCols(["sentence", "token", "embeddings"]) \
.setOutputCol("ner")
sample_text = """Hello,I'm 20 year old girl. I'm diagnosed with hyperthyroid 1 month ago. I was feeling weak, light headed,poor digestion, panic attacks, depression, left chest pain, increased heart rate, rapidly weight loss, from 4 months. Because of this, I stayed in the hospital and just discharged from hospital."""
Result:
chunk | ner_label |
---|---|
20 year old | Age |
girl | Gender |
hyperthyroid | Disease |
1 month ago | DateTime |
weak | Symptom |
light | Symptom |
panic attacks | PsychologicalCondition |
depression | PsychologicalCondition |
left | Laterality |
chest | BodyPart |
pain | Symptom |
increased | TestResult |
heart rate | VitalTest |
rapidly | Modifier |
weight loss | Symptom |
4 months | Duration |
hospital | ClinicalDept |
discharged | AdmissionDischarge |
For all Voice of Patient models, please check: Models Hub Page
New BioBERT-Based VOP Classification Models for Biomedical Text Analysis
We are excited to introduce new Voice of Patient classifier models, a collection of BioBERT-based classifiers designed for various text classification tasks in the biomedical domain. These models leverage BERT, a transformer-based language model, to analyze and classify different types of textual data. They are trained to understand the nuances of medical language and concepts to make accurate predictions.
model_name | description | predicted_entity |
---|---|---|
bert_sequence_classifier_vop_drug_side_effect, bert_sequence_classifier_vop_drug_side_effect_pipeline | Classify informal texts (such as tweets or forum posts) according to the presence of drug side effects. | Drug_AE, Other |
bert_sequence_classifier_vop_hcp_consult, bert_sequence_classifier_vop_hcp_consult_pipeline | Identify texts that mention a HCP consult. | Consulted_By_HCP, Other |
bert_sequence_classifier_vop_self_report, bert_sequence_classifier_vop_self_report_pipeline | Classify texts depending on if they are self-reported or if they refer to another person. | 1st_Person, 3rd_Person |
bert_sequence_classifier_vop_sound_medical, bert_sequence_classifier_vop_sound_medical_pipeline | Identify whether the suggestion that is mentioned in the text is medically sound. | True, False |
Example:
sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_vop_sound_medical", "en", "clinical/models")\
.setInputCols(["document",'token'])\
.setOutputCol("prediction")
sample_texts = ["I had a lung surgery for emphyema and after surgery my xray showing some recovery.",
"I was advised to put honey on a burned skin."]
Result:
text | result |
---|---|
My friend was treated for her skin cancer two years ago. | 3rd_Person |
I started with dysphagia in 2021, then, a few weeks later, felt weakness in my legs, followed by a severe diarrhea. | 1st_Person |
Enhanced Entity Detection Accuracy With The Official Version Of Social Determinants Of Health (SDOH) Model
We are excited to introduce a new model that specializes in identifying social determinants of health (SDOH) mentions. This model recognizes references to SDOH characteristics, including Access_To_Care, Community_Safety, Education, Food_Insecurity, Insurance_Status, and more. Its output provides valuable input for SDOH-related analyses and applications.
Example:
ner_model = MedicalNerModel.pretrained("ner_sdoh", "en", "clinical/models")\
.setInputCols(["sentence", "token", "embeddings"])\
.setOutputCol("ner")
sample_texts = """Smith is living in New York, a divorced Mexcian American woman. She has recently been experiencing frequent hospitalizations due to uncontrolled blood sugar levels. Smith works as a cleaning assistant and cannot access health insurance or paid sick leave. Pt with likely long-standing depression.She has a long history of etoh abuse, beginning in her teens. She has been a daily drinker for 30 years. She had DUI in April and was due to court this week."""
Result:
index | chunks | begin | end | entities |
---|---|---|---|---|
0 | New York | 20 | 27 | Geographic_Entity |
1 | divorced | 32 | 39 | Marital_Status |
2 | Mexcian American | 41 | 56 | Race_Ethnicity |
3 | woman | 58 | 62 | Gender |
4 | She | 65 | 67 | Gender |
5 | hospitalizations | 109 | 124 | Other_SDoH_Keywords |
6 | cleaning assistant | 183 | 200 | Employment |
7 | health insurance | 220 | 235 | Insurance_Status |
8 | depression | 286 | 295 | Mental_Health |
9 | She | 297 | 299 | Gender |
10 | etoh abuse | 323 | 332 | Alcohol |
11 | her | 348 | 350 | Gender |
12 | teens | 352 | 356 | Age |
13 | She | 359 | 361 | Gender |
14 | daily | 374 | 378 | Substance_Frequency |
15 | drinker | 380 | 386 | Alcohol |
16 | 30 years | 392 | 399 | Substance_Duration |
17 | She | 402 | 404 | Gender |
18 | DUI | 410 | 412 | Legal_Issues |
See Models Hub Page for more details.
New NER Model For Precise Detection Of Demographic Characteristics In Clinical Notes
This new model identifies mentions of a patient’s demographic characteristics in clinical notes, such as race, ethnicity, gender, age, socioeconomic status, or geographic location.
Example:
ner = MedicalNerModel.pretrained("ner_demographic_extended_healthcare","en","clinical/models")\
.setInputCols(["sentence","token","embeddings"])\
.setOutputCol("ner")
sample_text = """Patient Information:
Gender: Non-binary
Age: 68 years old
Race: Black
Employment status: Retired
Marital Status: Divorced
Sexual Orientation: Asexual
Religion: Judaism
Body Mass Index: 29.1
Unhealthy Habits: Substance use
Socioeconomic Status: Low Income
Area of Residence: Rural setting
Disability Status: Blindness
Chief Complaint:
The patient presented to the emergency department with complaint of severe chest pain that started suddenly while asleep.
"""
Result:
chunk | ner_label | confidence |
---|---|---|
Non-binary | Gender | 0.9987 |
68 years old | Age | 0.6892667 |
Black | Race_ethnicity | 0.9226 |
Retired | Employment_status | 0.9426 |
Divorced | Marital_Status | 0.9996 |
Asexual | Sexual_orientation | 1.0 |
Judaism | Religion | 0.986 |
Substance use | Unhealthy_habits | 0.48755002 |
See Models Hub Page for more details.
Updated Medicare Risk Adjustment Score Calculation Module Incorporating CMS’s Latest Proposed Updates Including The Version 28 Support
We are excited to introduce significant updates to the Medicare Risk Adjustment module of Spark NLP for Healthcare!
CMS (Centers for Medicare & Medicaid Services) releases updated versions of risk adjustment models periodically to account for changes in healthcare data, policy requirements, and advancements in statistical modeling techniques.
With this release, Spark NLP for Healthcare incorporates the updates proposed by CMS for the risk adjustment module, featuring a major update represented as Version 28.
Each version of the risk adjustment model is associated with a specific year, indicating the time period for which the model is designed.
For example, ESRDV21Y19 (ESRD version 21 year 19) refers to the 21st version of the risk adjustment model for End-Stage Renal Disease (ESRD) and is designed for the year 2019.
Updates on Risk Adjustment Module:
- New modules added to Spark NLP for Healthcare:
Version | Year | Module Name |
---|---|---|
28 | Combined | profileV28 |
28 | 2024 | profileV28Y24 |
24 | Combined | profileV24 |
ESRDV21 | 2019 | profileESRDV21Y19 |
- The risk score calculation now incorporates the coding pattern (intensity) adjustment and normalization factor for all versions and years.
- The default parameters for the profile() method have been modified in this release.
  Previous Default: Version 24, Year 19
  Updated Default: Version 28, Year Combined
- We have included the risk_score_adj and risk_score_age_adj information in the outputs of the profile methods.
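The naming convention of the profile modules (model family, version, optional year) can be made explicit with a short parsing sketch. The regular expression and the helper `parse_module_name` are hypothetical and derived only from the four module names listed in the table above; they are not part of the library.

```python
import re

# Pattern assumed from the listed names: "profile", an optional model family
# (e.g. "ESRD"), "V" + version number, and an optional "Y" + two-digit year.
MODULE_RE = re.compile(
    r"^profile(?P<family>[A-Z]*?)V(?P<version>\d+)(?:Y(?P<year>\d{2}))?$"
)

def parse_module_name(name: str) -> dict:
    """Hypothetical helper: split a profile module name into its parts."""
    match = MODULE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized module name: {name}")
    year = match.group("year")
    return {
        "family": match.group("family") or None,
        "version": int(match.group("version")),
        # A name without a year suffix corresponds to the "Combined" row.
        "year": 2000 + int(year) if year else "Combined",
    }

parsed = {
    name: parse_module_name(name)
    for name in ("profileV28", "profileV28Y24", "profileV24", "profileESRDV21Y19")
}
```

Read this way, `profileV28Y24` is version 28 for the year 2024, and `profileESRDV21Y19` is the ESRD model, version 21, for 2019.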
Example:
Risk Adjustment Module should be used with the following information in order:
1- A list of ICD10 codes for the measurement year.
2- The age of the patient.
3- The gender of the patient; {"M", "F"}
4- The eligibility segment of the patient. Allowed values are as follows:
- "CFA": Community Full Benefit Dual Aged
- "CFD": Community Full Benefit Dual Disabled
- "CNA": Community NonDual Aged
- "CND": Community NonDual Disabled
- "CPA": Community Partial Benefit Dual Aged
- "CPD": Community Partial Benefit Dual Disabled
- "INS": Long Term Institutional
- "NE": New Enrollee
- "SNPNE": SNP NE
5- Original reason for entitlement code (orec).
- "0": Old age and survivor's insurance
- "1": Disability insurance benefits
- "2": End-stage renal disease
- "3": Both DIB and ESRD
6- If the patient is in Medicaid or not.
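Because the module expects coded values in a fixed order, it can help to check a record before building the DataFrame. The sketch below is a hypothetical pre-check, not part of the library: `validate_profile_inputs` is an assumed helper name, the allowed codes are copied from the lists above, and the type checks on the ICD10 codes, age and Medicaid flag are assumptions.

```python
# Allowed codes copied from the lists above.
ELIGIBILITY_SEGMENTS = {"CFA", "CFD", "CNA", "CND", "CPA", "CPD", "INS", "NE", "SNPNE"}
OREC_CODES = {"0", "1", "2", "3"}
GENDERS = {"M", "F"}

def validate_profile_inputs(icd10_codes, age, gender, eligibility, orec, medicaid):
    """Hypothetical helper: return a list of problems (empty if the inputs look valid)."""
    problems = []
    if not all(isinstance(code, str) and code for code in icd10_codes):
        problems.append("ICD10 codes must be non-empty strings")
    if not isinstance(age, int) or age < 0:
        problems.append(f"age must be a non-negative integer, got {age!r}")
    if gender not in GENDERS:
        problems.append(f"gender must be one of {sorted(GENDERS)}, got {gender!r}")
    if eligibility not in ELIGIBILITY_SEGMENTS:
        problems.append(f"unknown eligibility segment: {eligibility!r}")
    if str(orec) not in OREC_CODES:
        problems.append(f"unknown orec code: {orec!r}")
    if not isinstance(medicaid, bool):
        problems.append(f"medicaid must be True or False, got {medicaid!r}")
    return problems

ok = validate_profile_inputs(["M86622", "M0549"], 32, "M", "CFD", "0", True)
bad = validate_profile_inputs(["M86622"], 32, "X", "ABC", "7", "yes")
```

Here `ok` is an empty list, while `bad` reports the gender, eligibility segment, orec code and Medicaid flag.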
#sample pyspark dataframe called df:
df.show(5)
>>>
ICD10CM_CODES | GENDER | OREC | AGE | MEDICAID | ELIGIBILITY |
---|---|---|---|---|---|
M86622, M0549, I… | M | 0 | 32 | true | CFD |
E133311, E200, T… | M | 0 | 32 | true | CFD |
C179, I70348, C8… | M | 0 | 32 | true | CFD |
S72463A, C37, E1… | M | 0 | 32 | true | CFD |
S1224XB, S72115B… | M | 0 | 32 | true | CFD |
Applying the profileV28Y24 module over the sample data:
df = df.withColumn("hcc_profile", profileV28Y24(df.ICD10CM_CODES, df.AGE, df.GENDER, df.ELIGIBILITY, df.OREC, df.MEDICAID))
Result:
ICD10CM_CODES | GENDER | OREC | AGE | MEDICAID | ELIGIBILITY | hcc_profile | risk_score | hcc_lst | risk_score_adj | risk_score_age_adj | hcc_map | parameters | details |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
M86622, M0549, I… | M | 0 | 32 | true | CFD | 10.022, “HCC401… | 10.022 | “HCC401”,”HCC21”… | 9.2913 | 0.1771 | “L97214”:”HCC38… | “elig”:”CFD”,”ag… | “CFD_HCC267”:0.5… |
E133311, E200, T… | M | 0 | 32 | true | CFD | 9.374, “HCC280”… | 9.374 | “HCC280”,”CHR_LU… | 8.6906 | 0.1771 | “G308”:”HCC127”… | “elig”:”CFD”,”ag… | “CFD_HCC202”:0.2… |
C179, I70348, C8… | M | 0 | 32 | true | CFD | 12.988, “HCC17”… | 12.988 | “HCC17”,”HCC401”… | 12.0411 | 0.1771 | “C4A20”:”HCC21”… | “elig”:”CFD”,”ag… | “CFD_HCC280”:0.2… |
S72463A, C37, E1… | M | 0 | 32 | true | CFD | 17.034, “HF_CHR… | 17.034 | “HF_CHR_LUNG”,”H… | 15.7921 | 0.1771 | “C8296”:”HCC21”… | “elig”:”CFD”,”ag… | “CFD_HCC267”:0.5… |
S1224XB, S72115B… | M | 0 | 32 | true | CFD | 7.99, “HCC401”,… | 7.99 | “HCC401”,”HCC51”… | 7.4075 | 0.1771 | “D57818”:”HCC10… | “elig”:”CFD”,”ag… | “CFD_HCC267”:0.5… |
For the detailed usage of the module, please visit Calculate Medicare Risk Adjustment Score notebook.
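The relationship between the risk_score and risk_score_adj columns in the sample output can be sketched as a coding intensity reduction followed by division by a normalization factor. This is a sketch under stated assumptions, not the module's code: the function name `adjust_risk_score` is hypothetical, and the default factors (0.059 and 1.015) are assumed values chosen because they reproduce the sample rows above. They are not taken from the release notes.

```python
def adjust_risk_score(
    raw_score: float,
    coding_intensity: float = 0.059,      # assumed coding pattern (intensity) adjustment
    normalization_factor: float = 1.015,  # assumed normalization factor
) -> float:
    """Hypothetical helper: apply coding intensity adjustment and normalization."""
    return round(raw_score * (1 - coding_intensity) / normalization_factor, 4)

# risk_score values from the sample output above.
raw_scores = [10.022, 9.374, 12.988, 17.034, 7.99]
adjusted = [adjust_risk_score(score) for score in raw_scores]
```

With these assumed factors the sketch matches the risk_score_adj column of the sample output, for example 10.022 becomes 9.2913. The risk_score_age_adj column is not covered by this sketch.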
New Resources Downloader Notebook That Includes Comprehensive Guideline For Model Downloading
Model Download Helpers Notebook includes a comprehensive guide to the various parameters and functionalities of the ResourceDownloader annotator, which downloads and manages resources such as pretrained models and pipelines. It does so without using your sensitive AWS keys in any script or environment and with no additional library (e.g. boto3, aws cli) needed.
Examples:
- with S3 URI

```python
s3_uri = "s3://auxdata.johnsnowlabs.com/public/models/nerdl_conll_elmo_en_4.0.0_3.0_1654103884644.zip"
ResourceDownloader.downloadModelDirectly(s3_uri, "public/models", unzip=True)
```

- with model name

```python
model_name = "public/models/nerdl_restaurant_100d_en_3.3.4_3.0_1640949258750.zip"
ResourceDownloader.downloadModelDirectly(model_name, "public/models", unzip=True)
```

- updateCacheModels

```python
from sparknlp_jsl.updateModels import UpdateModels

UpdateModels.updateCacheModels()

# Contents of the cache afterwards (ls ~/cache_pretrained):
# embeddings_clinical_en_2.0.2_2.4_1558454742956/
# ner_clinical_large_en_2.5.0_2.4_1590021302624/
```
Pretrained Pipelines Now Compatible With All PySpark Versions
We are thrilled to announce the release of our new PretrainedPipeline, specifically designed to be compatible with all versions of PySpark. This groundbreaking update ensures seamless integration and effortless deployment of Spark NLP’s powerful pretrained models across different PySpark environments.
Various Core Improvements: Bug Fixes, Enhanced Overall Robustness, And Reliability Of Spark NLP For Healthcare
- The save() method on MedicalSummarization has been fixed, ensuring proper functionality.
- The bug in the Chunk2Token Python class has been resolved.
- The fine-tuning issue in SentenceEntityResolver has been fixed, enhancing its performance and accuracy.
- The sparknlp_jsl.start() function now works correctly with the apple_silicon and aarch64 parameters.
- The default scopeWindow parameter on AssertionDLApproach has been updated to (9,15), enhancing its effectiveness in processing textual assertions.
- Compilation issues related to Spark NLP v4.4.3 have been resolved, resulting in smoother operation.
- We have improved the documentation, providing clearer instructions and explanations for easier usage.
Updated Notebooks And Demonstrations For Making Spark NLP For Healthcare Easier To Navigate And Understand
- New Extractive Summarization Notebook
- New Model Download Helpers Notebook
- Updated Biogpt_Chat_JSL Notebook for latest models
- Updated Clinical MultiLanguage Deidentification Notebook for latest models
- New All in One Voice Of Patient
- New Voice Of Patient CLASSIFICATION_SIDE_EFFECT Demo
- Updated SOCIAL DETERMINANT CLASSIFICATION GENERIC Demo
- Updated CLINICAL DEIDENTIFICATION MULTI-LANGUAGE Demo
We Have Added And Updated A Substantial Number Of New Clinical Models And Pipelines, Further Solidifying Our Offering In The Healthcare Domain.
clinical_deidentification -> ar
summarizer_clinical_laymen_pipeline
ner_demographic_extended_healthcare
ner_sdoh
ner_vop_anatomy
ner_vop_anatomy_emb_clinical_medium
ner_vop_anatomy_emb_clinical_large
ner_vop_anatomy_pipeline
ner_vop_clinical_dept
ner_vop_clinical_dept_emb_clinical_medium
ner_vop_clinical_dept_emb_clinical_large
ner_vop_clinical_dept_pipeline
ner_vop_demographic
ner_vop_demographic_emb_clinical_medium
ner_vop_demographic_emb_clinical_large
ner_vop_demographic_pipeline
ner_vop_problem
ner_vop_problem_emb_clinical_medium
ner_vop_problem_emb_clinical_large
ner_vop_problem_pipeline
ner_vop_test
ner_vop_test_emb_clinical_medium
ner_vop_test_emb_clinical_large
ner_vop_test_pipeline
ner_vop_temporal
ner_vop_temporal_emb_clinical_medium
ner_vop_temporal_emb_clinical_large_final
ner_vop_temporal_pipeline
ner_vop_treatment
ner_vop_treatment_emb_clinical_medium
ner_vop_treatment_emb_clinical_large
ner_vop_treatment_pipeline
ner_vop
ner_vop_emb_clinical_medium
ner_vop_emb_clinical_large
ner_vop_pipeline
ner_vop_problem_reduced
ner_vop_problem_reduced_emb_clinical_medium
ner_vop_problem_reduced_emb_clinical_large
ner_vop_problem_reduced_pipeline
bert_sequence_classifier_vop_drug_side_effect
bert_sequence_classifier_vop_drug_side_effect_pipeline
bert_sequence_classifier_vop_hcp_consult
bert_sequence_classifier_vop_hcp_consult_pipeline
bert_sequence_classifier_vop_self_report
bert_sequence_classifier_vop_self_report_pipeline
bert_sequence_classifier_vop_sound_medical
bert_sequence_classifier_vop_sound_medical_pipeline
For all Spark NLP for Healthcare models, please check: Models Hub Page