5.1.1
Highlights
We are delighted to announce remarkable enhancements and updates in our latest release of Spark NLP for Healthcare. This release comes with the first clinical NER models in 5 new languages as well as 41 new clinical pretrained models and pipelines.
- Introducing a new State-of-The-Art Text2SQL model supporting custom database schemas with single tables
- 5 new clinical NER models for extracting clinical entities in the Japanese, Vietnamese, Norwegian, Danish and Swedish languages
- 4 new Arabic De-Identification NER models
- 2 new classification models for social determinants of healthcare concepts within financial and food insecurity contexts
- Introducing a new BioBERT-based drug adverse event classifier
- 18 new augmented NER models by leveraging the capabilities of the LangTest library to boost their robustness significantly
- 10 new NER-based pretrained pipelines, designed to streamline NER solutions with a single line of code
- Leveraging the power of Spark NLP with AWS Glue and EMR with practical examples and support
- Various core improvements; bug fixes, enhanced overall robustness and reliability of Spark NLP for Healthcare
- ContextualParser metadata update: renaming confidenceValue to confidence
- Updated English profession faker name list
- New and updated demos
- New Clinical NER Demo for the most popular clinical NER models
- New ICD-10-CM Medicare Severity-Diagnosis Related Group Demo with new icd10cm mapper and resolver models
- Updated Multi Language Clinical NER Demo with 5 new Japanese, Vietnamese, Norwegian, Danish, and Swedish language models
- Updated Social Determinants Ner Demo with augmented SDOH NER models
- Updated Arabic Demographics NER Demo with new arabert and camelbert models
- Updated Social Determinants Classification Generic Demo with updated financial and food insecurity models
- Updated Voice of Patient Demo with new assertion models
- Updated Social Determinants of Health Demo with new assertion models
- Updated VOP SIDE EFFECT CLASSIFICATION demo with new Adverse Drug Event models
- The addition and update of numerous new clinical models and pipelines continue to reinforce our offering in the healthcare domain
These enhancements will elevate your experience with Spark NLP for Healthcare, enabling more efficient, accurate, and streamlined healthcare-related natural language data analysis.
Introducing a New State-of-The-Art Text2SQL Model Supporting Custom Database Schemas with Single Tables
We are excited to introduce the new State-of-the-Art (SOTA) Large Language Model (LLM) designed to convert natural language questions into SQL queries, with support for custom database schemas containing single tables. This model has demonstrated superior performance compared to the current SOTA model (Defog’s SQLCoder) by a margin of 6 points (0.86 to 0.92) when evaluated on a novel dataset that was not included in training (specifically tailored for the clinical domain). The model is obtained by finetuning an LLM on an augmented dataset containing schemas with single tables.
Example:
query_schema = {
"medical_treatment":
["patient_id","patient_name","age","gender","diagnosis","treatment","doctor_name","hospital_name","admission_date","discharge_date"]
}
text2sql = Text2SQL.pretrained("text2sql_with_schema_single_table_augmented", "en", "clinical/models")\
.setMaxNewTokens(200)\
.setSchema(query_schema)\
.setInputCols(["document"])\
.setOutputCol("sql")
question = "What is the average age of male patients with 'Diabetes'?"
Result:
[SELECT AVG(age) FROM medical_treatment WHERE gender = 'male' AND diagnosis = 'diabetes']
Please check the Model Card and Text2SQL Generation Notebook for more information.
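One way to sanity-check a generated query is to run it against a throwaway database built from the same schema. The sketch below uses Python's built-in `sqlite3` module rather than any Spark NLP API; the sample rows and the `build_db` and `insert` helpers are hypothetical and exist only for illustration.

```python
import sqlite3

# Schema and generated SQL copied from the example above.
query_schema = {
    "medical_treatment": [
        "patient_id", "patient_name", "age", "gender", "diagnosis",
        "treatment", "doctor_name", "hospital_name",
        "admission_date", "discharge_date",
    ]
}
generated_sql = (
    "SELECT AVG(age) FROM medical_treatment "
    "WHERE gender = 'male' AND diagnosis = 'diabetes'"
)


def build_db(schema):
    """Create an in-memory SQLite database with one untyped table per schema entry."""
    conn = sqlite3.connect(":memory:")
    for table, columns in schema.items():
        conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    return conn


def insert(conn, table, record):
    """Insert a partial record; columns that are not supplied stay NULL."""
    cols = ", ".join(record)
    marks = ", ".join("?" for _ in record)
    conn.execute(
        f"INSERT INTO {table} ({cols}) VALUES ({marks})", tuple(record.values())
    )


conn = build_db(query_schema)

# Hypothetical rows, invented only to exercise the query.
sample_rows = [
    {"patient_id": 1, "age": 50, "gender": "male", "diagnosis": "diabetes"},
    {"patient_id": 2, "age": 60, "gender": "male", "diagnosis": "diabetes"},
    {"patient_id": 3, "age": 40, "gender": "female", "diagnosis": "diabetes"},
    {"patient_id": 4, "age": 30, "gender": "male", "diagnosis": "asthma"},
]
for row in sample_rows:
    insert(conn, "medical_treatment", row)

# The query executes only if every referenced table and column exists in the schema.
avg_age = conn.execute(generated_sql).fetchone()[0]
print(avg_age)
```

Note that the question mentions 'Diabetes' while the generated query filters on 'diabetes'. In SQLite the default text comparison is case-sensitive, so a check like this also shows whether the literal's casing matches the stored data.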
5 New Clinical NER Models for Extracting Clinical Entities in the Japanese, Vietnamese, Norwegian, Danish, and Swedish Languages
5 new Clinical NER models provide valuable tools for processing and analyzing multi-language clinical texts. They assist in automating the extraction of important clinical information, facilitating research, medical documentation, and other applications within the multi-language healthcare domain.
Model Name | Predicted Entities | Language |
---|---|---|
ner_clinical | PROBLEM TEST TREATMENT | da |
ner_clinical | PROBLEM TEST TREATMENT | sv |
ner_clinical | PROBLEM TEST TREATMENT | no |
ner_clinical | PROBLEM TEST TREATMENT | ja |
ner_clinical | PROBLEM TEST TREATMENT | vi |
Example:
ner_model = MedicalNerModel.pretrained("ner_clinical", "sv", "clinical/models")\
.setInputCols(["sentence", "token", "embeddings"])\
.setOutputCol("ner")
sample_text = """Patienten hade inga ytterligare klagomål och den 10 mars 2012 var hans vita blodkroppar 2,3, neutrofiler 50%, band 2%, lymfocyter 5% , monocyter 40% och blaster 1%. instruktioner i 250 ml långsam IV-infusion över en timme."""
Result:
chunk | begin | end | ner_label |
---|---|---|---|
ytterligare klagomål | 20 | 39 | PROBLEM |
hans vita blodkroppar | 66 | 86 | TEST |
neutrofiler | 93 | 103 | TEST |
band | 110 | 113 | TEST |
lymfocyter | 119 | 128 | TEST |
monocyter | 135 | 143 | TEST |
blaster | 153 | 159 | TEST |
långsam IV-infusion | 188 | 206 | TREATMENT |
Please check the Multi Language Clinical NER Demo.
4 New Arabic De-Identification NER Models
We’re thrilled to present our newly integrated Arabic deidentification Named Entity Recognition (NER) models, featuring two diverse approaches. The first model provides granular entity recognition with 17 entities, while the other offers a more generic approach, identifying 8 entities with AraBERT Arabic Embeddings. These models are accompanied by corresponding pretrained pipelines that can be deployed in a streamlined one-liner format.
Designed explicitly for deidentification tasks in the Arabic language, these models leverage our proprietary dataset curation and specialized augmentation methods. This expansion broadens the linguistic scope of our toolset, underscoring our commitment to providing comprehensive solutions for global healthcare NLP needs.
NER model | predicted entities |
---|---|
ner_deid_subentity_arabert | PATIENT , HOSPITAL , DATE , ORGANIZATION , CITY , STREET , USERNAME , SEX , IDNUM , EMAIL , ZIP , MEDICALRECORD , PROFESSION , PHONE , COUNTRY , DOCTOR , AGE |
ner_deid_generic_arabert | CONTACT , NAME , DATE , ID , SEX , LOCATION , PROFESSION , AGE |
ner_deid_subentity_camelbert | PATIENT , HOSPITAL , DATE , ORGANIZATION , CITY , STREET , USERNAME , SEX , IDNUM , EMAIL , ZIP , MEDICALRECORD , PROFESSION , PHONE , COUNTRY , DOCTOR , AGE |
ner_deid_generic_camelbert | CONTACT , NAME , DATE , ID , SEX , LOCATION , PROFESSION , AGE |
Example:
embeddings = BertEmbeddings.pretrained("bert_embeddings_bert_base_arabert","ar") \
.setInputCols(["document", "token"]) \
.setOutputCol("embeddings")
clinical_ner = MedicalNerModel.pretrained("ner_deid_subentity_arabert", "ar", "clinical/models")\
.setInputCols(["sentence", "token", "embeddings"])\
.setOutputCol("ner")
text = """
عالج الدكتور محمد المريض أحمد البالغ من العمر 55 سنة في 15/05/2000 في مستشفى مدينة الرباط. رقم هاتفه هو 0610948234 وبريده الإلكتروني
abcd@gmail.com.
"""
Result:
chunk | ner_label |
---|---|
الدكتور محمد المريض | DOCTOR |
55 سنة | AGE |
15/05/2000 | DATE |
مستشفى مدينة الرباط | HOSPITAL |
abcd@gmail.com |
Please check the Arabic Ner Demographics Demo.
2 New Classification Models for Social Determinants of Healthcare Concepts within Financial and Food Insecurity Contexts
Introducing two classification models tailored to address critical social determinants of healthcare: financial and food insecurity. These models, genericclassifier_sdoh_financial_insecurity_mpnet and genericclassifier_sdoh_food_insecurity_mpnet, are designed to classify healthcare-related text by whether it indicates financial or food insecurity.
Example:
embeddings = MPNetEmbeddings.pretrained("mpnet_embedding_nli_mpnet_base_v2", "en")\
.setInputCols(["document"])\
.setOutputCol("sentence_embeddings")
features_asm = FeaturesAssembler()\
.setInputCols(["sentence_embeddings"])\
.setOutputCol("features")
gen_clf = GenericClassifierModel.pretrained("genericclassifier_sdoh_financial_insecurity_mpnet", "en", "clinical/models")\
.setInputCols("features")\
.setOutputCol("prediction")
text_list = [
"Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatment plan that includes surgery, chemotherapy, and radiation therapy.",
"The patient a 35-year-old woman, visited her healthcare provider with concerns about her health. She bravely shared that she was facing financial difficultie, which was affecting her ability to afford necessary medical care and prescriptions. The caring healthcare provider listened attentively and discussed various options. They helped Sarah explore low-cost alternatives for her medications and connected her with local resources that could assist with healthcare expenses. By addressing the financial aspect, Sarah's healthcare provider ensured that she could receive the care she needed without further straining her finances. Leaving the appointment, Sarah felt relieved and grateful for the support in managing her health amidst her financial challenges."
]
Result:
text | result |
---|---|
Patient B is a 40-year-old female who was diagnosed with breast cancer. She h… | No_Financial_Insecurity_Or_Unknown |
The patient a 35-year-old woman, visited her healthcare provider with concern… | Financial_Insecurity |
Please check Social Determinant Classification Generic Demo
Introducing a New BioBERT-Based Drug Adverse Event Classifier
This bert_sequence_classifier_vop_adverse_event model is specialized for analyzing adverse events related to drugs in health documents. Trained on in-house annotated health text, it classifies text into two categories:
- True: Signifying the presence of unfavorable, unintended, or harmful signs or symptoms in patients receiving pharmaceutical products or medical devices.
- False: Denoting the absence of unfavorable experiences during the course of treatment.
Example:
sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_vop_adverse_event", "en", "clinical/models")\
.setInputCols(["document", "token"])\
.setOutputCol("prediction")
text_list = [
"I am taking this medication once a day for the last 3 days. I am feeling very bad, pressure on my head, some chest pain, cramps on my neck and feel very weird. I want to reduce my blood pressure naturally. Can I stop this medication? I only took it for 5 days. I was reading here, that a lot of people has been losing weight and exercise and now they have a normal blood pressure. Please let me know, what I can do. The sides effects are horrible",
"I go the pub about 3-4 times a week and drink quite a bit. I like socialising, been doing so for years now.Recently been getting this occasional pain from the liver area (under right ribs).It comes and goes. Could this be a sign of liver damage?When i get this pain i am usually in the pub drinking.If i press the area under my right rib cage about half way across i can feel pain. Is that pain in the Liver?"
]
Result:
text | result |
---|---|
I am taking this medication once a day for the last 3 days. I am feeling very bad, pressure on my head, some chest pain, cramps on my neck and feel… | True |
I go the pub about 3-4 times a week and drink quite a bit. I like socialising, been doing so for years now.Recently been getting this occasional pa… | False |
Please check VOP SIDE EFFECT CLASSIFICATION demo
18 New Augmented NER Models by Leveraging the Capabilities of the LangTest Library to Boost Their Robustness Significantly
Newly introduced augmented NER models are powered by the innovative LangTest library. This cutting-edge NLP toolkit is at the forefront of language processing advancements, incorporating state-of-the-art techniques and algorithms to enhance the capabilities of our models significantly.
Model Name | Predicted Entities |
---|---|
ner_vop_anatomy_langtest | BodyPart , Laterality |
ner_vop_clinical_dept_langtest | AdmissionDischarge , ClinicalDept , MedicalDevice |
ner_vop_demographic_langtest | Gender , Employment , RaceEthnicity , Age , Substance , RelationshipStatus , SubstanceQuantity |
ner_vop_problem_langtest | PsychologicalCondition , Disease , Symptom , HealthStatus , Modifier , InjuryOrPoisoning |
ner_vop_problem_reduced_langtest | Problem , HealthStatus , Modifier |
ner_vop_temporal_langtest | DateTime , Duration , Frequency |
ner_vop_test_langtest | VitalTest , Test , Measurements , TestResult |
ner_vop_treatment_langtest | Drug , Form , Dosage , Frequency , Route , Duration , Procedure , Treatment |
ner_oncology_unspecific_posology_langtest | Cancer_Therapy , Posology_Information |
ner_oncology_therapy_langtest | Cancer_Surgery , Chemotherapy , Dosage , Hormonal_Therapy , Immunotherapy , Line_Of_Therapy , Radiotherapy , Radiation_Dose , Response_To_Treatment , Targeted_Therapy , Unspecific_Therapy … |
ner_oncology_tnm_langtest | Cancer_Dx , Lymph_Node , Lymph_Node_Modifier , Metastasis , Staging , Tumor , Tumor_Description |
ner_oncology_test_langtest | Biomarker , Biomarker_Result , Imaging_Test , Oncogene , Pathology_Test |
ner_oncology_diagnosis_langtest | Adenopathy , Cancer_Dx , Cancer_Score , Grade , Histological_Type , Invasion , Metastasis , Pathology_Result , Performance_Status , Staging , Tumor_Finding , Tumor_Size |
ner_oncology_biomarker_langtest | Biomarker , Biomarker_Result |
ner_eu_clinical_condition_langtest | clinical_condition |
- These models are strengthened against various perturbations (lowercase, uppercase, title case, punctuation removal, etc.).
- The table below shows the overall robustness test results for the 18 models, before and after augmentation.
model names | original robustness | new robustness |
---|---|---|
ner_clinical_langtest | 71.72% | 84.75% |
ner_deid_subentity_augmented_langtest | 95.78% | 97.73% |
ner_deid_generic_augmented_langtest | 95.09% | 97.13% |
ner_vop_anatomy_langtest | 79.87% | 89.70% |
ner_vop_clinical_dept_langtest | 67.99% | 84.43% |
ner_vop_demographic_langtest | 74.84% | 91.34% |
ner_vop_problem_langtest | 62.17% | 81.63% |
ner_vop_problem_reduced_langtest | 74.89% | 84.75% |
ner_vop_temporal_langtest | 67.76% | 83.73% |
ner_vop_test_langtest | 61.52% | 81.86% |
ner_vop_treatment_langtest | 69.58% | 84.33% |
ner_oncology_unspecific_posology_langtest | 63.69% | 87.47% |
ner_oncology_therapy_langtest | 62.03% | 86.15% |
ner_oncology_tnm_langtest | 81.22% | 90.33% |
ner_oncology_test_langtest | 82.13% | 91.72% |
ner_oncology_diagnosis_langtest | 72.44% | 83.98% |
ner_oncology_biomarker_langtest | 93.79% | 95.28% |
ner_eu_clinical_condition_langtest | 86.08% | 91.68% |
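To read the table at a glance, the per-model gain can be computed as the new robustness minus the original robustness, in percentage points. The following is a small sketch that uses only the figures from the table above; the variable names are illustrative.

```python
# (model name, original robustness %, new robustness %) copied from the table above.
robustness = [
    ("ner_clinical_langtest", 71.72, 84.75),
    ("ner_deid_subentity_augmented_langtest", 95.78, 97.73),
    ("ner_deid_generic_augmented_langtest", 95.09, 97.13),
    ("ner_vop_anatomy_langtest", 79.87, 89.70),
    ("ner_vop_clinical_dept_langtest", 67.99, 84.43),
    ("ner_vop_demographic_langtest", 74.84, 91.34),
    ("ner_vop_problem_langtest", 62.17, 81.63),
    ("ner_vop_problem_reduced_langtest", 74.89, 84.75),
    ("ner_vop_temporal_langtest", 67.76, 83.73),
    ("ner_vop_test_langtest", 61.52, 81.86),
    ("ner_vop_treatment_langtest", 69.58, 84.33),
    ("ner_oncology_unspecific_posology_langtest", 63.69, 87.47),
    ("ner_oncology_therapy_langtest", 62.03, 86.15),
    ("ner_oncology_tnm_langtest", 81.22, 90.33),
    ("ner_oncology_test_langtest", 82.13, 91.72),
    ("ner_oncology_diagnosis_langtest", 72.44, 83.98),
    ("ner_oncology_biomarker_langtest", 93.79, 95.28),
    ("ner_eu_clinical_condition_langtest", 86.08, 91.68),
]

# Gain in percentage points, rounded to two decimals to avoid float noise.
gains = {name: round(new - original, 2) for name, original, new in robustness}

# Sort from the largest to the smallest improvement and print a short report.
ranked = sorted(gains.items(), key=lambda item: item[1], reverse=True)
for name, gain in ranked:
    print(f"{name:45s} +{gain:.2f} pts")

largest_gain = ranked[0]
smallest_gain = ranked[-1]
```

Models that started with low robustness tend to gain the most, while the de-identification and biomarker models, which were already above 93%, move by only a couple of points.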
10 New NER-based Pretrained Pipelines, Designed to Streamline Solutions with a Single Line of Code
We have 10 new named entity recognition pipelines that are meticulously designed to enhance your solutions by efficiently identifying entities and their resolutions within the clinical note. You can easily integrate this advanced capability using just a single line of code.
Model Name | Predicted Entities |
---|---|
ner_posology_langtest_pipeline | DOSAGE, DRUG, DURATION, FORM, FREQUENCY, ROUTE, STRENGTH |
ner_ade_clinical_langtest_pipeline | DRUG , ADE |
ner_events_clinical_langtest_pipeline | DATE , TIME , PROBLEM , TEST , TREATMENT , OCCURENCE , CLINICAL_DEPT , EVIDENTIAL , DURATION , FREQUENCY , ADMISSION , DISCHARGE |
ner_jsl_langtest_pipeline | Hyperlipidemia , BMI , Kidney_Disease , Oncological , Heart_Disease , Obesity , Symptom , Treatment , Substance , Allergen , Diabetes , Modifier , Hypertension … |
ner_oncology_anatomy_general_langtest_pipeline | Anatomical_Site , Direction |
ner_oncology_anatomy_granular_langtest_pipeline | Direction , Site_Bone , Site_Brain , Site_Breast , Site_Liver , Site_Lung , Site_Lymph_Node , Site_Other_Body_Part |
ner_oncology_demographics_langtest_pipeline | Age , Gender , Race_Ethnicity , Smoking_Status |
ner_oncology_posology_langtest_pipeline | Cancer_Surgery, Cancer_Therapy, Cycle_Count, Cycle_Day, Cycle_Number, Dosage, Duration, Frequency, Radiotherapy, Radiation_Dose, Rout |
ner_oncology_response_to_treatment_langtest_pipeline | Line_Of_Therapy , Response_To_Treatment , Size_Trend |
ner_sdoh_langtest_pipeline | Alcohol , Disability , Food_Insecurity , Housing , Income , Insurance_Status , Mental_Health , Obesity , Smoking , Social_Support , Substance_Use , Violence_Or_Abuse … |
Example:
from sparknlp.pretrained import PretrainedPipeline
ner_pipeline = PretrainedPipeline("ner_oncology_anatomy_granular_langtest_pipeline", "en", "clinical/models")
text = """The patient presented a mass in her left breast, and a possible metastasis in her lungs and in her liver."""
Result:
chunks | begin | end | entities |
---|---|---|---|
left | 36 | 39 | Direction |
breast | 41 | 46 | Site_Breast |
lungs | 82 | 86 | Site_Lung |
liver | 99 | 103 | Site_Liver |
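When consuming these results downstream, it helps to know that the `begin` and `end` offsets in the table are character positions with an inclusive end. The sketch below checks this against the example above in plain Python; `span_text` is a hypothetical helper, not part of the library.

```python
text = (
    "The patient presented a mass in her left breast, and a possible "
    "metastasis in her lungs and in her liver."
)

# (chunk, begin, end, entity) rows copied from the result table above.
rows = [
    ("left", 36, 39, "Direction"),
    ("breast", 41, 46, "Site_Breast"),
    ("lungs", 82, 86, "Site_Lung"),
    ("liver", 99, 103, "Site_Liver"),
]


def span_text(source, begin, end):
    """Slice a chunk out of the source text; `end` is inclusive, hence the +1."""
    return source[begin:end + 1]


# Collect any row whose offsets do not reproduce the reported chunk.
mismatches = [row for row in rows if span_text(text, row[1], row[2]) != row[0]]
print("offset mismatches:", mismatches)
```

Because the end offset is inclusive, code that highlights or masks entities needs `end + 1` when using Python slicing.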
Leveraging the Power of SparkNLP with AWS Glue and EMR with Practical Examples and Support
Explore the seamless integration of SparkNLP with AWS Glue and EMR notebooks in this comprehensive guide. Discover how SparkNLP, a cutting-edge natural language processing library, can supercharge your data processing and analysis workflows on AWS. With step-by-step examples, learn how to harness the combined capabilities of Healthcare SparkNLP and AWS services to unlock new insights from your medical data. Whether you’re a data engineer, data scientist, or NLP enthusiast, this resource will empower you to leverage the full potential of SparkNLP within the AWS ecosystem.
Various Core Improvements; Bug Fixes, Enhanced Overall Robustness and Reliability of Spark NLP for Healthcare
- ContextualParser Metadata Update: Renaming confidenceValue to confidence
- Updated English Profession Faker Name List
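Code that reads the ContextualParser confidence from chunk metadata may need to handle both key names during an upgrade. Below is a minimal sketch, assuming the metadata is available as a plain dictionary of strings; the `chunk_confidence` helper, the `field` key, and the sample values are hypothetical.

```python
def chunk_confidence(metadata):
    """Return the chunk confidence as a float, or None when neither key is present."""
    # Prefer the new key; fall back to the pre-rename key for older outputs.
    raw = metadata.get("confidence", metadata.get("confidenceValue"))
    return float(raw) if raw is not None else None


# Hypothetical metadata dictionaries, before and after the rename.
old_style = {"field": "Gender", "confidenceValue": "0.13"}
new_style = {"field": "Gender", "confidence": "0.13"}

print(chunk_confidence(old_style), chunk_confidence(new_style))
```

Once all stored outputs come from the new version, the fallback can be dropped.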
Updated Notebooks and Demonstrations for Making Spark NLP for Healthcare Easier to Navigate and Understand
- New Clinical NER Demo for the most known NER models
- New ICD-10-CM Medicare Severity-Diagnosis Related Group Demo with new icd10cm mapper and resolver models
- Updated Multi Language Clinical NER Demo with 5 new Japanese, Vietnamese, Norwegian, Danish, and Swedish language models
- Updated Social Determinants Ner Demo with augmented SDOH NER models
- Updated Arabic Demographics NER Demo with new arabert and camelbert models
- Updated Social Determinants Classification Generic Demo with updated financial and food insecurity models
- Updated Voice of Patient Demo with new assertion models
- Updated Social Determinants of Health Demo with new assertion models
- Updated VOP SIDE EFFECT CLASSIFICATION demo with new Adverse Drug Event models
We Have Added And Updated A Substantial Number Of New Clinical Models And Pipelines, Further Solidifying Our Offering In The Healthcare Domain.
text2sql_with_schema_single_table_augmented
ner_clinical -> da
ner_clinical -> sv
ner_clinical -> no
ner_clinical -> ja
ner_clinical -> vi
ner_deid_generic_arabert -> ar
ner_deid_generic_camelbert -> ar
ner_deid_subentity_arabert -> ar
ner_deid_subentity_camelbert -> ar
bert_sequence_classifier_vop_adverse_event
genericclassifier_sdoh_food_insecurity_mpnet
genericclassifier_sdoh_financial_insecurity_mpnet
ner_posology_langtest_pipeline
ner_ade_clinical_langtest_pipeline
ner_events_clinical_langtest_pipeline
ner_jsl_langtest_pipeline
ner_oncology_anatomy_general_langtest_pipeline
ner_oncology_anatomy_granular_langtest_pipeline
ner_oncology_demographics_langtest_pipeline
ner_oncology_posology_langtest_pipeline
ner_oncology_response_to_treatment_langtest_pipeline
ner_sdoh_langtest_pipeline
ner_clinical_langtest
ner_deid_subentity_augmented_langtest
ner_deid_generic_augmented_langtest
ner_vop_anatomy_langtest
ner_vop_clinical_dept_langtest
ner_vop_demographic_langtest
ner_vop_problem_langtest
ner_vop_problem_reduced_langtest
ner_vop_temporal_langtest
ner_vop_test_langtest
ner_vop_treatment_langtest
ner_oncology_unspecific_posology_langtest
ner_oncology_therapy_langtest
ner_oncology_tnm_langtest
ner_oncology_test_langtest
ner_oncology_diagnosis_langtest
ner_oncology_biomarker_langtest
ner_eu_clinical_condition_langtest
For all Spark NLP for Healthcare models, please check: Models Hub Page