5.1.1
Highlights
We are delighted to announce remarkable enhancements and updates in our latest release of Spark NLP for Healthcare. This release comes with the first clinical NER models in 5 new languages as well as 41 new clinical pretrained models and pipelines.
- Introducing a new State-of-The-Art Text2SQL model supporting custom database schemas with single tables
- 5 new clinical NER models for extracting clinical entities in the Japanese, Vietnamese, Norwegian, Danish and Swedish languages
- 4 new Arabic De-Identification NER models
- 2 new classification models for social determinants of healthcare concepts within financial and food insecurity contexts
- Introducing a new BioBERT-based drug adverse event classifier
- 18 new augmented NER models by leveraging the capabilities of the LangTest library to boost their robustness significantly
- 10 new NER-based pretrained pipelines, designed to streamline NER solutions with a single line of code
- Leveraging the power of Spark NLP with AWS Glue and EMR with practical examples and support
- Various core improvements; bug fixes, enhanced overall robustness and reliability of Spark NLP for Healthcare
- ContextualParser metadata update: renaming confidenceValue to confidence
- Updated English profession faker name list
- New and updated demos
- New Clinical NER Demo for the most popular clinical NER models
- New ICD-10-CM Medicare Severity-Diagnosis Related Group Demo with new icd10cm mapper and resolver models
- Updated Multi Language Clinical NER Demo with 5 new Japanese, Vietnamese, Norwegian, Danish, and Swedish language models
- Updated Social Determinants Ner Demo with augmented SDOH NER models
- Updated Arabic Demographics NER Demo with new arabert and camelbert models
- Updated Social Determinants Classification Generic Demo with updated financial and food insecurity models
- Updated Voice of Patient Demo with new assertion models
- Updated Social Determinants of Health Demo with new assertion models
- Updated VOP SIDE EFFECT CLASSIFICATION demo with new Adverse Drug Event models
- The addition and update of numerous new clinical models and pipelines continue to reinforce our offering in the healthcare domain
These enhancements will elevate your experience with Spark NLP for Healthcare, enabling more efficient, accurate, and streamlined healthcare-related natural language data analysis.
Introducing a New State-of-The-Art Text2SQL Model Supporting Custom Database Schemas with Single Tables
We are excited to introduce the new State-of-the-Art (SOTA) Large Language Model (LLM) designed to convert natural language questions into SQL queries, with support for custom database schemas containing single tables. This model has demonstrated superior performance compared to the current SOTA model (Defog’s SQLCoder) by a margin of 6 points (0.86 to 0.92) when evaluated on a novel dataset that was not included in training (specifically tailored for the clinical domain). The model is obtained by finetuning an LLM on an augmented dataset containing schemas with single tables.
Example:
query_schema = {
"medical_treatment":
["patient_id","patient_name","age","gender","diagnosis","treatment","doctor_name","hospital_name","admission_date","discharge_date"]
}
text2sql = Text2SQL.pretrained("text2sql_with_schema_single_table_augmented", "en", "clinical/models")\
.setMaxNewTokens(200)\
.setSchema(query_schema)\
.setInputCols(["document"])\
.setOutputCol("sql")
question = "What is the average age of male patients with 'Diabetes'?"
Result:
[SELECT AVG(age) FROM medical_treatment WHERE gender = 'male' AND diagnosis = 'diabetes']
Please check the Model Card and the Text2SQL Generation Notebook for more information.
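One way to sanity-check a generated query is to run it against a throwaway database that mirrors the schema. The sketch below does this with Python's built-in sqlite3 module, using the schema and the generated SQL from the example above. The sample rows and the `build_scratch_db` helper are made up for illustration and are not part of the release.

```python
import sqlite3

# Schema and generated SQL copied from the example above.
query_schema = {
    "medical_treatment": [
        "patient_id", "patient_name", "age", "gender", "diagnosis",
        "treatment", "doctor_name", "hospital_name",
        "admission_date", "discharge_date",
    ]
}
generated_sql = (
    "SELECT AVG(age) FROM medical_treatment "
    "WHERE gender = 'male' AND diagnosis = 'diabetes'"
)

def build_scratch_db(schema):
    """Create an in-memory SQLite database with one untyped table per schema entry."""
    conn = sqlite3.connect(":memory:")
    for table, columns in schema.items():
        conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    return conn

conn = build_scratch_db(query_schema)

# Hypothetical rows, invented only to exercise the query.
rows = [
    (1, "A", 60, "male", "diabetes"),
    (2, "B", 40, "male", "diabetes"),
    (3, "C", 50, "female", "diabetes"),
    (4, "D", 30, "male", "asthma"),
]
conn.executemany(
    "INSERT INTO medical_treatment "
    "(patient_id, patient_name, age, gender, diagnosis) VALUES (?, ?, ?, ?, ?)",
    rows,
)

# A query that references an unknown table or column raises
# sqlite3.OperationalError at this point.
average_age = conn.execute(generated_sql).fetchone()[0]
print(average_age)
```

Note that the generated query uses the lowercase literal 'diabetes' while the question says 'Diabetes'. SQLite compares text case-sensitively by default, so rows stored with a capitalized diagnosis would not match in a check like this one.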
5 New Clinical NER Models for Extracting Clinical Entities in the Japanese, Vietnamese, Norwegian, Danish, and Swedish Languages
5 new Clinical NER models provide valuable tools for processing and analyzing multi-language clinical texts. They assist in automating the extraction of important clinical information, facilitating research, medical documentation, and other applications within the multi-language healthcare domain.
| Model Name | Predicted Entities | Language |
|---|---|---|
| ner_clinical | PROBLEM, TEST, TREATMENT | da |
| ner_clinical | PROBLEM, TEST, TREATMENT | sv |
| ner_clinical | PROBLEM, TEST, TREATMENT | no |
| ner_clinical | PROBLEM, TEST, TREATMENT | ja |
| ner_clinical | PROBLEM, TEST, TREATMENT | vi |
Example:
ner_model = MedicalNerModel.pretrained("ner_clinical", "sv", "clinical/models")\
.setInputCols(["sentence", "token", "embeddings"])\
.setOutputCol("ner")
sample_text = """Patienten hade inga ytterligare klagomål och den 10 mars 2012 var hans vita blodkroppar 2,3, neutrofiler 50%, band 2%, lymfocyter 5% , monocyter 40% och blaster 1%. instruktioner i 250 ml långsam IV-infusion över en timme."""
Result:
| chunk | begin | end | ner_label |
|---|---|---|---|
| ytterligare klagomål | 20 | 39 | PROBLEM |
| hans vita blodkroppar | 66 | 86 | TEST |
| neutrofiler | 93 | 103 | TEST |
| band | 110 | 113 | TEST |
| lymfocyter | 119 | 128 | TEST |
| monocyter | 135 | 143 | TEST |
| blaster | 153 | 159 | TEST |
| långsam IV-infusion | 188 | 206 | TREATMENT |
Please check the Multi Language Clinical NER Demo.
4 New Arabic De-Identification NER Models
We’re thrilled to present our newly integrated Arabic deidentification Named Entity Recognition (NER) models, featuring two diverse approaches. The first model provides granular entity recognition with 17 entities, while the other offers a more generic approach, identifying 8 entities with AraBERT Arabic Embeddings. These models are accompanied by corresponding pretrained pipelines that can be deployed in a streamlined one-liner format.
Designed explicitly for deidentification tasks in the Arabic language, these models leverage our proprietary dataset curation and specialized augmentation methods. This expansion broadens the linguistic scope of our toolset, underscoring our commitment to providing comprehensive solutions for global healthcare NLP needs.
| NER Model | Predicted Entities |
|---|---|
| ner_deid_subentity_arabert | PATIENT, HOSPITAL, DATE, ORGANIZATION, CITY, STREET, USERNAME, SEX, IDNUM, EMAIL, ZIP, MEDICALRECORD, PROFESSION, PHONE, COUNTRY, DOCTOR, AGE |
| ner_deid_generic_arabert | CONTACT, NAME, DATE, ID, SEX, LOCATION, PROFESSION, AGE |
| ner_deid_subentity_camelbert | PATIENT, HOSPITAL, DATE, ORGANIZATION, CITY, STREET, USERNAME, SEX, IDNUM, EMAIL, ZIP, MEDICALRECORD, PROFESSION, PHONE, COUNTRY, DOCTOR, AGE |
| ner_deid_generic_camelbert | CONTACT, NAME, DATE, ID, SEX, LOCATION, PROFESSION, AGE |
Example:
embeddings = BertEmbeddings.pretrained("bert_embeddings_bert_base_arabert","ar") \
.setInputCols(["document", "token"]) \
.setOutputCol("embeddings")
clinical_ner = MedicalNerModel.pretrained("ner_deid_subentity_arabert", "ar", "clinical/models")\
.setInputCols(["sentence", "token", "embeddings"])\
.setOutputCol("ner")
text = """
عالج الدكتور محمد المريض أحمد البالغ من العمر 55 سنة في 15/05/2000 في مستشفى مدينة الرباط. رقم هاتفه هو 0610948234 وبريده الإلكتروني
abcd@gmail.com.
"""
Result:
| chunk | ner_label |
|---|---|
| الدكتور محمد المريض | DOCTOR |
| 55 سنة | AGE |
| 15/05/2000 | DATE |
| مستشفى مدينة الرباط | HOSPITAL |
| abcd@gmail.com |
Please check the Arabic Ner Demographics Demo.
2 New Classification Models for Social Determinants of Healthcare Concepts within Financial and Food Insecurity Contexts
Introducing two classification models that address two critical social determinants of healthcare: financial insecurity and food insecurity. These models, genericclassifier_sdoh_financial_insecurity_mpnet and genericclassifier_sdoh_food_insecurity_mpnet, categorize healthcare-related text according to whether it indicates the corresponding type of insecurity.
Example:
embeddings = MPNetEmbeddings.pretrained("mpnet_embedding_nli_mpnet_base_v2", "en")\
.setInputCols(["document"])\
.setOutputCol("sentence_embeddings")
features_asm = FeaturesAssembler()\
.setInputCols(["sentence_embeddings"])\
.setOutputCol("features")
gen_clf = GenericClassifierModel.pretrained("genericclassifier_sdoh_financial_insecurity_mpnet", "en", "clinical/models")\
.setInputCols("features")\
.setOutputCol("prediction")
text_list = [
"Patient B is a 40-year-old female who was diagnosed with breast cancer. She has received a treatment plan that includes surgery, chemotherapy, and radiation therapy.",
"The patient a 35-year-old woman, visited her healthcare provider with concerns about her health. She bravely shared that she was facing financial difficultie, which was affecting her ability to afford necessary medical care and prescriptions. The caring healthcare provider listened attentively and discussed various options. They helped Sarah explore low-cost alternatives for her medications and connected her with local resources that could assist with healthcare expenses. By addressing the financial aspect, Sarah's healthcare provider ensured that she could receive the care she needed without further straining her finances. Leaving the appointment, Sarah felt relieved and grateful for the support in managing her health amidst her financial challenges."
]
Result:
| text | result |
|---|---|
| Patient B is a 40-year-old female who was diagnosed with breast cancer. She h… | No_Financial_Insecurity_Or_Unknown |
| The patient a 35-year-old woman, visited her healthcare provider with concern… | Financial_Insecurity |
Please check Social Determinant Classification Generic Demo
Introducing a New BioBERT-Based Drug Adverse Event Classifier
This bert_sequence_classifier_vop_adverse_event model is specialized for analyzing adverse events related to drugs in health documents. Trained on in-house annotated health text, it classifies text into two categories:
True: Signifying the presence of unfavorable, unintended, or harmful signs or symptoms in patients receiving pharmaceutical products or medical devices.
False: Denoting the absence of unfavorable experiences during the course of treatment.
Example:
sequenceClassifier = MedicalBertForSequenceClassification.pretrained("bert_sequence_classifier_vop_adverse_event", "en", "clinical/models")\
.setInputCols(["document", "token"])\
.setOutputCol("prediction")
text_list = [
"I am taking this medication once a day for the last 3 days. I am feeling very bad, pressure on my head, some chest pain, cramps on my neck and feel very weird. I want to reduce my blood pressure naturally. Can I stop this medication? I only took it for 5 days. I was reading here, that a lot of people has been losing weight and exercise and now they have a normal blood pressure. Please let me know, what I can do. The sides effects are horrible",
"I go the pub about 3-4 times a week and drink quite a bit. I like socialising, been doing so for years now.Recently been getting this occasional pain from the liver area (under right ribs).It comes and goes. Could this be a sign of liver damage?When i get this pain i am usually in the pub drinking.If i press the area under my right rib cage about half way across i can feel pain. Is that pain in the Liver?"
]
Result:
| text | result |
|---|---|
| I am taking this medication once a day for the last 3 days. I am feeling very bad, pressure on my head, some chest pain, cramps on my neck and feel… | True |
| I go the pub about 3-4 times a week and drink quite a bit. I like socialising, been doing so for years now.Recently been getting this occasional pa… | False |
Please check VOP SIDE EFFECT CLASSIFICATION demo
18 New Augmented NER Models by Leveraging the Capabilities of the LangTest Library to Boost Their Robustness Significantly
Newly introduced augmented NER models are powered by the innovative LangTest library. This cutting-edge NLP toolkit is at the forefront of language processing advancements, incorporating state-of-the-art techniques and algorithms to enhance the capabilities of our models significantly.
| Model Name | Predicted Entities |
|---|---|
| ner_vop_anatomy_langtest | BodyPart, Laterality |
| ner_vop_clinical_dept_langtest | AdmissionDischarge, ClinicalDept, MedicalDevice |
| ner_vop_demographic_langtest | Gender, Employment, RaceEthnicity, Age, Substance, RelationshipStatus, SubstanceQuantity |
| ner_vop_problem_langtest | PsychologicalCondition, Disease, Symptom, HealthStatus, Modifier, InjuryOrPoisoning |
| ner_vop_problem_reduced_langtest | Problem, HealthStatus, Modifier |
| ner_vop_temporal_langtest | DateTime, Duration, Frequency |
| ner_vop_test_langtest | VitalTest, Test, Measurements, TestResult |
| ner_vop_treatment_langtest | Drug, Form, Dosage, Frequency, Route, Duration, Procedure, Treatment |
| ner_oncology_unspecific_posology_langtest | Cancer_Therapy, Posology_Information |
| ner_oncology_therapy_langtest | Cancer_Surgery, Chemotherapy, Dosage, Hormonal_Therapy, Immunotherapy, Line_Of_Therapy, Radiotherapy, Radiation_Dose, Response_To_Treatment, Targeted_Therapy, Unspecific_Therapy … |
| ner_oncology_tnm_langtest | Cancer_Dx, Lymph_Node, Lymph_Node_Modifier, Metastasis, Staging, Tumor, Tumor_Description |
| ner_oncology_test_langtest | Biomarker, Biomarker_Result, Imaging_Test, Oncogene, Pathology_Test |
| ner_oncology_diagnosis_langtest | Adenopathy, Cancer_Dx, Cancer_Score, Grade, Histological_Type, Invasion, Metastasis, Pathology_Result, Performance_Status, Staging, Tumor_Finding, Tumor_Size |
| ner_oncology_biomarker_langtest | Biomarker, Biomarker_Result |
| ner_eu_clinical_condition_langtest | clinical_condition |
- These models are strengthened against various perturbations (lowercase, uppercase, title case, punctuation removal, etc.).
- The table below shows the overall robustness test results for the 18 models.
| model names | original robustness | new robustness |
|---|---|---|
| ner_clinical_langtest | 71.72% | 84.75% |
| ner_deid_subentity_augmented_langtest | 95.78% | 97.73% |
| ner_deid_generic_augmented_langtest | 95.09% | 97.13% |
| ner_vop_anatomy_langtest | 79.87% | 89.70% |
| ner_vop_clinical_dept_langtest | 67.99% | 84.43% |
| ner_vop_demographic_langtest | 74.84% | 91.34% |
| ner_vop_problem_langtest | 62.17% | 81.63% |
| ner_vop_problem_reduced_langtest | 74.89% | 84.75% |
| ner_vop_temporal_langtest | 67.76% | 83.73% |
| ner_vop_test_langtest | 61.52% | 81.86% |
| ner_vop_treatment_langtest | 69.58% | 84.33% |
| ner_oncology_unspecific_posology_langtest | 63.69% | 87.47% |
| ner_oncology_therapy_langtest | 62.03% | 86.15% |
| ner_oncology_tnm_langtest | 81.22% | 90.33% |
| ner_oncology_test_langtest | 82.13% | 91.72% |
| ner_oncology_diagnosis_langtest | 72.44% | 83.98% |
| ner_oncology_biomarker_langtest | 93.79% | 95.28% |
| ner_eu_clinical_condition_langtest | 86.08% | 91.68% |
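To make the idea of robustness concrete, the sketch below applies the four perturbations named above to a sentence and measures how often a tagger returns the same answer on the perturbed text as on the original. This is not LangTest's implementation or API: `toy_tagger`, `robust_tagger`, `PERTURBATIONS`, and `robustness` are hypothetical names, the keyword lookup only stands in for a real NER model, and the pass-rate definition is an assumption made for illustration.

```python
import string

# The four perturbations named in the text, as plain string transforms.
PERTURBATIONS = {
    "lowercase": str.lower,
    "uppercase": str.upper,
    "titlecase": str.title,
    "strip_punctuation": lambda s: s.translate(
        str.maketrans("", "", string.punctuation)
    ),
}

def toy_tagger(text):
    """Hypothetical case-sensitive keyword tagger standing in for an NER model."""
    vocabulary = {"aspirin": "Drug", "headache": "Symptom"}
    cleaned = (token.strip(string.punctuation) for token in text.split())
    return [(token.lower(), vocabulary[token]) for token in cleaned if token in vocabulary]

def robust_tagger(text):
    """Case-insensitive variant, a stand-in for a model that tolerates case changes."""
    return toy_tagger(text.lower())

def robustness(tagger, sentences):
    """Fraction of (sentence, perturbation) pairs whose prediction matches the original."""
    passed = total = 0
    for sentence in sentences:
        expected = tagger(sentence)
        for perturb in PERTURBATIONS.values():
            total += 1
            passed += tagger(perturb(sentence)) == expected
    return passed / total

sentences = ["She took aspirin for a headache."]
print(robustness(toy_tagger, sentences))     # fails on uppercase and title case
print(robustness(robust_tagger, sentences))  # unaffected by all four perturbations
```

The case-sensitive tagger loses its entities as soon as the casing changes, which is the kind of failure the augmented models are meant to reduce.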
10 New NER-based Pretrained Pipelines, Designed to Streamline Solutions with a Single Line of Code
We have 10 new named entity recognition pipelines that are meticulously designed to enhance your solutions by efficiently identifying entities and their resolutions within the clinical note. You can easily integrate this advanced capability using just a single line of code.
| Model Name | Predicted Entities |
|---|---|
| ner_posology_langtest_pipeline | DOSAGE, DRUG, DURATION, FORM, FREQUENCY, ROUTE, STRENGTH |
| ner_ade_clinical_langtest_pipeline | DRUG, ADE |
| ner_events_clinical_langtest_pipeline | DATE, TIME, PROBLEM, TEST, TREATMENT, OCCURENCE, CLINICAL_DEPT, EVIDENTIAL, DURATION, FREQUENCY, ADMISSION, DISCHARGE |
| ner_jsl_langtest_pipeline | Hyperlipidemia, BMI, Kidney_Disease, Oncological, Heart_Disease, Obesity, Symptom, Treatment, Substance, Allergen, Diabetes, Modifier, Hypertension … |
| ner_oncology_anatomy_general_langtest_pipeline | Anatomical_Site, Direction |
| ner_oncology_anatomy_granular_langtest_pipeline | Direction, Site_Bone, Site_Brain, Site_Breast, Site_Liver, Site_Lung, Site_Lymph_Node, Site_Other_Body_Part |
| ner_oncology_demographics_langtest_pipeline | Age, Gender, Race_Ethnicity, Smoking_Status |
| ner_oncology_posology_langtest_pipeline | Cancer_Surgery, Cancer_Therapy, Cycle_Count, Cycle_Day, Cycle_Number, Dosage, Duration, Frequency, Radiotherapy, Radiation_Dose, Rout |
| ner_oncology_response_to_treatment_langtest_pipeline | Line_Of_Therapy, Response_To_Treatment, Size_Trend |
| ner_sdoh_langtest_pipeline | Alcohol, Disability, Food_Insecurity, Housing, Income, Insurance_Status, Mental_Health, Obesity, Smoking, Social_Support, Substance_Use, Violence_Or_Abuse … |
Example:
from sparknlp.pretrained import PretrainedPipeline
ner_pipeline = PretrainedPipeline("ner_oncology_anatomy_granular_langtest_pipeline", "en", "clinical/models")
text = """The patient presented a mass in her left breast, and a possible metastasis in her lungs and in her liver."""
Result:
| chunks | begin | end | entities |
|---|---|---|---|
| left | 36 | 39 | Direction |
| breast | 41 | 46 | Site_Breast |
| lungs | 82 | 86 | Site_Lung |
| liver | 99 | 103 | Site_Liver |
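The begin and end values in the result table read as character offsets into the input text with an inclusive end. The check below confirms this against the example sentence in plain Python. The rows are copied from the table above, and `chunk_at` is a helper name introduced here.

```python
text = (
    "The patient presented a mass in her left breast, "
    "and a possible metastasis in her lungs and in her liver."
)

# (chunk, begin, end, entity) rows copied from the result table above.
results = [
    ("left", 36, 39, "Direction"),
    ("breast", 41, 46, "Site_Breast"),
    ("lungs", 82, 86, "Site_Lung"),
    ("liver", 99, 103, "Site_Liver"),
]

def chunk_at(source, begin, end):
    """Return the substring for an inclusive [begin, end] character span."""
    return source[begin:end + 1]

for chunk, begin, end, entity in results:
    # Each reported span should reproduce the reported chunk exactly.
    assert chunk_at(text, begin, end) == chunk
    print(f"{entity}: {chunk_at(text, begin, end)}")
```

Because the end offset is inclusive, slicing needs `end + 1`; using `text[begin:end]` would drop the last character of every chunk.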
Leveraging the Power of Spark NLP with AWS Glue and EMR with Practical Examples and Support
Explore the integration of Spark NLP with AWS Glue and EMR notebooks in this guide. With step-by-step examples, learn how to combine Spark NLP for Healthcare with AWS services in your data processing and analysis workflows to extract insights from your medical data. The guide is aimed at data engineers, data scientists, and NLP practitioners who want to run Spark NLP within the AWS ecosystem.
Various Core Improvements; Bug Fixes, Enhanced Overall Robustness and Reliability of Spark NLP for Healthcare
- ContextualParser Metadata Update: Renaming confidenceValue to confidence
- Updated English Profession Faker Name List
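Downstream code that reads the ContextualParser confidence from annotation metadata will need to use the new key. Below is a minimal compatibility sketch, assuming the metadata is available as a plain Python dict of strings. The sample dicts and the `read_confidence` helper are hypothetical and only illustrate the rename.

```python
def read_confidence(metadata):
    """Return the confidence as a float, accepting the new or the legacy key."""
    for key in ("confidence", "confidenceValue"):
        if key in metadata:
            return float(metadata[key])
    return None

# Hypothetical metadata dicts, before and after the rename.
legacy = {"field": "Gender", "confidenceValue": "0.13"}
current = {"field": "Gender", "confidence": "0.13"}

print(read_confidence(legacy), read_confidence(current))
```

Checking the new key first means the helper keeps working on output produced before and after the upgrade.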
Updated Notebooks And Demonstrations For Making Spark NLP For Healthcare Easier To Navigate And Understand
- New Clinical NER Demo for the most known NER models
- New ICD-10-CM Medicare Severity-Diagnosis Related Group Demo with new icd10cm mapper and resolver models
- Updated Multi Language Clinical NER Demo with 5 new Japanese, Vietnamese, Norwegian, Danish, and Swedish language models
- Updated Social Determinants Ner Demo with augmented SDOH NER models
- Updated Arabic Demographics NER Demo with new arabert and camelbert models
- Updated Social Determinants Classification Generic Demo with updated financial and food insecurity models
- Updated Voice of Patient Demo with new assertion models
- Updated Social Determinants of Health Demo with new assertion models
- Updated VOP SIDE EFFECT CLASSIFICATION demo with new Adverse Drug Event models
We Have Added And Updated A Substantial Number Of New Clinical Models And Pipelines, Further Solidifying Our Offering In The Healthcare Domain.
- text2sql_with_schema_single_table_augmented
- ner_clinical -> da
- ner_clinical -> sv
- ner_clinical -> no
- ner_clinical -> ja
- ner_clinical -> vi
- ner_deid_generic_arabert -> ar
- ner_deid_generic_camelbert -> ar
- ner_deid_subentity_arabert -> ar
- ner_deid_subentity_camelbert -> ar
- bert_sequence_classifier_vop_adverse_event
- genericclassifier_sdoh_food_insecurity_mpnet
- genericclassifier_sdoh_financial_insecurity_mpnet
- ner_posology_langtest_pipeline
- ner_ade_clinical_langtest_pipeline
- ner_events_clinical_langtest_pipeline
- ner_jsl_langtest_pipeline
- ner_oncology_anatomy_general_langtest_pipeline
- ner_oncology_anatomy_granular_langtest_pipeline
- ner_oncology_demographics_langtest_pipeline
- ner_oncology_posology_langtest_pipeline
- ner_oncology_response_to_treatment_langtest_pipeline
- ner_sdoh_langtest_pipeline
- ner_clinical_langtest
- ner_deid_subentity_augmented_langtest
- ner_deid_generic_augmented_langtest
- ner_vop_anatomy_langtest
- ner_vop_clinical_dept_langtest
- ner_vop_demographic_langtest
- ner_vop_problem_langtest
- ner_vop_problem_reduced_langtest
- ner_vop_temporal_langtest
- ner_vop_test_langtest
- ner_vop_treatment_langtest
- ner_oncology_unspecific_posology_langtest
- ner_oncology_therapy_langtest
- ner_oncology_tnm_langtest
- ner_oncology_test_langtest
- ner_oncology_diagnosis_langtest
- ner_oncology_biomarker_langtest
- ner_eu_clinical_condition_langtest
For all Spark NLP for Healthcare models, please check: Models Hub Page