Spark NLP for Healthcare Release Notes

 

5.4.0

Highlights

We are delighted to announce remarkable enhancements and updates in our latest release of Spark NLP for Healthcare. This release comes with a brand new LLM Loader to load and run any size of LLMs in gguf format, a few-shot assertion classifier, contextual assertion detection, and a demo to showcase accuracy differences between Healthcare NLP and GPT-4 for information extraction tasks as well as the first Menopause-specific medical models, 81 new and updated clinical pretrained models, and pipelines.

  • Introducing a brand new LLMLoader annotator to load and run large language models in gguf format. We also announce 9 LLMs at various sizes and quantization (3x small size medical summarizer and QA model, 3x medium size general model and 3x small size zero shot entity extractor)
  • Introducing a brand new FewshotAssertionClassifier annotator to train assertion detection models using a few samples with better accuracy
  • Introducing a rule-based ContextualAssertion annotator to detect assertion status using patterns and rules without any training or annotation
  • Introducing VectorDBPostProcessor annotator to filter and sort the document splits returned by vector databases in a RAG application
  • Introducing ContextSplitAssembler annotator to assemble the document post-processed splits as a context into an LLM stage in a RAG application
  • SNOMED entity resolver model for Veterinary domains
  • Voice of the Patients named entity recognition (NER) model
  • New rule-based entity matcher models to customize De-IDentification pipelines
  • New NER, assertion, relation extraction, and classification models to identify Alcohol and Smoking related Medical Entities
  • New NER and assertion models to extract Menopause related entities
  • Clinical document analysis with one-liner pretrained pipelines for specific clinical tasks and concepts
  • Formal release of oncological assertion status detection and relation extraction models
  • 11 new fine-tuned sentence embedding models finetuned with medical assertion datasets
  • Significantly faster vector-db based entity resolution models than existing Sentence Entity Resolver models
  • RxNorm code mapping benchmarks and cost comparisons: Healthcare NLP, GPT-4, and Amazon Comprehend Medical
  • New blog posts on using NLP in opioid research and healthcare: harnessing NLP, knowledge graphs, and regex techniques for critical insights
  • New notebooks for medication and resolutions concept
  • Updated Udemy MOOC (our online courses) notebooks
  • Various core improvements; bug fixes, enhanced overall robustness and reliability of Spark NLP for Healthcare
    • Resolved broken links in healthcare demos
    • Added a unique ID field for each entity into the result of the pipeline_ouput_parser module
    • Fixed deidentification AGE obfuscation hanging issue
    • Added DatasetInfo parameter into the MedicalNERModel annotator
  • Updated notebooks and demonstrations for making Spark NLP for Healthcare easier to navigate and understand
  • The addition and update of numerous new clinical models and pipelines continue to reinforce our offering in the healthcare domain

These enhancements will elevate your experience with Spark NLP for Healthcare, enabling more efficient, accurate, and streamlined analysis of healthcare-related natural language data.

Introducing a Brand New LLMLoader Annotator to Load and Run Large Language Models in GGUF format

LLMLoader is designed to interact with a LLMs that are converted into gguf format. This module allows using John Snow Labs’ licensed LLMs at various sizes that are finetuned on medical context for certain tasks. It provides various methods for setting parameters, loading models, generating text, and retrieving metadata. The LLMLoader includes methods for setting various parameters such as input prefix, suffix, cache prompt, number of tokens to predict, sampling techniques, temperature, penalties, and more. Overall, the LLMLoader provides a flexible and extensible framework for interacting with language models in a Python and Scala environment using PySpark and Java.

Model Name Description
JSL_MedS_q16_v1 Summarization and Q&A
JSL_MedS_q8_v1 Summarization and Q&A
JSL_MedS_q4_v1 Summarization and Q&A
JSL_MedM_q16_v1 Summarization, Q&A, RAG, and Chat
JSL_MedM_q8_v1 Summarization, Q&A, RAG, and Chat
JSL_MedM_q4_v1 Summarization, Q&A, RAG, and Chat
JSL_MedSNer_ZS_q16_v1 Extract and link medical named entities
JSL_MedSNer_ZS_q8_v1 Extract and link medical named entities
JSL_MedSNer_ZS_q4_v1 Extract and link medical named entities

We recommend using 8b quantized versions of the models as the qualitative performance difference between q16 and q8 versions is very negligible.

Example:


from sparknlp_jsl.llm import LLMLoader

llm_loader_pretrained = LLMLoader(spark).pretrained("jsl_meds_q16_v1", "en", "clinical/models")

llm_loader_pretrained.generate("What is the indication for the drug Methadone?")

Result:

Methadone is used to treat opioid addiction. It is a long-acting opioid agonist that is used to help individuals who are addicted to short-acting opioids such as heroin or other illicit opioids. It is also used to treat chronic pain in patients who have developed tolerance to other opioids.

Please check the LLMLoaderNotebook for more information

Introducing a Brand New FewshotAssertionClassifier Annotator to Train Assertion Detection Models Using a Few Samples with Better Accuracy

The newly refactored FewShotAssertionClassifierModel and FewShotAssertionClassifierApproach simplify assertion annotation in clinical and biomedical texts. By leveraging sentence embeddings, these models deliver precise assertion annotations and integrate seamlessly with any SparkNLP sentence embedding model.

A key feature is the FewShotAssertionSentenceConverter, an annotator that formats documents/sentences and NER chunks for assertion classification, requiring an additional step in the pipeline.

This comprehensive approach significantly enhances the extraction, analysis, and processing of assertion-related data, making it an indispensable tool for healthcare text annotation.

The following table demonstrates the enhanced results achieved using the FewShot Assertion model compared to the traditional AssertionDL model across various datasets. The FewShot Assertion model showcases significant improvements in accuracy scores, particularly in complex medical domains.

Dataset Name AssertionDL FewShot Assertion
radiology 0.91 0.93
i2b2 0.86 0.93
oncology 0.55 0.90
jsl_augmented 0.85 0.90
smoking 0.67 0.96
sdoh 0.76 0.85

Annotator to Train Assertion Detection Models

FewShot Assertion Model Name Predicted Classed
fewhot_assertion_jsl_e5_base_v2_jsl Present, Absent, Possible, Planned, Past, Family, Hypothetical, SomeoneElse
fewhot_assertion_e5_base_v2 absent, associated_with_someone_else, conditional, hypothetical, possible, present
fewhot_assertion_sdoh_e5_base_v2_sdoh Absent, Past, Present, Someone_Else, Hypothetical, Possible
fewhot_assertion_smoking_e5_base_v2_smoking Present, Absent, Past
fewhot_assertion_oncology_e5_base_v2_oncology Absent, Past, Present, Family, Hypothetical, Possible
fewhot_assertion_radiology_e5_base_v2_radiology Confirmed, Negative, Suspected

Example:

few_shot_assertion_converter = FewShotAssertionSentenceConverter()\
    .setInputCols(["sentence","token", "ner_jsl_chunk"])\
    .setOutputCol("assertion_sentence")

e5_embeddings = E5Embeddings.pretrained("e5_base_v2_embeddings_medical_assertion_oncology", "en", "clinical/models")\
    .setInputCols(["assertion_sentence"])\
    .setOutputCol("assertion_embedding")

few_shot_assertion_classifier = FewShotAssertionClassifierModel()\
    .pretrained("fewhot_assertion_oncology_e5_base_v2_oncology", "en", "clinical/models")\
    .setInputCols(["assertion_embedding"])\
    .setOutputCol("assertion")

sample_text= """The patient is suspected to have colorectal cancer. Her family history is positive for other cancers. The result of the biopsy was positive. A CT scan was ordered to rule out metastases."""

Result:

  chunks begin end entities assertion confidence
0 colorectal cancer 33 49 Cancer_Dx Possible 0.581282
1 cancers 93 99 Cancer_Dx Family 0.234656
2 biopsy 120 125 Pathology_Test Past 0.957321
3 positive 131 138 Pathology_Result Present 0.956439
4 CT scan 143 149 Imaging_Test Past 0.95717
5 metastases 175 184 Metastasis Possible 0.549866

Please check the FewShot Assertion Classifier Notebook for more information

Introducing a Rule-Based ContextualAssertion Annotator to Detect Assertion Status Using Patterns and Rules without any Training or Annotation

Introducing Contextual Assertion which identifies contextual cues within text data, such as negation, uncertainty, etc. It is used for clinical assertion detection, etc. It annotates text chunks with assertions based on configurable rules, prefix and suffix patterns, and exception patterns.

  • Dataset: 253 Clinical Texts from in-house dataset
Assertion Label Contextual Assertion AssertionDL
Absent 0.88 0.78
Past 0.77 0.65
  • Dataset: Used in-house jsl_augmented dataset
Assertion Label Contextual Assertion AssertionDL
Absent 0.82 0.90
Family 0.63 0.73
Hypothetical 0.51 0.69
Past 0.73 0.77
Planned 0.57 0.62
Possible 0.49 0.74
SomeoneElse 0.61 0.81

Contextual Assertion, a powerful component within Spark NLP, extends beyond mere negation detection. Its ability to identify and classify a diverse range of contextual cues, including uncertainty, temporality, and sentiment, empowers healthcare professionals to extract deeper meaning from complex medical records.

Model Name Description
contextual_assertion_someone_else Identifies contextual cues within text data to detect someone else assertions
contextual_assertion_absent Identifies contextual cues within text data to detect absent assertions
contextual_assertion_past Identifies contextual cues within text data to detect past assertions

Example:

contextual_assertion = ContextualAssertion() \
    .setInputCols(["sentence", "token", "ner_chunk"]) \
    .setOutputCol("assertion") \
    .setPrefixKeywords(["no", "not"]) \
    .setSuffixKeywords(["unlikely", "negative", "no"]) \
    .setPrefixRegexPatterns(["\\b(no|without|denies|never|none|free of|not include)\\b"]) \
    .setSuffixRegexPatterns(["\\b(free of|negative for|absence of|not|rule out)\\b"]) \
    .setExceptionKeywords(["without"]) \
    .setExceptionRegexPatterns(["\\b(not clearly)\\b"]) \
    .addPrefixKeywords(["negative for", "negative"]) \
    .addSuffixKeywords(["absent", "neither"]) \
    .setCaseSensitive(False) \
    .setPrefixAndSuffixMatch(False) \
    .setAssertion("absent") \
    .setScopeWindow([2, 2])\
    .setIncludeChunkToScope(True)\

example_text = """Patient resting in bed. Patient given azithromycin without any difficulty. Patient has audible wheezing, states chest tightness.
No evidence of hypertension. Patient denies nausea at this time. zofran declined. Patient is also having intermittent sweating associated with pneumonia.
"""

Result:

ner_chunk begin end ner_label Assertion
any difficulty 59 72 PROBLEM absent
hypertension 149 160 PROBLEM absent
nausea 178 183 PROBLEM absent
zofran 199 204 TREATMENT absent

Please check the Contextual Assertion Notebook for more information

Introducing VectorDBPostProcessor Annotator to Filter and Sort the Document Splits Returned by VectorDB in a RAG Application

The VectorDBPostProcessor is a powerful tool designed to filter and sort output from the VectorDBModel (our own VectorDB implementations will be released soon). This processor refines VECTOR_SIMILARITY_RANKINGS input annotations and outputs enhanced VECTOR_SIMILARITY_RANKINGS annotations based on specified criteria.

Key Parameters:

  • filterBy (str): Select and prioritize filter options (metadata, diversity_by_threshold). Options can be given as a comma-separated string, determining the filtering order. Default: metadata
  • sortBy (str): Select sorting option (ascending, descending, lost_in_the_middle, diversity). Default: ascending
  • caseSensitive (bool): Determines if string operators’ criteria are case-sensitive. Default: False
  • diversityThreshold (float): Sets the threshold for the diversity_by_threshold filter. Default: 0.01
  • maxTopKAfterFiltering (int): Limits the number of annotations returned after filtering. Default: 20
  • allowZeroContentAfterFiltering (bool): Determines whether zero annotations are allowed after filtering. Default: False

This processor ensures precise and customizable annotation management, making it an essential component for advanced data processing workflows.

Example:

post_processor = VectorDBPostProcessor() \
    .setInputCols("vector_db") \
    .setOutputCol("post") \
    .setSortBy("ascending")
    .setMaxTopKAfterFiltering(5)
    .setFilterBy("metadata") \
    .setMetadataCriteria([
        {"field": "pubdate", "fieldType": "date", "operator": "greater_than", "value": "2017 May 11", "dateFormats": ["yyyy MMM dd", "yyyy MMM d"], "converterFallback": "filter"},
        {"field": "distance", "fieldType": "float", "operator": "less_than", "value": "0.5470"},
        {"field": "title", "fieldType": "string", "operator": "contains", "matchMode": "any", "matchValues": ["diabetes", "immune system"]}
      ])

Please check the VectorDB and PostProcessor for RAG Generative AI Notebook for more information

Introducing ContextSplitAssembler Annotator to Assemble the Document Post-processed Splits as a Context into an LLM Stage in a RAG Application

The ContextSplitAssembler is a versatile tool designed to work seamlessly with vector databases (our own VectorDB implementations will be released soon) and VectorDBPostProcessor. It combines and organizes annotation results with customizable delimiters and optional splitting.

Key Parameters:

  • joinString (str): Specifies the delimiter string inserted between annotations when combining them into a single result. Ensures proper separation and organization. Default: “ “
  • explodeSplits (bool): Determines whether to split the annotations into separate entries. Default: False

This assembler enhances the management and presentation of annotations, making it an essential tool for advanced data processing workflows.

Example:

context_split_assembler = ( ContextSplitAssembler()
  .setInputCols("vector_db")
  .setOutputCol("document")
  .setJoinString("\n")
  .setExplodeSplits(False))

Please check the VectorDB and PostProcessor for RAG Generative AI Notebook for more information

SNOMED Entity Resolver Model for Veterinary Domains

This advanced model facilitates the mapping of veterinary-related entities and concepts to SNOMED codes using sbiobert_base_cased_mli Sentence BERT embeddings. It is trained with an enhanced dataset derived from the sbiobertresolve_snomed_veterinary_wip model. The model ensures precise and reliable resolution of veterinary terms to standardized SNOMED codes, aiding in consistent and comprehensive veterinary data documentation and analysis.

Example:

snomed_resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_snomed_veterinary", "en", "clinical/models") \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("snomed_code")\
    .setDistanceFunction("EUCLIDEAN")
text = "The veterinary team is closely monitoring the patient for signs of lymphoblastic lymphoma, a malignant neoplasm of lymphoid origin. They are also treating the patient's osteoarthritis, a degenerative joint disease. Additionally, the team is vigilantly observing the facility for potential outbreaks of mink distemper."

Result:

ner_chunk entity snomed_code description
lymphoblastic lymphoma PROBLEM 312281000009102 lymphoblastic lymphoma
a malignant neoplasm of lymphoid origin PROBLEM 443495005 neoplasm of lymphoid system structure
the patient’s osteoarthritis PROBLEM 201826000 erosive osteoarthrosis
a degenerative joint disease PROBLEM 201819000 degenerative joint disease involving multiple joints
mink distemper PROBLEM 348361000009108 mink distemper

Please check the model card for more information

Voice of the Patients Named Entity Recognition (NER) Model

The Voice of the Patients NER Model is designed to extract healthcare-related terms from patient-generated documents. This model processes the natural language used by patients to identify and categorize medical terms, facilitating better understanding and documentation of patient-reported information.

Example:

ner_model = MedicalNerModel.pretrained("ner_vop_v2", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

text = "Hello,I'm 20 year old girl. I'm diagnosed with hyperthyroid 1 month ago. I was feeling weak, light headed,poor digestion, panic attacks, depression, left chest pain, increased heart rate, rapidly weight loss, from 4 months. Because of this, I stayed in the hospital and just discharged from hospital. I had many other blood tests, brain mri, ultrasound scan, endoscopy because of some dumb doctors bcs they were not able to diagnose actual problem."

Result:

chunk begin end ner_label
20 year old 15 25 Age
girl 27 30 Gender
hyperthyroid 52 63 Disease
1 month ago 65 75 DateTime
weak 92 95 Symptom
light 98 102 Symptom
panic attacks 127 139 PsychologicalCondition
depression 142 151 PsychologicalCondition
left 154 157 Laterality
chest 159 163 BodyPart
pain 165 168 Symptom
increased 171 179 TestResult
heart rate 181 190 VitalTest
rapidly 193 199 Modifier
weight loss 201 211 Symptom
4 months 220 227 Duration
hospital 263 270 ClinicalDept
discharged 281 290 AdmissionDischarge
hospital 297 304 ClinicalDept
blood tests 324 334 Test
brain 337 341 BodyPart
mri 343 345 Test
ultrasound scan 348 362 Test
endoscopy 365 373 Procedure

Please check the model card for more information

New Rule-Based Entity Matcher Models to Customise De-IDentification Pipelines

We introduce a suite of text and regex matchers, specifically designed to enhance the deidentification and clinical document understanding process with rule-based methods.

Model Name Description
cancer_diagnosis_matcher This model extracts cancer diagnoses in clinical notes using a rule-based TextMatcherInternal annotator.
country_matcher This model extracts countries in clinical notes using a rule-based TextMatcherInternal annotator.
email_matcher This model extracts emails in clinical notes using a rule-based RegexMatcherInternal annotator.
phone_matcher This model extracts phone entities in clinical notes using a rule-based RegexMatcherInternal annotator.
state_matcher This model extracts states in clinical notes using a rule-based RegexMatcherInternal annotator.
zip_matcher This model extracts zip codes in clinical notes using a rule-based RegexMatcherInternal annotator.
city_matcher This model extracts city names in clinical notes using a rule-based TextMatcherInternal annotator.

Example:

text_matcher = TextMatcherInternalModel.pretrained("cancer_diagnosis_matcher", "en", "clinical/models") \
    .setInputCols(["sentence", "token"])\
    .setOutputCol("cancer_dx")\
    .setMergeOverlapping(True)

example_text = """A 65-year-old woman had a history of debulking surgery, bilateral oophorectomy with omentectomy, total anterior hysterectomy with radical pelvic lymph nodes dissection due to ovarian carcinoma (mucinous-type carcinoma, stage Ic) 1 year ago. The patient's medical compliance was poor and failed to complete her chemotherapy (cyclophosphamide 750 mg/m2, carboplatin 300 mg/m2). Recently, she noted a palpable right breast mass, 15 cm in size which nearly occupied the whole right breast in 2 months. Core needle biopsy revealed metaplastic carcinoma. Neoadjuvant chemotherapy with the regimens of Taxotere (75 mg/m2), Epirubicin (75 mg/m2), and Cyclophosphamide (500 mg/m2) was given for 6 cycles with poor response, followed by a modified radical mastectomy (MRM) with dissection of axillary lymph nodes and skin grafting. Postoperatively, radiotherapy was done with 5000 cGy in 25 fractions. The histopathologic examination revealed a metaplastic carcinoma with squamous differentiation associated with adenomyoepithelioma. Immunohistochemistry study showed that the tumor cells are positive for epithelial markers-cytokeratin (AE1/AE3) stain, and myoepithelial markers, including cytokeratin 5/6 (CK 5/6), p63, and S100 stains.  Expressions of hormone receptors, including ER, PR, and Her-2/Neu, were all negative."""

Result:

chunk begin end label
ovarian carcinoma 176 192 Cancer_dx
mucinous-type carcinoma 195 217 Cancer_dx
metaplastic carcinoma 528 548 Cancer_dx
metaplastic carcinoma 937 957 Cancer_dx
adenomyoepithelioma 1005 1023 Cancer_dx

A suite of models designed for the identification and analysis of alcohol and smoking related entities in text data. These models include Named Entity Recognition (NER), assertion status, relation extraction, and classification, providing a comprehensive toolkit for analyzing substance use information.

  • NER Model
Model Name Predicted Entities Description
ner_alcohol_smoking Drinking_Status, Alcohol_Type, Smoking_Status, Smoking_Type, Substance_Duration, Substance_Frequency, Substance_Quantity, Cardiovascular_Issues, Respiratory_Issues, GUT_Issues, Neurologic_Issues, Psychiatric_Issues, Other_Health_Issues, Drinking_Environment, Cessation_Treatment, Withdrawal_Treatment Detects alcohol and smoking related entities within text data

This ner_alcohol_smoking model is designed to detect and label alcohol and smoking-related entities within text data. Alcohol refers to beverages containing ethanol, a psychoactive substance that is widely consumed for its pleasurable effects. Smoking typically involves inhaling smoke from burning tobacco, a highly addictive substance. The model has been trained using advanced deep learning techniques on a diverse range of text sources and can accurately recognize and classify a wide range of entities related to alcohol and smoking. The model’s accuracy and precision have been carefully validated against expert-labeled data to ensure reliable and consistent results.

Example:


ner_model = MedicalNerModel.pretrained("ner_alcohol_smoking", "en", "clinical/models")\
    .setInputCols(["sentence", "token","embeddings"])\
    .setOutputCol("ner")

sample_texts = ["""The outpatient clinic addressed a complaint from the patient regarding severe anxiety and withdrawal symptoms. 
He disclosed a history of alcohol addiction, with weekly episodes of intense binge drinking over the past decade. 
However, due to recent challenges in his personal and professional life, he decided to quit drinking cold turkey a week ago. 
Since then, he has been experiencing escalating symptoms including tremors, sweating, nausea, and severe anxiety. 
The patient denies recent use drugs or smoking, focusing her struggles solely on alcohol.
He was placed on CIWA protocol w/ lorazepam for management. Scheduled for cognitive-behavioral therapy (CBT)."""]

Result:

chunk begin end ner_label
anxiety 78 84 Psychoneurologic_Issue
alcohol addiction 138 154 Drinking_Status
weekly 162 167 Substance_Frequency
binge 189 193 Substance_Quantity
drinking 195 202 Drinking_Status
over the past decade 204 223 Substance_Duration
drinking 319 326 Drinking_Status
tremors 420 426 Psychoneurologic_Issue
sweating 429 436 Other_Health_Issues
nausea 439 444 GUT_Issues
anxiety 458 464 Psychoneurologic_Issue
smoking 507 513 Smoking_Status
alcohol 549 555 Drinking_Status
CIWA 575 578 Withdrawal_Treatment
lorazepam 592 600 Withdrawal_Treatment
cognitive-behavioral therapy 632 659 Cessation_Treatment
CBT 662 664 Cessation_Treatment
  • Assertion Models
Model Name Assertion Status Description
assertion_alcohol_smoking_wip Absent, Hypothetical_Possible, Past_History, Present_Planned This model detects the assertion status of entities related to alcohol-smoking.
assertion_alcohol_smoking_general_symptoms_wip Overdose_Symptom, Withdrawal_Symptom This model detects the assertion status of general symptoms entity related to alcohol-smoking.

Example:

assertion = AssertionDLModel.pretrained("assertion_alcohol_smoking_wip", "en", "clinical/models")\
    .setInputCols(["sentence", "ner_chunk", "embeddings"]) \
    .setOutputCol("assertion")

sample_texts = ["""Per the patient, the last drink was on ___, prior to admission. The patient admits to having experienced tremors, palpitations, and diaphoresis during the past alcohol withdrawals, but he denies ever having experienced seizures. Mr. ___ did not report experiencing any symptoms of withdrawal throughout his hospital stay, and an examination revealed no evidence of withdrawal.""",
               """SUBSTANCE ABUSE: The patient admitted to occasional binge drinking, but admitted to normally consuming one pint of liquor a day in the week before her admission. Before she attempted suicide, she was heavily intoxicated and had a high blood alcohol level (BAL). Attending the AA meetings and expressing a desire to keep going to AA to support sobriety were two ways the patient showed motivation to stop drinking. The patient was put on the CIWA protocol upon admission, but no PRN Valium was needed for alcohol withdrawal."""]

Result:

chunk begin end ner_label assertion confidence
drink 26 30 Drinking_Status Past_History 0.8507
tremors 105 111 Psychoneurologic_Issue Past_History 0.9315
palpitations 114 125 Cardiovascular_Issues Past_History 0.9251
diaphoresis 132 142 Other_Health_Issues Past_History 0.9181
alcohol 160 166 Drinking_Status Past_History 0.9109
seizures 219 226 Psychoneurologic_Issue Absent 0.5359
binge 52 56 Substance_Quantity Present_Planned 0.5528
drinking 58 65 Drinking_Status Present_Planned 0.5704
one pint 103 110 Substance_Quantity Present_Planned 0.6838
liquor 115 120 Alcohol_Type Present_Planned 0.6879
a day 122 126 Substance_Frequency Present_Planned 0.8029
suicide 183 189 Psychoneurologic_Issue Past_History 0.731
intoxicated 208 218 Psychoneurologic_Issue Past_History 0.7832
alcohol 241 247 Drinking_Status Past_History 0.507
AA 276 277 Cessation_Treatment Present_Planned 0.4559
AA 329 330 Cessation_Treatment Present_Planned 0.5112
drinking 404 411 Drinking_Status Present_Planned 0.5385
CIWA 441 444 Withdrawal_Treatment Present_Planned 0.5693
Valium 482 487 Withdrawal_Treatment Absent 0.553
alcohol 504 510 Drinking_Status Present_Planned 0.5135
  • Relation Extraction Model
Model Name Predicted Entities Description
re_alcohol_smoking_clinical_wip is_caused_by, is_used_for It recognizes relations between treatment cessation and withdrawal with drinking and smoking status, as well as relations between various health issues (Neurologic, Psychiatric, Cardiovascular, Respiratory, GUT, and Other Health Issues) and drinking and smoking status.

Example:

clinical_re_Model = RelationExtractionModel()\
    .pretrained("re_alcohol_smoking_clinical_wip", "en", "clinical/models")\
    .setInputCols(["embeddings", "pos_tags", "ner_chunk", "dependencies"])\
    .setOutputCol("relations")\
    .setMaxSyntacticDistance(4)\
    .setRelationPairs(["Cessation_Treatment-Drinking_Status",
    "Cessation_Treatment-Smoking_Status",
    "Respiratory_Issues-Drinking_Status",
    "Respiratory_Issues-Smoking_Status"])

text = ["""Pulmonary Function Tests: Demonstrates airflow limitation consistent with chronic obstructive pulmonary disease
 (COPD). Diagnosis: Acute exacerbation of COPD secondary to smoking.
 Diagnosis: Alcoholic fatty liver disease and smoking-related respiratory symptoms.Management: The patient received alcohol cessation counseling and support services to address her alcohol use disorder. She was also provided with smoking cessation pharmacotherapy and behavioral interventions to help her quit smoking."""]

Result:

  sentence entity1_begin entity1_end chunk1 entity1 entity2_begin entity2_end chunk2 entity2 relation confidence
0 2 154 157 COPD Respiratory_Issues 172 178 smoking Smoking_Status is_caused_by 0.999902
2 4 297 303 alcohol Drinking_Status 305 324 cessation counseling Cessation_Treatment is_used_for 0.999512
3 4 297 303 alcohol Drinking_Status 330 345 support services Cessation_Treatment is_used_for 0.933377
4 5 411 417 smoking Smoking_Status 419 443 cessation pharmacotherapy Cessation_Treatment is_used_for 0.996433
5 5 411 417 smoking Smoking_Status 449 472 behavioral interventions Cessation_Treatment is_used_for 0.9565
  • Classification Models
Model Name Assertion Status Description
genericclassifier_alcohol_mpnet_wip Current_Drinker, Others The primary goal of the model is to categorize texts into two main label categories: ‘Current_Drinker’ and ‘Others.’ (past or non-smoker)
genericclassifier_smoking_mpnet_wip Current_Smoker, Others The primary goal of the model is to categorize texts into two main label categories: ‘Current_Smoker’ and ‘Others.’ (past or non-smoker)

Example:

generic_classifier = GenericClassifierModel.pretrained('genericclassifier_alcohol_mpnet_wip', 'en', 'clinical/models')\
    .setInputCols("features")\
    .setOutputCol("prediction")

text_list = [
             "The patient, with a history of COPD and alcohol dependence, was initially admitted due to a COPD exacerbation and community-acquired pneumonia. The situation was further complicated by alcohol withdrawal. He was later transferred to another facility for treatment of left hand cellulitis, which raised concerns for necrotizing fasciitis.",
             "Until recently, the patient had maintained stability on his antidepressant regimen. However, he experienced a notable worsening of depressive symptoms last week, leading him to engage in heavy binge drinking as an ineffective way to suppress his emotional distress and feelings of despair.",
             "Ms. Jane Doe, a 60-year-old retired teacher, presented to the emergency department complaining of severe abdominal pain and vomiting. She has a history of gallstones but has been asymptomatic for years. Currently, she does not smoke or drink alcohol, focusing on a healthy lifestyle.",
             "Mr. John Smith, a 45-year-old accountant, came to the clinic reporting intense chest pain and shortness of breath. He has a history of hypertension but has managed it well with medication. He currently does not smoke or drink alcohol, maintaining a healthy lifestyle."]

Result:

text result
The patient, with a history of COPD and alcohol dependence, was initially admitted due to a COPD … Current_Drinker
Until recently, the patient had maintained stability on his antidepressant regimen. However, he e… Current_Drinker
Ms. Jane Doe, a 60-year-old retired teacher, presented to the emergency department complaining of… Others
Mr. John Smith, a 45-year-old accountant, came to the clinic reporting intense chest pain and sho… Others

A set of sophisticated models aimed at extracting and analyzing menopause-related entities in text data. These models include a Named Entity Recognition (NER) model and assertion models, which identify and determine the status of various menopause-related terms, aiding in comprehensive menopause data analysis.

  • NER Model
Model Name Predicted Entities Description
ner_menopause_core Perimenopause, Menopause, Gynecological_Symptom, Gynecological_Disease, Other_Symptom, Irregular_Menstruation, G_P, Hypertension, Osteoporosis, Oncological, Fracture, Hormone_Replacement_Therapy, Osteporosis_Therapy, Antidepressants, Procedure, Hormone_Testing, Vaginal_Swab, Age, Test_Result Detects menopause related entities within text data

This ner_menopause_core model is designed to detect and label core entities related to menopause and associated conditions within text data. Menopause-related terms and conditions are crucial factors that influence individuals’ health outcomes, especially among women undergoing the menopausal transition. The model has been trained using advanced machine-learning techniques on a diverse range of text sources. It can accurately recognize and classify a wide range of menopause-related entities. The model’s accuracy and precision have been carefully validated against expert-labeled data to ensure reliable and consistent results.

Example:


ner_model = MedicalNerModel.pretrained("ner_menopause_core", "en", "clinical/models"))\
    .setInputCols(["sentence", "token","embeddings"])\
    .setOutputCol("ner")

sample_texts = ["""The patient is a 52-year-old female, G3P2, who presents with complaints of irregular menstruation and symptoms suggestive of perimenopause. She reports experiencing hot flashes, night sweats, and vaginal dryness. Her medical history includes polycystic ovary syndrome (PCOS), fatigue, mood swings, hypertension diagnosed 5 years ago and currently managed with medication, and osteoporosis diagnosed 2 years ago with ongoing treatment. 
Current medications include estradiol for hormone replacement therapy, alendronate for osteoporosis therapy, and fluoxetine for depressive symptoms related to menopause. Recent tests and procedures include a bone density scan to monitor osteoporosis, blood tests for estradiol and follicle-stimulating hormone (FSH) levels, and a vaginal swab collected for routine infection screening. Test results showed elevated FSH levels indicating menopause."""]

Result:

chunk begin end ner_label
irregular menstruation 76 97 Irregular_Menstruation
perimenopause 126 138 Perimenopause
hot flashes 166 176 Other_Symptom
night sweats 179 190 Other_Symptom
vaginal dryness 197 211 Gynecological_Symptom
polycystic ovary syndrome 243 267 Gynecological_Disease
PCOS 270 273 Gynecological_Disease
fatigue 277 283 Other_Symptom
hypertension 299 310 Hypertension
osteoporosis 377 388 Osteoporosis
estradiol 466 474 Hormone_Replacement_Therapy
hormone replacement therapy 480 506 Hormone_Replacement_Therapy
alendronate 509 519 Osteporosis_Therapy
osteoporosis 525 536 Osteoporosis
fluoxetine 551 560 Antidepressants
menopause 597 605 Menopause
osteoporosis 675 686 Osteoporosis
estradiol 705 713 Hormone_Testing
follicle-stimulating hormone 719 746 Hormone_Testing
FSH 749 751 Hormone_Testing
vaginal swab 768 779 Vaginal_Swab
elevated 844 851 Test_Result
FSH 853 855 Hormone_Testing
menopause 875 883 Menopause
  • Assertion Models
Model Name Assertion Status Description
assertion_menopause_wip Present, Absent, Possible, Past, Hypothetical, Planned, Family, Menarche_Age, Menopause_Age This model detects the assertion status of menopause-related entities.

Example:

assertion = AssertionDLModel.pretrained("assertion_menopause_wip", "en", "clinical/models")\
    .setInputCols(["sentence", "ner_chunk", "embeddings"]) \
    .setOutputCol("assertion")

sample_texts = ["""A 50-year-old woman, G2P1, presents with symptoms of perimenopause including night sweats, irregular menstruation, and fatigue.She has previously been diagnosed with hypertension. She is taking hormone replacement therapy with estradiol and norethindrone acetate. Recent tests included a bone density scan, which confirmed osteoporosis and showed elevated FSH levels. She also underwent a vaginal swab test for routine screening. Her mother has a history of breast cancer. Her menarche age was 11."""]

Result:

chunk begin end ner_label assertion confidence
G2P1 21 24 G_P Present 0.9999
perimenopause 53 65 Perimenopause Present 0.9999
night sweats 77 88 Other_Symptom Present 0.9997
irregular menstruation 91 112 Irregular_Menstruation Present 0.9997
fatigue 119 125 Other_Symptom Present 0.9954
hypertension 166 177 Hypertension Past 0.9916
hormone replacement therapy 194 220 Hormone_Replacement_Therapy Present 0.9988
estradiol 227 235 Hormone_Replacement_Therapy Present 0.9696
norethindrone acetate 241 261 Hormone_Replacement_Therapy Present 0.9984
osteoporosis 323 334 Osteoporosis Present 1.0
elevated 347 354 Test_Result Present 1.0
FSH 356 358 Hormone_Testing Present 0.9999
vaginal swab 389 400 Vaginal_Swab Present 1.0
breast cancer 458 470 Oncological Family 0.9843
11 494 495 Age Menarche_Age 0.9891

Clinical Document Analysis with One-Liner Pretrained Pipelines for Specific Clinical Tasks and Concepts

We introduce a suite of advanced, hybrid pretrained pipelines, specifically designed to streamline the clinical document analysis process. These pipelines are built upon multiple state-of-the-art (SOTA) pretrained models, delivering a comprehensive solution for quickly extracting vital information.

Model Name Description
ner_deid_context_nameAugmented_pipeline In this pipeline, there are ner_deid_generic_augmented, ner_deid_subentity_augmented, ner_deid_name_multilingual_clinical NER models and several ContextualParser, RegexMatcher, and TextMatcher models were used
ner_profiling_vop This pipeline can be used to simultaneously evaluate various pre-trained named entity recognition (NER) models, enabling comprehensive analysis of text data pertaining to patient perspectives and experiences, also known as the “Voice of Patients”.
ner_profiling_sdoh This pipeline can be used to simultaneously evaluate various pre-trained named entity recognition (NER) models, enabling comprehensive analysis of text data pertaining to the social determinants of health (SDOH). When you run this pipeline over your text, you will end up with the predictions coming out of each pretrained clinical NER model trained with the embeddings_clinical, which are specifically designed for clinical and biomedical text.

Example:

from sparknlp.pretrained import PretrainedPipeline

pipeline_sdoh = PretrainedPipeline("ner_profiling_sdoh", "en", "clinical/models")

text = """
The patient reported experiencing symptoms of anxiety and depression, which have been affecting his quality of life.
He reported a history of childhood trauma related to violence and abuse in his household, which has contributed to his smoking, alcohol use and current mental health struggles.
He denied any recent substance use or sexual activity and reported being monogamous in his relationship with his wife.
The patient is an immigrant and speaks English as a second language.
He reported difficulty accessing healthcare due to lack of medical insurance.
He has a herniated disc, hypertension, coronary artery disease (CAD) and diabetes mellitus.
The patient has a manic disorder, is presently psychotic and shows impulsive behavior. He has been disabled since 2001.
"""

Results:


******************** ner_sdoh_substance_usage_wip Model Results ******************** 

('smoking', 'Smoking') ('alcohol use', 'Alcohol') ('substance use', 'Substance_Use')

******************** ner_sdoh_health_behaviours_problems_wip Model Results ******************** 

('anxiety', 'Mental_Health') ('depression', 'Mental_Health') ('quality of life', 'Quality_Of_Life') ('mental health struggles', 'Mental_Health') ('sexual activity', 'Sexual_Activity') ('monogamous', 'Sexual_Activity') ('herniated disc', 'Other_Disease') ('hypertension', 'Hypertension') ('coronary artery disease', 'Other_Disease') ('CAD', 'Other_Disease') ('diabetes mellitus', 'Other_Disease') ('manic disorder', 'Mental_Health') ('psychotic', 'Mental_Health') ('impulsive behavior', 'Mental_Health') ('disabled', 'Disability')

******************** ner_jsl_greedy Model Results ******************** 

('anxiety', 'Psychological_Condition') ('depression', 'Psychological_Condition') ('smoking', 'Smoking') ('alcohol', 'Alcohol') ('substance', 'Substance') ('sexual activity', 'Symptom') ('difficulty accessing healthcare', 'Symptom') ('herniated disc', 'Disease_Syndrome_Disorder') ('hypertension', 'Hypertension') ('coronary artery disease', 'Heart_Disease') ('CAD', 'Heart_Disease') ('diabetes mellitus', 'Diabetes') ('manic disorder', 'Psychological_Condition') ('psychotic', 'Psychological_Condition') ('impulsive behavior', 'Symptom')

******************** ner_jsl_enriched Model Results ******************** 

('anxiety', 'Psychological_Condition') ('depression', 'Psychological_Condition') ('smoking', 'Smoking') ('alcohol', 'Alcohol') ('substance', 'Substance') ('difficulty accessing healthcare', 'Symptom') ('lack of medical insurance', 'Symptom') ('herniated disc', 'Disease_Syndrome_Disorder') ('hypertension', 'Hypertension') ('coronary artery disease', 'Heart_Disease') ('CAD', 'Heart_Disease') ('diabetes mellitus', 'Diabetes') ('manic disorder', 'Psychological_Condition') ('psychotic', 'Symptom') ('impulsive behavior', 'Symptom') ('disabled', 'Symptom')

Please check the Task Based Clinical Pretrained Pipelines Notebook for more information

Formal Release of Oncological Assertion Status Detection and Relation Extraction Models

We are releasing the formal version of the “work-in-progress (WIP)” assertion status detection and relation extraction models in the Oncology domain.

Here is the reference table:

WIP Version Formal Version Task
assertion_oncology_demographic_binary_wip assertion_oncology_demographic_binary Assertion Status Detection
assertion_oncology_family_history_wip assertion_oncology_family_history Assertion Status Detection
assertion_oncology_problem_wip assertion_oncology_problem Assertion Status Detection
assertion_oncology_response_to_treatment_wip assertion_oncology_response_to_treatment Assertion Status Detection
assertion_oncology_smoking_status_wip assertion_oncology_smoking_status Assertion Status Detection
assertion_oncology_test_binary_wip assertion_oncology_test_binary Assertion Status Detection
assertion_oncology_treatment_binary_wip assertion_oncology_treatment_binary Assertion Status Detection
assertion_oncology_wip assertion_oncology Assertion Status Detection
re_oncology_biomarker_result_wip re_oncology_biomarker_result Relation Extraction
re_oncology_granular_wip re_oncology_granular Relation Extraction
re_oncology_location_wip re_oncology_location Relation Extraction
re_oncology_size_wip re_oncology_size Relation Extraction
re_oncology_temporal_wip re_oncology_temporal Relation Extraction
re_oncology_test_result_wip re_oncology_test_result Relation Extraction
re_oncology_wip re_oncology Relation Extraction (DL)
redl_oncology_biobert_wip redl_oncology_biobert Relation Extraction (DL)
redl_oncology_biomarker_result_biobert_wip redl_oncology_biomarker_result_biobert Relation Extraction (DL)
redl_oncology_granular_biobert_wip redl_oncology_granular_biobert Relation Extraction (DL)
redl_oncology_location_biobert_wip redl_oncology_location_biobert Relation Extraction (DL)
redl_oncology_size_biobert_wip redl_oncology_size_biobert Relation Extraction (DL)
redl_oncology_temporal_biobert_wip redl_oncology_temporal_biobert Relation Extraction (DL)
redl_oncology_test_result_biobert_wip redl_oncology_test_result_biobert Relation Extraction (DL)

11 New Fine-Tuned Sentence Embedding Models finetuned with medical assertion datasets

Discover our new fine-tuned transformer-based sentence embedding models, meticulously trained on a curated list of clinical and biomedical datasets. These models are specifically optimized for Few-Shot Assertion tasks but are versatile enough to be utilized for other applications, such as Classification and Retrieval-Augmented Generation (RAG). Our collection offers precise and reliable embeddings tailored for various medical domains, significantly enhancing the extraction, analysis, and processing of assertion-related data in healthcare texts.

Model Name Description
mpnet_embeddings_medical_assertion Fine-tuned on the in-house dataset using the MPNet architecture.
mpnet_embeddings_medical_assertion_jsl Fine-tuned on the in-house dataset using the MPNet architecture.
mpnet_embeddings_medical_assertion_oncology Fine-tuned on the oncology dataset using the MPNet architecture.
mpnet_embeddings_medical_assertion_sdoh Fine-tuned on the social determinants of health dataset using the MPNet architecture.
e5_base_v2_embeddings_medical_assertion_base Fine-tuned on the in-house dataset using the E5 architecture.
e5_base_v2_embeddings_medical_assertion_jsl Fine-tuned on the in-house dataset using the E5 architecture.
e5_base_v2_embeddings_medical_assertion Fine-tuned on the the in-house dataset using the E5 architecture.
e5_base_v2_embeddings_medical_assertion_sdoh Fine-tuned on the social determinants of health dataset using the E5 architecture.
e5_base_v2_embeddings_medical_assertion_smoking Fine-tuned on the smoking dataset using the E5 architecture.
e5_base_v2_embeddings_medical_assertion_oncology Fine-tuned on the oncology dataset using the E5 architecture.
e5_base_v2_embeddings_medical_assertion_radiology Fine-tuned on the radiology dataset using the E5 architecture.

Example:

mpnet_embedding = MPNetEmbeddings.pretrained("mpnet_embeddings_medical_assertion_sdoh", "en", "clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("mpnet_embeddings")

text = [
    ["I feel a bit drowsy after taking an insulin."],
    ["Peter Parker is a nice lad and lives in New York"]
]

Result:

embeddings
[{sentence_embeddings, 0, 43, I feel a bit drowsy after taking an insulin., {sentence -> 0}, [-0.09830807, 0.0137982415, -0.051585164, -0.0023749713, -0.017916167, 0.017543513, 0.025593378, 0.05106…
[{sentence_embeddings, 0, 47, Peter Parker is a nice lad and lives in New York, {sentence -> 0}, [-0.10453681, 0.010062916, -0.024983741, 0.009945293, -0.01242009, 0.018787898, 0.039723188, 0.04624…

Significantly Faster Vector-DB Based Entity Resolution Models Than Existing Sentence Entity Resolver Models

We have developed vector database-based entity resolution models that are 10x faster on GPU and 2x as fast on CPU compared to the existing Sentence Entity Resolver models.

NOTE: These models are not available on the Models Hub page yet and cannot be used like the other Spark NLP for Healthcare models. They will be integrated into the marketplace and made available there soon.

RxNorm Code Mapping Benchmarks and Cost Comparisons: John Snow Labs, GPT-4, and Amazon Comprehend Medical

We have prepared an accuracy benchmark and the cost analysis between Healthcare NLP, GPT-4, and Amazon Comprehend Medical for mapping medications to their RxNorm terms. Here are the notes:

  • For the ground truth dataset, we used 79 in-house clinical notes annotated by the medical experts of John Snow Labs.
  • John Snow Labs: We used sbiobertresolve_rxnorm_augmented and biolordresolve_rxnorm_augmented models for this benchmark. These models can return up to 25 closest results sorted by their distances.
  • GPT-4: Both GPT-4 (Turbo) and GPT-4o models are used. According to the official announcement, the performance of GPT-4 and GPT-4o is almost identical, and we used both versions for the accuracy calculation. Additionally, the GPT-4 returns only one result, which means you will see the same results in both evaluation approaches.
  • Amazon Comprehend Medical: The RxNorm tool of this service is used, and it returns up to 5 closest matches sorted by their distances.
  • We adopted two approaches for evaluating these tools, given that the model outputs may not precisely match the annotations:
    • Top-3: Compare the annotations to see if they appear in the first three results.
    • Top-5: Compare the annotations to see if they appear in the first five results.

Here are the accuracy results:

Top-3 Results:

top_3

Top-5 Results:

top_5

Conclusion:

Based on the evaluation results:

  • The sbiobertresolve_rxnorm_augmented model of John Snow Labs consistently provides the most accurate results in each top_k comparison.
  • The biolordresolve_rxnorm_augmented model of John Snow Labs outperforms Amazon Comprehend Medical and GPT-4 in mapping terms to their RxNorm codes.
  • The GPT-4 could only return one result, reflected similarly in both charts, and has proven to be the least accurate.

If you want to process 1M documents and extract RxNorm codes for medication entities (excluding the NER stage), the total cost:

  • With John Snow Labs is about $4,500, including the infrastructure costs.
  • $24,250 with Amazon Comprehend Medical
  • $44,000 with the GPT-4 and $22,000 with the GPT-4o model.

Therefore, John Snow Labs is almost 5 times cheaper than its closest alternative, not to mention the accuracy differences (Top 3: John Snow Labs 82.7% vs Amazon 55.8% vs GPT-4 8.9%).

Accuracy & Cost Table
Top-3 Accuracy Top-5 Accuracy Cost
John Snow Labs 82.7% 84.6% $4,500
Amazon Comprehend Medical 55.8% 56.2% $24,250
GPT-4 (Turbo) 8.9% 8.9% $44,000
GPT-4o 8.9% 8.9% $22,000

If you want to see more details, please check Benchmarks Page and State-of-the-art RxNorm Code Mapping with NLP: Comparative Analysis between the tools by John Snow Labs, Amazon, and GPT-4 blog post.

New Blogposts on Using NLP in Opioid Research and Healthcare: Harnessing NLP, Knowledge Graphs, and Regex Techniques for Critical Insights

Explore the latest developments in healthcare NLP and Knowledge Graphs through our new blog posts, where we take a deep dive into the innovative technologies and methodologies transforming the medical field. These posts offer insights into how the latest tools are being used to analyze large amounts of unstructured data, identify critical medical assets, and extract meaningful patterns and correlations. Learn how these advances are not only improving our understanding of complex health issues but also contributing to more effective prevention, diagnosis, and treatment strategies.

New Notebooks for Medication and Resolutions Concept

To better understand the Medication and Resolutions Concept, the following notebooks have been developed:

  • New Clinical Medication Use Case notebook: This notebook is designed to extract and analyze medication information from a clinical dataset. Its purpose is to identify commonly used medications, gather details on dosage, frequency, strength, and route, determine current and past usage, understand pharmacological actions, identify treatment purposes, retrieve relevant codes (RxNorm, NDC, UMLS, SNOMED), and find associated adverse events.
  • New Resolving Medical Terms to Terminology Codes Directly notebook: In this notebook, you will find how to optimize the process to get SentenceEntityResolverModel model outputs.
  • New Analyse Veterinary Documents with Healthcare NLP notebook: In this notebook, we use Spark NLP for Healthcare to process veterinary documents. We focus on Named Entity Recognition (NER) to identify entities, Assertion Status to confirm their condition, Relation Extraction to understand their relationships, and Entity Resolution to standardize terms. This helps us efficiently extract and analyze critical information from unstructured veterinary texts.

Updated Udemy MOOC (Our Online Courses) Notebooks

Recently updated Udemy MOOC (Massive Online Course) notebooks that focus on using Spark NLP annotators for healthcare applications. These notebooks provide practical examples and exercises for learning how to implement and utilize various Spark NLP tools and techniques specifically designed for processing and analyzing healthcare-related text data. The update might include new features, improvements, or additional content to enhance the learning experience for students and professionals in the healthcare field.

Please check the Spark_NLP_Udemy_MOOC folder for the all Healthcare MOOC Notebooks

Various Core Improvements; Bug Fixes, Enhanced Overall Robustness, and Reliability of Spark NLP for Healthcare

  • Resolved broken links in healthcare demos
  • Added a unique ID field for each entity into the result of the pipeline_ouput_parser module
  • Fixed deidentification AGE obfuscation hanging issue
  • Added DatasetInfo parameter into the MedicalNERModel annotator

Updated Notebooks And Demonstrations For making Spark NLP For Healthcare Easier To Navigate And Understand

We Have Added And Updated A Substantial Number Of New Clinical Models And Pipelines, Further Solidifying Our Offering In The Healthcare Domain.

  • pdf_deid_subentity_context_augmented_pipeline
  • ner_deid_context_nameAugmented_pipeline
  • ner_profiling_vop
  • ner_vop_v2
  • ner_alcohol_smoking
  • sbiobertresolve_snomed_veterinary
  • cancer_diagnosis_matcher
  • country_matcher
  • email_matcher
  • phone_matcher
  • state_matcher
  • zip_matcher
  • contextual_assertion_someone_else
  • contextual_assertion_absent
  • contextual_assertion_past
  • ner_alcohol_smoking
  • assertion_alcohol_smoking_wip
  • assertion_alcohol_smoking_general_symptoms_wip
  • re_alcohol_smoking_clinical_wip
  • genericclassifier_alcohol_mpnet_wip
  • genericclassifier_smoking_mpnet_wip
  • ner_menopause_core
  • assertion_menopause_wip
  • fewhot_assertion_jsl_e5_base_v2_jsl
  • fewhot_assertion_e5_base_v2
  • fewhot_assertion_sdoh_e5_base_v2_sdoh
  • fewhot_assertion_smoking_e5_base_v2_smoking
  • fewhot_assertion_oncology_e5_base_v2_oncology
  • fewhot_assertion_radiology_e5_base_v2_radiology
  • mpnet_embeddings_medical_assertion
  • mpnet_embeddings_medical_assertion_jsl
  • mpnet_embeddings_medical_assertion_oncology
  • mpnet_embeddings_medical_assertion_sdoh
  • e5_base_v2_embeddings_medical_assertion_base
  • e5_base_v2_embeddings_medical_assertion_jsl
  • e5_base_v2_embeddings_medical_assertion
  • e5_base_v2_embeddings_medical_assertion_sdoh
  • e5_base_v2_embeddings_medical_assertion_smoking
  • e5_base_v2_embeddings_medical_assertion_oncology
  • e5_base_v2_embeddings_medical_assertion_radiology
  • ner_profiling_vop
  • ner_profiling_sdoh
  • ner_profiling_oncology
  • explain_clinical_doc_granular
  • explain_clinical_doc_radiology
  • explain_clinical_doc_medication
  • medication_resolver_pipeline
  • medication_resolver_transform_pipeline
  • rxnorm_resolver_pipeline
  • rxnorm_mapper
  • biolordresolve_rxnorm_augmented
  • sbiobertresolve_rxnorm_augmented
  • sbiobertresolve_umls_clinical_drugs
  • sbiobertresolve_umls_disease_syndrome
  • sbiobertresolve_umls_drug_substance
  • sbiobertresolve_umls_findings
  • sbiobertresolve_umls_general_concepts
  • sbiobertresolve_umls_major_concepts
  • clinical_deidentification
  • clinical_deidentification_multi_mode_output
  • classifierml_ade
  • assertion_dl_radiology
  • ner_oncology_wip
  • ner_sdoh_access_to_healthcare_wip
  • ner_sdoh_community_condition_wip
  • ner_sdoh_demographics_wip
  • ner_sdoh_health_behaviours_problems_wip
  • ner_sdoh_income_social_status_wip
  • ner_sdoh_slim_wip
  • ner_sdoh_social_environment_wip
  • ner_sdoh_substance_usage_wip
  • ner_sdoh_wip
  • JSL_MedS_q16_v1
  • JSL_MedS_q8_v1
  • JSL_MedS_q4_v1
  • JSL_MedM_q16_v1
  • JSL_MedM_q8_v1
  • JSL_MedM_q4_v1
  • JSL_MedSNer_ZS_q16_v1
  • JSL_MedSNer_ZS_q8_v1
  • JSL_MedSNer_ZS_q4_v1

For all Spark NLP for Healthcare models, please check: Models Hub Page

Previous versions

Last updated