6.0.0
Highlights
We’re excited to introduce significant advancements in the latest release of Spark NLP for Healthcare. This update features a new FHIR de-identification module, a new medical LLM fine-tuned for SOAP note generation to support advanced clinical documentation, a BERT-based assertion classification architecture along with new contextual assertion models to enhance assertion status detection, and new models and utility tools for Human Phenotype term extraction and mapping using official terminology vocabularies. This release includes 29 new clinical pretrained models and pipelines, expanding coverage across a range of healthcare tasks.
- Introducing a new assertion classification architecture based on BERT for sequence classification to improve contextual assertion status detection
- Scalable and flexible FHIR de-identification module to de-identify FHIR records across various resource types
- Small-size medical LLM for generating clinical SOAP (Subjective, Objective, Assessment, Plan) notes
- Human Phenotype term extraction and mapping using official terminology vocabularies
- Inferring missing PHI entities via previously detected NER outputs within Deidentification
- Improved Name Consistency Handling in Deidentification
- Clinical document analysis with one-liner pretrained pipelines for specific clinical tasks and concepts
- Cloud Provider & LLM Benchmarks for Information Extraction tasks
- Enabling flexible annotation transformations with AnnotationConverter
- Introducing 2 new contextual assertion models to detect assertion status given a larger context and more scenarios
- New German Oncology named entity recognition model and ICD10 entity resolver model
- Enabling exception handling within the pipelines for specified or all eligible stages
- Interactive chat support for Healthcare NLP documentation
- New peer-reviewed papers and blog posts on various topics
- Various core improvements, bug fixes, enhanced overall robustness, and reliability of Spark NLP for Healthcare
- Resolved log file creation on Databricks for training
- Added consistentAcrossNameParts param to LightDeIdentification and StructuredDeIdentification
- Updated notebooks and demonstrations for making Spark NLP for Healthcare easier to navigate and understand
- New Spanish Healthcare Models Notebook
- New PRR ROR EBGM for Signal Processing of Drug Events Notebook
- New BertForAssertionClassification Notebook
- New NER Performance Comparison Of Healthcare NLP VS Cloud Solutions Notebook
- New Fhir DeIdentification Notebook
- New Human Phenotype Extraction And HPO Code Mapping Notebook
- Updated German Clinical Deidentification Notebook
- Updated German Healthcare Models Notebook
- Updated Deidentification Custom Pretrained Pipelines Notebook
- Updated Clinical_DeIdentification Notebook
- Updated Analyse Veterinary Documents with Healthcare NLP Notebook
- Updated Loading Medical and Open Source LLMs Notebook
- The addition and update of numerous new clinical models and pipelines continue to reinforce our offering in the healthcare domain
These enhancements will elevate your experience with Spark NLP for Healthcare, enabling more efficient, accurate, and streamlined analysis of healthcare-related natural language data.
Introducing a New Assertion Classification Architecture Based on BERT for Sequence Classification to Improve Assertion Status Detection Contextually
BertForAssertionClassification extracts the assertion status from text by analyzing both the extracted entities and their surrounding context. This classifier leverages pre-trained BERT models fine-tuned on biomedical text (e.g., BioBERT) and applies a sequence classification/regression head (a linear layer on the pooled output) to support multi-class document classification.
Model Name | Predicted Entities |
---|---|
assertion_bert_classification_radiology | Confirmed, Suspected, Negative |
assertion_bert_classification_jsl | Present, Planned, SomeoneElse, Past, Family, Absent, Hypothetical, Possible |
assertion_bert_classification_clinical | absent, present, conditional, associated_with_someone_else, hypothetical, possible |
Example:
...
ner = medical.NerModel.pretrained("ner_clinical", "en", "clinical/models")\
.setInputCols(["sentence", "token", "embeddings"])\
.setOutputCol("ner")
ner_converter = medical.NerConverterInternal()\
.setInputCols(["sentence", "token", "ner"])\
.setOutputCol("ner_chunk")\
.setWhiteList(["PROBLEM"])
assertion_classifier = medical.BertForAssertionClassification.pretrained("assertion_bert_classification_clinical", "en", "clinical/models")\
.setInputCols(["sentence", "ner_chunk"])\
.setOutputCol("assertion_class")
text = """Patient with severe fever and sore throat.
He shows no stomach pain and he maintained on an epidural and PCA for pain control.
He also became short of breath with climbing a flight of stairs.
After CT, lung tumor located at the right lower lobe. Father with Alzheimer.
"""
data = spark.createDataFrame([[text]]).toDF("text")
result = pipeline.fit(data).transform(data)
# show results
result.selectExpr("explode(assertion_class) as result")\
      .selectExpr("result.metadata['ner_chunk'] as ner_chunk",
                  "result.begin as begin",
                  "result.end as end",
                  "result.metadata['ner_label'] as ner_label",
                  "result.result as assertion").show(truncate=False)
Result:
ner_chunk | begin | end | ner_label | assertion |
---|---|---|---|---|
severe fever | 13 | 24 | PROBLEM | present |
sore throat | 30 | 40 | PROBLEM | present |
stomach pain | 55 | 66 | PROBLEM | absent |
pain control | 113 | 124 | PROBLEM | hypothetical |
short of breath | 142 | 156 | PROBLEM | conditional |
lung tumor | 202 | 211 | PROBLEM | present |
Alzheimer | 258 | 266 | PROBLEM | associated_with_someone_else |
Please check the BertForAssertionClassification Notebook for more information
Scalable and Flexible FHIR De-identification Module to De-identify FHIR Records Across Various Resource Types
We’re excited to introduce a major upgrade to FHIR deidentification in Spark NLP for Healthcare. This new feature brings generic, path-based de-identification, eliminating the need for resource-specific configurations and enabling seamless processing across various FHIR document types. With full support for FHIR R4, basic support for R5, and legacy compatibility with STU3, it offers broad coverage for modern healthcare systems. The de-identification tool works with both JSON and XML formats, and is compatible with both Spark pipelines and direct usage—making it more powerful and flexible than ever for protecting patient data at scale.
To use the FHIR de-identification module, please set fhir_deid=True in the start() function.
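For example, a minimal sketch of the session setup, assuming the usual sparknlp_jsl.start() entry point (SECRET stands for your Healthcare NLP license secret):
import sparknlp_jsl
# fhir_deid=True loads the FHIR de-identification module at session start
spark = sparknlp_jsl.start(SECRET, fhir_deid=True)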
Example:
rules = {
"Patient.birthDate" : "DATE",
"Patient.name.given" : "FIRST_NAME",
"Patient.name.family" : "LAST_NAME",
"Patient.telecom.value" : "EMAIL",
"Patient.gender" : "GENDER"
}
fhir = (
FhirDeIdentification()
.setInputCol("text")
.setOutputCol("deid")
.setMode("obfuscate")
.setMappingRules(rules)
.setFhirVersion("R4")
.setParserType("JSON")
.setDays(20)
.setSeed(88)
.setCustomFakers(
{
"GENDER": ["female", "other"]
}
)
.setObfuscateRefSource("both")
)
john_doe = """{
"resourceType": "Patient",
"id": "example",
"name": [
{
"use": "official",
"family": "Doe",
"given": [
"John",
"Michael"
]
}
],
"telecom": [
{
"system": "email",
"value": "john.doe@example.com"
},
{
"system": "url",
"value": "http://johndoe.com"
}
],
"birthDate": "1970-01-01",
"gender": "male"
}"""
Result:
{
'resourceType': 'Patient',
'id': 'example',
'name': [
{
'use': 'official',
'family': 'Cease',
'given': [
'Mylene',
'Anola'
]
}
],
'telecom': [
{
'system': 'email',
'value': 'Bryton@yahoo.com'
},
{
'system': 'url',
'value': 'https://aurora.com'
}
],
'birthDate': '1970-01-21',
'gender': 'other'
}
Small-Size Medical LLM for Generating Clinical SOAP (Subjective, Objective, Assessment, Plan) Notes
This large language model generates structured SOAP (Subjective, Objective, Assessment, Plan) summaries from patient-provider dialogues, writing as a medical expert would. It ensures medically accurate documentation using standardized terminology and professional clinical formatting suitable for real-world healthcare communication. The model strictly follows SOAP conventions without using special formatting or markdown.
Example:
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

medical_llm = medical.AutoGGUFModel.pretrained("jsl_meds_text2soap_v1", "en", "clinical/models")\
    .setInputCols("document")\
    .setOutputCol("completions")\
    .setBatchSize(1)\
    .setNPredict(100)\
    .setUseChatTemplate(True)\
    .setTemperature(0)

pipeline = nlp.Pipeline(
    stages = [
        document_assembler,
        medical_llm
])

prompt = """
...
A 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years prior to presentation and subsequent type two diabetes mellitus ( T2DM ), one prior episode of HTG-induced pancreatitis three years prior to presentation , and associated with an acute hepatitis , presented with a one-week history of polyuria , poor appetite , and vomiting .
She was on metformin , glipizide , and dapagliflozin for T2DM and atorvastatin and gemfibrozil for HTG . She had been on dapagliflozin for six months at the time of presentation .
Physical examination on presentation was significant for dry oral mucosa ; significantly , her abdominal examination was benign with no tenderness , guarding , or rigidity . Pertinent laboratory findings on admission were : serum glucose 111 mg/dl , creatinine 0.4 mg/dL , triglycerides 508 mg/dL , total cholesterol 122 mg/dL , and venous pH 7.27 .
"""

data = spark.createDataFrame([[prompt]]).toDF("text")
result = pipeline.fit(data).transform(data)
Result:
S: The patient is a 28-year-old female with a history of gestational diabetes mellitus diagnosed eight years ago and type two diabetes mellitus (T2DM), with a prior episode of HTG-induced pancreatitis three years ago. She presented with a one-week history of polyuria, poor appetite, and vomiting. She was on metformin, glipizide, dapagliflozin, atorvastatin, and gemfibrozil for her conditions.
O: Physical examination showed dry oral mucosa but was otherwise benign with no tenderness, guarding, or rigidity. Laboratory findings included serum glucose 111 mg/dL, creatinine 0.4 mg/dL, triglycerides 508 mg/dL, total cholesterol 122 mg/dL, and venous pH 7.27. She was on dapagliflozin for six months prior to presentation.
A: The primary diagnosis is diabetic ketoacidosis (DKA) based on the patient's symptoms of polyuria, poor appetite, vomiting, and elevated triglycerides and total cholesterol. Differential diagnoses could include other causes of polyuria such as diabetes insipidus or a complication from her underlying diabetes. The patient's history of HTG and recent use of dapagliflozin are also relevant.
P: The management plan includes close monitoring of her glucose levels and electrolyte balance. She will be started on intravenous insulin and hydration to treat DKA. Dietary adjustments will be made to manage her diabetes and HTG. Regular follow-up appointments will be scheduled to monitor her glucose levels and adjust her medications as necessary. Education on the importance of adherence to the treatment plan and monitoring for signs of complications such as worsening of symptoms or new symptoms will be provided.
Please check the Loading Medical and Open Source LLMs Notebook for more information
Human Phenotype Term Extraction and Mapping Using Official Terminology Vocabularies
Model Name | Description |
---|---|
hpo_mapper | This model is designed to map extracted phenotype entities from clinical or biomedical text to their corresponding Human Phenotype Ontology (HPO) codes. It ensures that observed symptoms, signs, and clinical abnormalities are standardized using HPO terminology. |
hpo_matcher | This model is a text matcher designed to automatically extract mentions of phenotypic abnormalities associated with human diseases from clinical or biomedical text. |
stopwords_removal_hpo | This model is designed to remove stop words from clinical phenotype descriptions, particularly in the context of Human Phenotype Ontology (HPO). |
hpo_mapper_pipeline | This pipeline is designed to map extracted phenotype entities from clinical or biomedical text to their corresponding Human Phenotype Ontology (HPO) codes. It ensures that observed symptoms, signs, and clinical abnormalities are standardized using HPO terminology. |
- Example of Models:
entityExtractor = TextMatcherInternalModel().pretrained("hpo_matcher","en", "clinical/models")\
.setInputCols(["sentence", "clean_tokens"])\
.setOutputCol("hpo_term")\
.setCaseSensitive(False)\
.setMergeOverlapping(False)
mapper = ChunkMapperModel().pretrained("hpo_mapper","en", "clinical/models")\
.setInputCols(["hpo_term"])\
.setOutputCol("hpo_code")\
.setLowerCase(True)
text = '''Presumed apnea of prematurity since < 34 wks gestation at birth.
HYPERBILIRUBINEMIA: At risk for hyperbilirubinemia d/t prematurity.
1/25-1/30: Received Amp/Gent while undergoing sepsis evaluation. '''
data = spark.createDataFrame([[text]]).toDF("text")
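The matcher and mapper above expect sentence and clean_tokens columns from upstream stages. A minimal sketch of the full pipeline, assuming standard document, sentence, and token stages and loading stopwords_removal_hpo as the stop-word cleaner (stage names are illustrative):
document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
sentence_detector = SentenceDetector().setInputCols(["document"]).setOutputCol("sentence")
tokenizer = Tokenizer().setInputCols(["sentence"]).setOutputCol("token")
stopwords_cleaner = StopWordsCleaner.pretrained("stopwords_removal_hpo", "en", "clinical/models")\
    .setInputCols(["token"])\
    .setOutputCol("clean_tokens")

pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer,
                            stopwords_cleaner, entityExtractor, mapper])
result = pipeline.fit(data).transform(data)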
Result:
chunk | begin | end | label | hpo_code |
---|---|---|---|---|
apnea | 9 | 13 | HPO | HP:0002104 |
HYPERBILIRUBINEMIA | 59 | 76 | HPO | HP:0002904 |
hyperbilirubinemia | 84 | 101 | HPO | HP:0002904 |
sepsis | 160 | 165 | HPO | HP:0100806 |
- Example of Pipeline:
pipeline = PretrainedPipeline("hpo_mapper_pipeline", "en", "clinical/models")
result = pipeline.fullAnnotate("""Presumed apnea of prematurity since < 34 wks gestation at birth.
HYPERBILIRUBINEMIA: At risk for hyperbilirubinemia d/t prematurity.
1/25-1/30: Received Amp/Gent while undergoing sepsis evaluation.""")
Result:
chunk | begin | end | label | hpo_code |
---|---|---|---|---|
apnea | 9 | 13 | HPO | HP:0002104 |
HYPERBILIRUBINEMIA | 59 | 76 | HPO | HP:0002904 |
hyperbilirubinemia | 84 | 101 | HPO | HP:0002904 |
sepsis | 160 | 165 | HPO | HP:0100806 |
Please check the Human Phenotype Extraction And HPO Code Mapping Notebook for more information
Inferring Missing PHI Entities via Previously Detected NER Outputs Within Deidentification
setChunkMatching
This feature allows chunks like “NAME” and “DATE” to be inferred in contexts where they are not directly detected by NER, by referencing previously identified chunks within the same group. For example, if “NAME” was identified earlier in the document, similar text later in the group can be tagged accordingly—even if the NER model missed it due to lack of context.
- setChunkMatching({entity: threshold}) enables chunk-level entity inference across multiple rows.
- Especially useful when entity annotations (e.g., "NAME", "DATE") are partially missing but present in other rows of the same group.
- Requires setGroupByCol("your_grouping_column") for context-based propagation.
Example:
text = """
Created On : 10/22/2024 1:21PM Electronically Reviewed by: Kelly Goetz, M.A. on DOB 10/22/2024.Performed on Oct 22, 2024 12:48.
"""
deidentification = DeIdentification() \
.setInputCols(["ner_chunk", "token", "sentence"]) \
.setOutputCol("deidentified") \
.setMode("masked")
Before Chunk Matching:
 | chunk | begin | end | masked |
---|---|---|---|---|
0 | 10/22/2024 | 13 | 18 | <DATE> |
1 | Kelly Goetz | 55 | 62 | <DOCTOR> |
2 | Oct 22, 2024 | 101 | 106 | <DATE> |
deidentification = DeIdentification() \
.setInputCols(["ner_chunk", "token", "sentence"]) \
.setOutputCol("deidentified") \
.setMode("masked") \
.setGroupByCol("filename") \
.setChunkMatching({"NAME": 0.50, "DATE": 0.50})
Results with Chunk Matching:
 | chunk | begin | end | masked |
---|---|---|---|---|
0 | 10/22/2024 | 13 | 18 | <DATE> |
1 | Kelly Goetz | 55 | 62 | <DOCTOR> |
2 | 10/22/2024 | 77 | 82 | <DATE> |
3 | Oct 22, 2024 | 97 | 102 | <DATE> |
groupByCol
A new capability has been added to support consistent obfuscation within groups, using the groupByCol parameter in conjunction with consistentObfuscation and consistentAcrossNameParts.
- The groupByCol parameter allows partitioning the dataset into groups based on the values of a specified column.
- When used with consistentObfuscation=True, it ensures consistent entity replacements within each group (e.g., per document or per patient).
- Default: "" (empty string), meaning grouping is disabled.
- The specified column must exist in the input DataFrame and must be of type StringType.
- Using this parameter may affect the row order of the dataset.
- This feature is not supported in LightPipeline.
Example:
data = [
("file_1", "Patient John Doe visited on 2024-09-12 with complaints of chest pain. John Doe "),
("file_1", "Johm Doe was seen at St. Mary's Hospital in Boston. John Doe "),
("file_2", "Johm Doe has blood test results on 2025-01-05 were abnormal."),
("file_2", "Patient ID 8843921 experienced nausea and dizziness. John Doe "),
("file_3", "Michael Johnson was diagnosed with hypertension in New York."),
("file_4", "Maria Garcia came to the clinic on 2023-12-20 for a routine checkup."),
("file_4", "Contact number listed as (555) 123-4567."),
("file_4", "A follow-up was scheduled using the email maria.garcia@example.com, with a call planned for 2023-12-20."),
]
schema = StructType([
StructField("id", StringType(), True),
StructField("text", StringType(), True),
])
df = spark.createDataFrame(data, schema=schema)
obfuscation = DeIdentification()\
.setInputCols(["sentence", "token", "ner_chunk"]) \
.setOutputCol("deidentified") \
.setMode("obfuscate")\
.setConsistentObfuscation(True)\
.setGroupByCol("id")
ConsistentAcrossNameParts
The ConsistentAcrossNameParts parameter controls whether name obfuscation is applied consistently across different parts of a name entity, even when they appear separately in the text.
ConsistentAcrossNameParts (default: True) ensures that all parts of a name entity (e.g., first name, last name) are consistently obfuscated, even when they appear separately in the text.
Behavior
- When set to True:
  - Full names and their individual components are mapped consistently.
  - Example:
    - “John Smith” → “Liam Brown”
    - “John” → “Liam”
    - “Smith” → “Brown”
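A minimal sketch of enabling this behavior on the de-identification stage (assuming the setter follows the usual setConsistentAcrossNameParts naming; input columns are illustrative):
deidentification = DeIdentification()\
    .setInputCols(["sentence", "token", "ner_chunk"])\
    .setOutputCol("deidentified")\
    .setMode("obfuscate")\
    .setConsistentObfuscation(True)\
    .setConsistentAcrossNameParts(True)  # assumed setter name for this parameter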
Improved Name Consistency Handling in De-Identification
In this release, we have improved the way fake names are generated and assigned during the de-identification (De-ID) process. Previously, name tokens (e.g., first name and last name) were treated as a single chunk when generating a fake name. Now, each component of a name (first name, middle name, last name, etc.) is processed individually and then recombined into a full fake name. This enhancement allows for more consistent obfuscation within documents, even when only partial name mentions appear (e.g., “John Smith” vs. “Mr. Smith”). As a result, the same fake name components will be used throughout a document for the same real-world identity, improving coherence and readability while preserving privacy.
Note: This change may lead to different fake name outputs compared to previous versions, especially for historical data. Users who would like to retain the previous behavior (treating the full name as a single chunk and relying on previously generated fake names from earlier versions) can set consistentAcrossNameParts to False.
Clinical Document Analysis with One-Liner Pretrained Pipelines for Specific Clinical Tasks and Concepts
We introduce a suite of advanced, hybrid pretrained pipelines, specifically designed to streamline the clinical document analysis process. These pipelines are built upon multiple state-of-the-art (SOTA) pretrained models, delivering a comprehensive solution for quickly extracting vital information.
What sets this release apart is the elimination of complexities typically involved in building and chaining models. Users no longer need to navigate the intricacies of constructing intricate pipelines from scratch or the uncertainty of selecting the most effective model combinations. Our new pretrained pipelines simplify these processes, offering a seamless, user-friendly experience.
Model Name | Description |
---|---|
ner_deid_nameAugmented_docwise_pipeline | This pipeline can be used to extract PHI information such as ACCOUNT, AGE, BIOID, CITY, CONTACT, COUNTRY, DATE, DEVICE, DLN, EMAIL, FAX, HEALTHPLAN, IDNUM, IP, LICENSE, LOCATION, MEDICALRECORD, NAME, PHONE, PLATE, PROFESSION, SSN, STATE, STREET, URL, VIN, ZIP entities. |
ner_deid_nameAugmented_pipeline_v3 | This pipeline can be used to extract PHI information such as ACCOUNT, AGE, BIOID, CITY, CONTACT, COUNTRY, DATE, DEVICE, DLN, EMAIL, FAX, HEALTHPLAN, IDNUM, IP, LICENSE, LOCATION, MEDICALRECORD, NAME, ORGANIZATION, PHONE, PLATE, PROFESSION, SSN, STATE, STREET, VIN, ZIP entities. |
ner_deid_subentity_docwise_augmented_pipeline_v2 | This pipeline can be used to extract PHI information such as LOCATION, CONTACT, PROFESSION, NAME, DATE, AGE, MEDICALRECORD, ORGANIZATION, HEALTHPLAN, DOCTOR, USERNAME, LOCATION-OTHER, URL, DEVICE, CITY, ZIP, STATE, PATIENT, COUNTRY, STREET, PHONE, HOSPITAL, EMAIL, IDNUM, BIOID, FAX, LOCATION_OTHER, DLN, SSN, ACCOUNT, PLATE, VIN, LICENSE, IP entities. |
clinical_deidentification_nameAugmented_v3 | This pipeline can be used to extract PHI information such as ACCOUNT, AGE, BIOID, CITY, CONTACT, COUNTRY, DATE, DEVICE, DLN, EMAIL, FAX, HEALTHPLAN, IDNUM, IP, LICENSE, LOCATION, MEDICALRECORD, NAME, ORGANIZATION, PHONE, PLATE, PROFESSION, SSN, STATE, STREET, VIN, ZIP entities. |
clinical_deidentification_subentity_enriched_ar | This pipeline can be used to extract PHI information such as MEDICALRECORD, ORGANIZATION, PROFESSION, DOCTOR, USERNAME, URL, DEVICE, CITY, DATE, ZIP, STATE, PATIENT, COUNTRY, STREET, PHONE, HOSPITAL, EMAIL, IDNUM, AGE, LOCATION, DLN, SSN, PLATE, VIN, LICENSE, IP, ACCOUNT, DOB entities. |
clinical_deidentification_nameAugmented_docwise | This pipeline can be used to extract PHI information such as ACCOUNT, AGE, BIOID, CITY, CONTACT, COUNTRY, DATE, DEVICE, DLN, EMAIL, FAX, HEALTHPLAN, IDNUM, IP, LICENSE, LOCATION, MEDICALRECORD, NAME, PHONE, PLATE, PROFESSION, SSN, STATE, STREET, URL, VIN, ZIP entities. |
Example:
from sparknlp.pretrained import PretrainedPipeline
deid_pipeline = PretrainedPipeline("ner_deid_nameAugmented_docwise_pipeline", "en", "clinical/models")
text = """Record date : 2093-01-13, Name : Hendrickson ORA. 25 years-old, MRN #719435.
IP: 203.120.223.13, the driver's license no:A334455B. The SSN: 324598674 and e-mail: hale@gmail.com.
Patient's VIN : 1HGBH41JXMN109286. Date : 01/13/93, PCP : David Hale."""
Result:
result | begin | end | entity |
---|---|---|---|
2093-01-13 | 14 | 23 | DATE |
Hendrickson ORA | 33 | 47 | NAME |
25 | 50 | 51 | AGE |
#719435 | 68 | 74 | MEDICALRECORD |
203.120.223.13 | 81 | 94 | IP |
no:A334455B | 118 | 128 | DLN |
324598674 | 140 | 148 | SSN |
hale@gmail.com | 162 | 175 | EMAIL |
1HGBH41JXMN109286 | 194 | 210 | VIN |
01/13/93 | 220 | 227 | DATE |
David Hale | 236 | 245 | NAME |
Please check the Pretrained Clinical Pipelines page for more information
Cloud Provider & LLM Benchmarks for Information Extraction Tasks
We conducted an extensive benchmark comparing Spark NLP for Healthcare against leading cloud-based NLP services and popular Large Language Models (LLMs). The evaluation covered a range of NLP tasks, including Entity Recognition (NER), Assertion Detection, Medical Terminology Mapping, and more.
The benchmark focused on assessing performance across several key clinical entity types. For each task, we measured critical metrics like accuracy, coverage, precision, recall, and F1-score, ensuring a comprehensive understanding of each system’s strengths and weaknesses.
Key Achievements:
- Superior Accuracy & Coverage: Spark NLP consistently outperformed cloud providers such as AWS Comprehend Medical, Google Cloud Healthcare API, Azure Text Analytics for Health, and large language models like Anthropic Claude Sonnet 3.7 and OpenAI GPT-4.5.
- Assertion Detection Excellence: Spark NLP demonstrated a clear edge in detecting contextual assertions (e.g., negation, uncertainty) in medical texts.
- Medical Terminology Mapping: Our solution excelled in mapping complex medical terminology, ensuring that nuanced healthcare concepts were accurately identified.
- NER Performance: Spark NLP achieved the highest accuracy in recognizing clinical entities like diseases, procedures, and medications, surpassing other services in precision and recall.
- Clinical Relevance: The benchmark highlighted Spark NLP’s ability to extract medically relevant entities with better context understanding, making it a reliable choice for healthcare NLP needs.
These findings were presented at the NLP Summit, reinforcing Spark NLP’s position as a highly reliable and effective solution for extracting complex medical entities from clinical documents, and making it the preferred choice for healthcare organizations looking for top-tier NLP performance.
You can reproduce the results using the NER Performance Comparison Of Healthcare NLP VS Cloud Solutions notebook.
For a detailed presentation of the benchmark results, you can watch the NLP Summit session here: Benchmarks That Matter: Evaluating Medical Language Models for Real-World Applications
We prepared 9 distinct pretrained pipelines using Spark NLP for Healthcare’s customization capabilities, all of which were utilized in this benchmark to ensure optimized performance across a range of clinical entity types.
Model Name | Description |
---|---|
ner_admission_discharge_benchmark_pipeline | This pipeline can be used to detect clinical admission and discharge events in medical text, with a focus on admission entities. |
ner_body_part_benchmark_pipeline | This pipeline can be used to extract all types of anatomical references in medical text. It is a single-entity pipeline and generalizes all anatomical references to a single entity. |
ner_drug_benchmark_pipeline | This pipeline can be used to extract posology information in medical text. |
ner_procedure_benchmark_pipeline | This pipeline can be used to extract procedure mentions in medical text. |
ner_test_benchmark_pipeline | This pipeline can be used to extract test mentions in medical text. |
ner_treatment_benchmark_pipeline | This pipeline can be used to extract treatments mentioned in medical text. |
ner_consumption_benchmark_pipeline | This pipeline can be used to extract consumption-related information (alcohol, smoking/tobacco, and substance usage) in medical text. |
ner_grade_stage_severity_benchmark_pipeline | This pipeline can be used to extract biomarker, grade, stage, and severity-related information in medical text. GRADE_STAGE_SEVERITY: mentions of pathological grading, staging, severity, and modifiers of diseases/cancers. |
ner_problem_benchmark_pipeline | This pipeline can be used to extract problem (diseases, disorders, injuries, symptoms, signs, etc.) information in medical text. |
Enabling flexible annotation transformations with AnnotationConverter
A versatile converter that transforms DataFrame annotations through custom logic. By defining a conversion function (f), users can modify annotations to perform transformations such as:
- Assertion outputs → Chunk outputs
- LLM outputs → Document outputs
- Rule-based outputs → Updated outputs
The converter integrates with PySpark NLP-style pipelines (e.g., DocumentAssembler, Tokenizer) but operates purely in Python.
Example:
import re
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer
from sparknlp.annotation import Annotation
# AnnotationConverter is provided by Spark NLP for Healthcare

test_data = spark.createDataFrame([
    (1, """I like SparkNLP annotators such as MedicalBertForSequenceClassification and BertForAssertionClassification."""),
]).toDF("id", "text")

document_assembler = DocumentAssembler().setInputCol('text').setOutputCol('document')
tokenizer = Tokenizer().setInputCols('document').setOutputCol('token')

def myFunction(annotations):
    """Split camel-case tokens into their constituent words."""
    new_annotations = []
    pattern = r"(?<=[a-z])(?=[A-Z])"  # boundary between a lowercase and an uppercase letter
    for annotation in annotations:
        text = annotation.result
        parts = re.split(pattern, text)
        begin = annotation.begin
        for part in parts:
            end = begin + len(part) - 1
            new_annotations.append(
                Annotation(
                    annotatorType="token",
                    begin=begin,
                    end=end,
                    result=part,
                    metadata=annotation.metadata,
                    embeddings=annotation.embeddings,
                )
            )
            begin = end + 1
    return new_annotations

camel_case_tokenizer = AnnotationConverter(f=myFunction)\
    .setInputCol("token")\
    .setOutputCol("camel_case_token")\
    .setOutputAnnotatorType("token")

pipeline = Pipeline(stages=[document_assembler, tokenizer, camel_case_tokenizer])
model = pipeline.fit(test_data)
df = model.transform(test_data)
df.selectExpr("explode(camel_case_token) as tokens").show(truncate=False)
Result:
tokens |
---|
{token, 0, 0, I, {sentence -> 0}, []} |
{token, 2, 5, like, {sentence -> 0}, []} |
{token, 7, 11, Spark, {sentence -> 0}, []} |
{token, 12, 14, NLP, {sentence -> 0}, []} |
{token, 16, 25, annotators, {sentence -> 0}, []} |
{token, 27, 30, such, {sentence -> 0}, []} |
{token, 32, 33, as, {sentence -> 0}, []} |
{token, 35, 41, Medical, {sentence -> 0}, []} |
{token, 42, 45, Bert, {sentence -> 0}, []} |
{token, 46, 48, For, {sentence -> 0}, []} |
{token, 49, 56, Sequence, {sentence -> 0}, []} |
{token, 57, 70, Classification, {sentence -> 0}, []} |
{token, 72, 74, and, {sentence -> 0}, []} |
{token, 76, 79, Bert, {sentence -> 0}, []} |
{token, 80, 82, For, {sentence -> 0}, []} |
{token, 83, 91, Assertion, {sentence -> 0}, []} |
{token, 92, 105, Classification, {sentence -> 0}, []} |
{token, 106, 106, ., {sentence -> 0}, []} |
Introducing 2 New Contextual Assertion Models to Detect Assertion Status Given a Larger Context and Scenarios
These models identify contextual cues within text data, such as negation, uncertainty, etc. They are used for clinical assertion detection and annotate text chunks with assertions.
Model Name | Description |
---|---|
contextual_assertion_conditional | This model identifies contextual cues within text data to detect conditional assertions. |
contextual_assertion_possible | This model identifies contextual cues within text data to detect possible assertions. |
Example:
clinical_ner = MedicalNerModel \
.pretrained("ner_clinical", "en", "clinical/models") \
.setInputCols(["sentence", "token", "embeddings"]) \
.setOutputCol("ner")
ner_converter = NerConverterInternal() \
.setInputCols(["sentence", "token", "ner"]) \
.setOutputCol("ner_chunk")
contextual_assertion_conditional = ContextualAssertion.pretrained("contextual_assertion_conditional","en","clinical/models")\
.setInputCols("sentence", "token", "ner_chunk") \
.setOutputCol("assertion_conditional")
contextual_assertion_possible = ContextualAssertion.pretrained("contextual_assertion_possible","en","clinical/models")\
.setInputCols("sentence", "token", "ner_chunk") \
.setOutputCol("assertion_possible")
assertion_merger = AssertionMerger() \
.setInputCols("assertion_conditional", "assertion_possible") \
.setOutputCol("assertion_merger")
text = """The patient presents with symptoms suggestive of pneumonia, including fever, productive cough, and mild dyspnea.
Chest X-ray findings are compatible with a possible early-stage infection, though bacterial pneumonia cannot be entirely excluded.
The patient reports intermittent chest pain when engaging in physical activity, particularly on exertion. Symptoms appear to be contingent upon increased stress levels and heavy meals."""
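The stages above can be assembled and run end to end. A minimal sketch, assuming standard document, sentence, token, and embeddings stages upstream (names are illustrative):
pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings,
                            clinical_ner, ner_converter,
                            contextual_assertion_conditional, contextual_assertion_possible,
                            assertion_merger])
data = spark.createDataFrame([[text]]).toDF("text")
result = pipeline.fit(data).transform(data)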
Result:
chunk | begin | end | ner_label | assertion |
---|---|---|---|---|
symptoms | 26 | 33 | PROBLEM | possible |
pneumonia | 49 | 57 | PROBLEM | possible |
fever | 70 | 74 | PROBLEM | possible |
productive cough | 77 | 92 | PROBLEM | possible |
mild dyspnea | 99 | 110 | PROBLEM | conditional |
Chest X-ray | 113 | 123 | TEST | conditional |
Please check the Contextual Assertion models for more information
New German Medical Named Entity Recognition Model and Medical Entity Resolver Model
NER Model
This model extracts more than 40 oncology-related entities from German text, including therapies, tests, and staging.
Example:
ner_model = MedicalNerModel.pretrained("ner_oncology_wip", "de", "clinical/models")\
.setInputCols(["sentence", "token", "embeddings"])\
.setOutputCol("ner")
text = """Patientin: 67 Jahre, weiblich
Diagnose: Invasives duktales Mammakarzinom links (G3)
Befunde:
Schmerzloser, tastbarer Knoten im oberen äußeren Quadranten links
Hautrötung, eingezogene Mamille, tastbare axilläre Lymphknoten
Histologie:
ER + , PR + , HER2-negativ, Ki-67 bei 35 %
Bildgebung:
Mammographie: Tumor 2,8 cm, Mikroverkalkungen
MRT: Tumorgröße 3,1 cm mit Drüseninfiltration
PET-CT: Keine Fernmetastasen
Therapie:
Neoadjuvante Chemotherapie
Brusterhaltende Operation mit Sentinel-Lymphknotenbiopsie
Adjuvante Strahlentherapie
Hormontherapie mit Letrozol (5 Jahre)
Beurteilung:
Fortgeschrittenes lokal begrenztes Mammakarzinom, günstiger Hormonrezeptorstatus. Therapie gemäß Tumorkonferenz empfohlen."""
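To run the model, the usual upstream stages are required. A minimal sketch, assuming a document assembler, sentence detector, tokenizer, German clinical word embeddings, and an NER converter (names are illustrative):
pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer,
                            embeddings, ner_model, ner_converter])
data = spark.createDataFrame([[text]]).toDF("text")
result = pipeline.fit(data).transform(data)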
Result:
chunk | begin | end | ner_label |
---|---|---|---|
67 Jahre | 11 | 18 | Age |
weiblich | 21 | 28 | Gender |
Invasives | 41 | 49 | Histological_Type |
duktales | 51 | 58 | Histological_Type |
Mammakarzinom links | 60 | 78 | Cancer_Dx |
Knoten | 120 | 125 | Tumor_Finding |
oberen äußeren Quadranten links | 130 | 160 | Direction |
Mamille | 186 | 192 | Site_Other_Body_Part |
axilläre Lymphknoten | 204 | 223 | Site_Lymph_Node |
Histologie | 225 | 234 | Pathology_Test |
ER | 238 | 239 | Biomarker |
+ | 241 | 241 | Biomarker_Result |
PR | 245 | 246 | Biomarker |
+ | 248 | 248 | Biomarker_Result |
HER2-negativ | 252 | 263 | Biomarker |
Ki-67 | 266 | 270 | Biomarker |
35 % | 276 | 279 | Biomarker_Result |
Bildgebung | 281 | 290 | Imaging_Test |
Mammographie | 294 | 305 | Imaging_Test |
Tumor | 308 | 312 | Tumor_Finding |
2,8 cm | 314 | 319 | Tumor_Size |
MRT | 340 | 342 | Imaging_Test |
Tumorgröße | 345 | 354 | Tumor_Finding |
3,1 cm | 356 | 361 | Tumor_Size |
Drüseninfiltration | 367 | 384 | Histological_Type |
PET-CT | 386 | 391 | Imaging_Test |
Neoadjuvante Chemotherapie | 426 | 451 | Chemotherapy |
Brusterhaltende Operation | 453 | 477 | Cancer_Surgery |
Sentinel-Lymphknotenbiopsie | 483 | 509 | Pathology_Test |
Adjuvante Strahlentherapie | 511 | 536 | Radiotherapy |
Hormontherapie | 538 | 551 | Hormonal_Therapy |
Letrozol | 557 | 564 | Hormonal_Therapy |
Fortgeschrittenes lokal | 589 | 611 | Staging |
Mammakarzinom | 624 | 636 | Cancer_Dx |
Resolver Model
This model maps German medical entities and concepts to ICD-10-GM codes using sent_xlm_roberta_biolord_2023_m sentence embeddings. It also returns the official resolution text in brackets inside the metadata.
Example:
icd_resolver = SentenceEntityResolverModel.pretrained("biolordresolve_icd10gm_augmented", "de", "clinical/models") \
.setInputCols(["biolord_embeddings"]) \
.setOutputCol("icd_code")\
.setDistanceFunction("EUCLIDEAN")
text = """Patientin: 67 Jahre, weiblich
Diagnose: Invasives duktales Mammakarzinom links (G3)
Histologie:
ER + , PR + , HER2-negativ, Ki-67 bei 35 %
Bildgebung:
Mammographie: Tumor 2,8 cm, Mikroverkalkungen
PET-CT: Keine Fernmetastasen
Therapie:
Neoadjuvante Chemotherapie
Brusterhaltende Operation mit Sentinel-Lymphknotenbiopsie
Adjuvante Strahlentherapie
Hormontherapie mit Letrozol (5 Jahre)
Beurteilung:
Fortgeschrittenes lokal begrenztes Mammakarzinom, günstiger Hormonrezeptorstatus. Therapie gemäß Tumorkonferenz empfohlen."""
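The biolord_embeddings column consumed by the resolver comes from an upstream sentence-embedding stage. A minimal sketch, assuming the NER chunks are wrapped as documents with Chunk2Doc and embedded with the multilingual BioLORD model (stage names and the language code are assumptions):
chunk2doc = Chunk2Doc()\
    .setInputCols("ner_chunk")\
    .setOutputCol("ner_chunk_doc")

biolord_embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_xlm_roberta_biolord_2023_m", "xx")\
    .setInputCols(["ner_chunk_doc"])\
    .setOutputCol("biolord_embeddings")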
Result:
chunk | begin | end | ner_label | icd | resolution | all_resolution | all_result |
---|---|---|---|---|---|---|---|
Mammakarzinom links | 60 | 78 | Cancer_Dx | C50 | Brustdrüsenkarzinom [Bösartige Neubildung der Brustdrüse [Mamma]] | Brustdrüsenkarzinom [Bösartige Neubildung der Brustdrüse [Mamma]]:::Krebs in der Brustdrüse, nicht näher klassifiziert [Brustdrüse, nicht näher bezeichnet]:::Mammakarzinom [Brustdrüse [Mamma]]:::Neoplasma malignum: UQ der Brustdrüse [Unterer äußerer Quadrant der Brustdrüse]:::Bösartige Mammaerkrankung [Bösartige Neubildung der Brustdrüse [Mamma] in der Familienanamnese]:::Bösartige Tumoren der Brustregion [Thorax] | C50:::C50.9:::D48.6:::C50.5:::Z80.3:::C76.1 |
Tumor | 169 | 173 | Tumor_Finding | D36.9 | Neubildung ohne spezifische Lokalisation [Gutartige Neubildung an nicht näher bezeichneter Lokalisation] | Neubildung ohne spezifische Lokalisation [Gutartige Neubildung an nicht näher bezeichneter Lokalisation]:::Neubildung ohne spezifische Angaben zu Lokalisationen [Neubildung unsicheren oder unbekannten Verhaltens an sonstigen und nicht näher bezeichneten Lokalisationen]:::Neubildung unspezifischen Verhaltens, ohne weitere Angaben [Neubildung unsicheren oder unbekannten Verhaltens, nicht näher bezeichnet]:::Neubildung… | D36.9:::D48:::D48.9:::D48.7:::D36.7:::C80:::C80.9:::D36 |
Adjuvante Strahlentherapie | 326 | 351 | Radiotherapy | Z51.0 | Radiotherapie-Sitzung [Strahlentherapie-Sitzung] | Radiotherapie-Sitzung [Strahlentherapie-Sitzung]:::Nachsorge nach Radiotherapie [Nachuntersuchung nach Strahlentherapie wegen anderer Krankheitszustände]:::Nachsorge nach Strahlentherapie bei malignen Tumoren [Nachuntersuchung nach Strahlentherapie wegen bösartiger Neubildung]:::Rehabilitation nach Strahlentherapie [Rekonvaleszenz nach Strahlentherapie]:::Strahlen- und Chemotherapie bei bösartigen Tumoren… | Z51.0:::Z09.1:::Z08.1:::Z54.1:::Z51.82 |
Mammakarzinom | 439 | 451 | Cancer_Dx | D48.6 | Mammakarzinom [Brustdrüse [Mamma]] | Mammakarzinom [Brustdrüse [Mamma]]:::Mamma-Karzinom [Bösartige Neubildung der Brustdrüse [Mamma]]:::Bösartige Mammaerkrankung [Bösartige Neubildung der Brustdrüse [Mamma] in der Familienanamnese]:::Krebs in der Brustdrüse, nicht näher klassifiziert [Brustdrüse, nicht näher bezeichnet]:::Bösartige Tumoren der Brustregion [Thorax]:::Neoplasma malignum: UQ der Brustdrüse [Unterer äußerer Quadrant der Brustdrüse] | D48.6:::C50:::Z80.3:::C50.9:::C76.1:::C50.5 |
Please check the German Healthcare Models Notebook for more information
Enabling Exception Handling within the Pipelines for Specified or All Eligible Stages
This utility module allows automated exception handling in pipelines by enabling setDoExceptionHandling on specific stages or on all stages that support it. Users can choose stages manually or apply it globally, with an option to save the updated pipeline to a custom path.
Example:
from sparknlp_jsl.utils import apply_exception_handling
from sparknlp.pretrained import PretrainedPipeline
oncology_pipeline = PretrainedPipeline("oncology_biomarker_pipeline", "en", "clinical/models")
handled_pipeline = apply_exception_handling(oncology_pipeline)
Result:
📢 Summary Report:
✅ Total modified stages: 15
- Stage 4: MedicalNerModel
- Stage 5: NerConverterInternalModel
- Stage 6: MedicalNerModel
- Stage 7: NerConverterInternalModel
- Stage 8: MedicalNerModel
- Stage 9: NerConverterInternalModel
- Stage 10: MedicalNerModel
- Stage 11: NerConverterInternalModel
- Stage 13: ChunkMergeModel
- Stage 14: ChunkMergeModel
- Stage 15: AssertionDLModel
- Stage 16: ChunkFilterer
- Stage 17: AssertionDLModel
- Stage 21: RelationExtractionModel
- Stage 22: RelationExtractionModel
⚠️ Total skipped stages: 9
- Stage 0: DocumentAssembler (No exception handling support)
- Stage 1: SentenceDetectorDLModel (No exception handling support)
- Stage 2: TokenizerModel (No exception handling support)
- Stage 3: WordEmbeddingsModel (No exception handling support)
- Stage 12: TextMatcherInternalModel (No exception handling support)
- Stage 18: AssertionMerger (No exception handling support)
- Stage 19: PerceptronModel (No exception handling support)
- Stage 20: DependencyParserModel (No exception handling support)
- Stage 23: AnnotationMerger (No exception handling support)
Interactive Chat Support for Healthcare NLP Documentation
We’re introducing a conversational assistant powered by Healthcare NLP documentation and repositories. This new tool enables users to easily explore which components to use when solving specific problems, understand the functionalities of different modules, and get explanations for code found across our repositories — all through a simple chat interface.
Just type your question into the chatbox and get instant guidance on how to leverage Healthcare NLP more effectively. Whether you’re debugging, exploring features, or looking for best practices, this assistant helps you find the right answers faster.
You can check the chat support page for the spark-nlp-workshop repository.
New Peer-Reviewed Papers and Blog Posts On Various Topics
- Peer-Reviewed Papers:
- Can Zero-Shot Commercial APIs Deliver Regulatory-Grade Clinical Text De-Identification? We evaluate the performance of four leading solutions for de-identification of unstructured medical text - Azure Health Data Services, AWS Comprehend Medical, OpenAI GPT-4o, and John Snow Labs - on a ground truth dataset of 48 clinical documents annotated by medical experts. The analysis, conducted at both entity-level and token-level, suggests that John Snow Labs’ Medical Language Models solution achieves the highest accuracy, with a 96% F1-score in protected health information (PHI) detection, outperforming Azure (91%), AWS (83%), and GPT-4o (79%). John Snow Labs is the only solution that achieves regulatory-grade accuracy (surpassing that of human experts) and is also the most cost-effective: it is over 80% cheaper than Azure and GPT-4o, and is the only solution not priced per token. Its fixed-cost local deployment model avoids the escalating per-request fees of cloud-based services, making it a scalable and economical choice.
- Beyond Negation Detection: Comprehensive Assertion Detection Models for Clinical NLP Assertion status detection is a critical yet often overlooked component of clinical NLP, essential for accurately attributing extracted medical facts. Past studies narrowly focused on negation detection, resulting in underperforming commercial solutions such as AWS Medical Comprehend, Azure AI Text Analytics, and GPT-4o due to their limited domain adaptation. To address this gap, we developed state-of-the-art assertion detection models, including fine-tuned LLMs, transformer-based classifiers, few-shot classifiers, and deep learning (DL) approaches, and evaluated our models against cloud-based commercial API solutions, the legacy rule-based NegEx approach, and GPT-4o. Our fine-tuned LLM achieves the highest overall accuracy (0.962), outperforming GPT-4o (0.901) and commercial APIs by a notable margin, particularly excelling in Present (+4.2%), Absent (+8.4%), and Hypothetical (+23.4%) assertions. Our DL-based models surpass commercial solutions in Conditional (+5.3%) and Associated with Someone Else (+10.1%), while the few-shot classifier offers a lightweight yet highly competitive alternative (0.929), making it ideal for resource-constrained environments. Integrated within Spark NLP, our models consistently outperform black-box commercial solutions while enabling scalable inference and seamless integration with medical NER, Relation Extraction, and Terminology Resolution. These results reinforce the importance of domain-adapted, transparent, and customizable clinical NLP solutions over general-purpose LLMs and proprietary APIs.
- Blog Posts:
- Contextual Entity Ruler: Enhance Named Entity Recognition with Contextual Rules This blog post explains how the Contextual Entity Ruler in Spark NLP refines entity recognition by applying context-aware rules to detected entities. It updates entities using customizable patterns, regex, and scope windows, boosting accuracy by reducing false positives and adapting to niche contexts. Key features include token/character-based windows, prefix/suffix rules, and modes. Integrate it post-NER to enhance precision without retraining models.
- Pretrained Zero-Shot Models for Clinical Entity Recognition: A Game Changer in Healthcare NLP This blog post shows how pretrained zero-shot Named Entity Recognition (NER) models simplify the recognition of entities in multiple domains with minimal configuration. Recent advancements in pretrained zero-shot NER models, and the ease with which they can be applied across different datasets and applications, are described in the article.
- Task-Based Clinical NLP: Unlocking Insights with One-Liner Pipelines This blog post explores Healthcare NLP’s Task-Based Clinical Pretrained Pipelines, showcasing how they streamline clinical text analysis with just one-liner codes. By demonstrating the explain_clinical_doc_granular pipeline in a real-world scenario, we illustrate its capabilities in Named Entity Recognition (NER), Assertion Status, and Relation Extraction. These pipelines provide an efficient way to extract medical insights from unstructured clinical text, offering valuable tools for healthcare professionals and researchers.
- How Good Are Open-Source LLM-Based De-identification Tools in a Medical Context? It’s often assumed that all PII and PHI entities are the same, meaning de-identification tools should work equally well across all unstructured text. While this seems logical in theory, each domain-specific dataset has distinct characteristics, requiring customized approaches for effective PII detection. In this blog post, we highlight how LLM-based de-identification models, such as GLiNER PII and OpenPipe’s PII-Redact, excel in general text — achieving macro-average F1 scores of 0.62 for GLiNER and 0.98 for OpenPipe — but face significant performance drops on clinical datasets, where their scores fall to 0.41 and 0.42 respectively.
- Beyond Negation Detection: Comprehensive Assertion Detection Models for Clinical NLP Assertion status detection is critical in clinical NLP but often overlooked, leading to underperformance in commercial solutions like AWS Medical Comprehend, Azure AI Text Analytics, and GPT-4o. We developed advanced assertion detection models, including fine-tuned LLMs, transformers, few-shot classifiers, deep learning (DL), and rule-based approaches. Our fine-tuned LLM achieves 0.962 accuracy, outperforming GPT-4o (0.901) and commercial APIs, with notable improvements in Present (+4.2%), Absent (+8.4%), and Hypothetical (+23.4%) assertions. Our DL models also excel in Conditional (+5.3%) and Associated with Someone Else (+10.1%) categories. The few-shot classifier (0.929 accuracy) provides a lightweight alternative for resource-limited environments. Integrated with Spark NLP, our models offer scalable, transparent, and domain-adapted solutions that surpass black-box commercial APIs in medical NLP tasks.
Various Core Improvements: Bug Fixes, Enhanced Overall Robustness, and Reliability of Spark NLP for Healthcare
- Resolved log file creation on Databricks for training
- Added consistentAcrossNameParts param to LightDeIdentification and StructuredDeIdentification
Updated Notebooks And Demonstrations For Making Spark NLP For Healthcare Easier To Navigate And Understand
- New Spanish Healthcare Models Notebook
- New PRR ROR EBGM for Signal Processing of Drug Events Notebook
- New BertForAssertionClassification Notebook
- New NER Performance Comparison Of Healthcare NLP VS Cloud Solutions Notebook
- New Fhir DeIdentification Notebook
- New Human Phenotype Extraction And HPO Code Mapping Notebook
- Updated German Clinical Deidentification Notebook
- Updated German Healthcare Models Notebook
- Updated Deidentification Custom Pretrained Pipelines Notebook
- Updated Clinical_DeIdentification Notebook
- Updated Analyse Veterinary Documents with Healthcare NLP Notebook
- Updated Loading Medical and Open Source LLMs Notebook
We Have Added And Updated A Substantial Number Of New Clinical Models And Pipelines, Further Solidifying Our Offering In The Healthcare Domain.
jsl_meds_text2soap_v1
assertion_bert_classification_radiology
assertion_bert_classification_jsl
assertion_bert_classification_clinical
ner_deid_nameAugmented_docwise_pipeline
ner_deid_nameAugmented_pipeline_v3
ner_deid_subentity_docwise_augmented_pipeline_v2
clinical_deidentification_docwise_wip
clinical_deidentification_nameAugmented_v3
clinical_deidentification_subentity_enriched_ar
clinical_deidentification_nameAugmented_docwise
clinical_deidentification_docwise_benchmark
ner_admission_discharge_benchmark_pipeline
ner_body_part_benchmark_pipeline
ner_drug_benchmark_pipeline
ner_procedure_benchmark_pipeline
ner_test_benchmark_pipeline
ner_treatment_benchmark_pipeline
ner_consumption_benchmark_pipeline
ner_grade_stage_severity_benchmark_pipeline
ner_problem_benchmark_pipeline
contextual_assertion_conditional
contextual_assertion_possible
ner_oncology_wip
biolordresolve_icd10gm_augmented
hpo_matcher
hpo_mapper
stopwords_removal_hpo
hpo_mapper_pipeline
For all Spark NLP for Healthcare models, please check Models Hub Page
Previous versions
- 6.0.0
- 5.5.3
- 5.5.2
- 5.5.1
- 5.5.0
- 5.4.1
- 5.4.0
- 5.3.3
- 5.3.2
- 5.3.1
- 5.3.0
- 5.2.1
- 5.2.0
- 5.1.4
- 5.1.3
- 5.1.2
- 5.1.1
- 5.1.0
- 5.0.2
- 5.0.1
- 5.0.0
- 4.4.4
- 4.4.3
- 4.4.2
- 4.4.1
- 4.4.0
- 4.3.2
- 4.3.1
- 4.3.0
- 4.2.8
- 4.2.4
- 4.2.3
- 4.2.2
- 4.2.1
- 4.2.0
- 4.1.0
- 4.0.2
- 4.0.0
- 3.5.3
- 3.5.2
- 3.5.1
- 3.5.0
- 3.4.2
- 3.4.1
- 3.4.0
- 3.3.4
- 3.3.2
- 3.3.1
- 3.3.0
- 3.2.3
- 3.2.2
- 3.2.1
- 3.2.0
- 3.1.3
- 3.1.2
- 3.1.1
- 3.1.0
- 3.0.3
- 3.0.2
- 3.0.1
- 3.0.0
- 2.7.6
- 2.7.5
- 2.7.4
- 2.7.3
- 2.7.2
- 2.7.1
- 2.7.0
- 2.6.2
- 2.6.0
- 2.5.5
- 2.5.3
- 2.5.2
- 2.5.0
- 2.4.6
- 2.4.5
- 2.4.2
- 2.4.1
- 2.4.0