6.3.0
Highlights
We are delighted to announce remarkable enhancements and updates in our latest major release of Healthcare NLP. This release broadens cloud environment support with Scala 2.13 and Java 17 compatibility, extends support for the latest LLM families, and delivers 9 new Medical Vision LLMs that advance clinical AI with multimodal text and image understanding, alongside 67 new one-liner pretrained pipelines and models for end-to-end clinical document understanding. The release also introduces scalable and flexible HL7 CDA (XML) de-identification and a new De-identification MCP Server (Preview) for seamless integration with external AI agents and IDE clients. Comprehensive benchmark evaluations are included, featuring large-scale clinical de-identification speed–accuracy benchmarks across infrastructures and comparative Medical Vision LLM benchmark results against leading foundation models. In addition, this version delivers memory-optimized MedicalNer training for large-scale datasets, enhanced entity resolution with new UMLS CUI and LOINC models, expanded rule-based and neural NER coverage, core robustness improvements across the pipeline ecosystem, newly created notebooks, and published technical blog posts and deep dives that further strengthen Healthcare NLP.
- Scala 2.13 & Java 17 Compatibility with Broader Cloud Environment Support
- 9 Medical Vision LLMs Extending Clinical AI to Multimodal Text and Image Understanding
- Scalable and Flexible CDA (HL7) De-identification Support for XML Clinical Documents
- De-identification MCP Server (Preview) for Clinical Text De-identification via External AI Agents and IDE Clients
- Upgraded Llama.cpp Backend with Expanded Compatibility for the Latest LLM Families
- Memory-Optimized MedicalNer Training for Large-Scale Datasets
- Clinical De-Identification at Scale: Pipeline Design and Speed–Accuracy Benchmarks Across Infrastructures
- Comparative Medical Vision LLM Benchmark Results Against Leading Foundation Models
- Advanced clinical one-liner pretrained pipelines for PHI detection and extraction
- 8 new entity resolver models for mapping clinical terms to standardized UMLS CUI and LOINC codes
- Introduced 17 new rule-based entity extraction models for identifying structured codes and patterns in clinical texts
- Introduced 7 new NER models for PHI deidentification and drug entity extraction
- 8 new contextual assertion models to classify clinical entities by assertion status
- Added new ONNX-based multilingual clinical NER models for Italian, Spanish, and Romanian, covering disease, procedure, medication, and symptom entity extraction
- New Blog Posts & Technical Deep Dives
- Various core improvements, bug fixes, enhanced overall robustness and reliability of Healthcare NLP
- Improved PipelineTracer coverage by adding support for PretrainedZeroShotNERChunker, ContextualEntityRuler, ContextualEntityFilterer
- Added replaceDict to AssertionMerger to allow custom replacement of assertion labels.
- PipelineOutputParser improvements to support mappings output in clinical_deidentification pipelines
- Updated notebooks and demonstrations for making Healthcare NLP easier to navigate and understand
- New CDA DeIdentification Notebook
- The addition and update of numerous new clinical models and pipelines continue to reinforce our offering in the healthcare domain
These enhancements will elevate your experience with Healthcare NLP, enabling more efficient, accurate, and streamlined analysis of healthcare-related natural language data.
Scala 2.13 & Java 17 Compatibility with Broader Cloud Environment Support
With Healthcare NLP 6.3.0, we introduce official support for Scala 2.13 while continuing to support Scala 2.12 through a dual-JAR distribution strategy. The Scala 2.13 JAR is built and tested against Java 17, enabling smoother adoption of modern JVM runtimes and reducing friction when deploying on managed cloud Spark platforms.
Databricks
- Recommended runtime: Databricks Runtime 16.4 LTS (Scala 2.13 image / Spark 3.5.2). Databricks provides both Scala 2.12 and Scala 2.13 images for this runtime, allowing customers to validate and migrate at their own pace.
- Fully compatible with Databricks Runtime 16.x (Spark 3.5.x) environments running either Scala 2.12 or Scala 2.13.
Google Cloud (Dataproc)
- Cluster-based Dataproc: Compatible with Dataproc 2.x image families (including 2.2 and 2.3) that provide Scala 2.13 and Java 17.
- Dataproc Serverless for Spark: Supported on 2.2 and newer runtimes, which include Scala 2.13 and Java 17 in their execution environment.
These updates significantly broaden support across cloud environments, enabling Healthcare NLP to run on a wider range of instance types and managed runtime configurations on both AWS and GCP.
Practical notes
- Use the Scala 2.13 JAR on environments that provide Scala 2.13 (e.g. Databricks Scala 2.13 images, Dataproc 2.x runtimes).
- Use the Scala 2.12 JAR on environments that are still based on Scala 2.12.
- Spark 4 is not supported in 6.3.0 and will be addressed in a future release.
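As a quick sanity check before attaching a JAR, you can confirm which Spark, Scala, and Java versions the cluster actually runs. The snippet below is a minimal sketch that assumes an active SparkSession named `spark`:
# Print the runtime versions to decide between the Scala 2.12 and Scala 2.13 JARs
print("Spark:", spark.version)
print("Scala:", spark.sparkContext._jvm.scala.util.Properties.versionString())
print("Java:", spark.sparkContext._jvm.System.getProperty("java.version"))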
9 Medical Vision LLMs Extending Clinical AI to Multimodal Text and Image Understanding
In this release, we are expanding our Medical Vision LLM (VLM) family with additional models specifically finetuned for medical tasks. These models extend large language model capabilities with integrated visual language understanding, enabling multimodal clinical analysis by combining textual and image inputs.
The new VLMs provide strong performance for tasks such as diagnostic image interpretation, image-to-text summarization, and integrated documentation analysis — continuing our mission to advance clinical AI with robust, domain-specific multimodal solutions.
| Model Name | Quantization Options |
|---|---|
| jsl_meds_vlm_7b_v1 | q4, q8, q16 |
| jsl_meds_vlm_4b_v1 | q4, q8, q16 |
| jsl_meds_vlm_reasoning_8b_v1 | q4, q8, q16 |
Example: jsl_meds_vlm_reasoning_8b_q4_v1

prompt = """
Extract medication and patient demographic information from the document and return strictly as JSON:
{
"patient": {"name": string, "age": string, "sex": string, "hospital_no": string, "episode_no": string, "episode_date": string},
"diagnoses": [string],
"symptoms": [string],
"treatment": [{"med": string, "dose": string, "freq": string}]
}
"""
input_df = vision_llm_preprocessor(
spark=spark,
images_path="images",
prompt=prompt,
output_col_name="prompt"
)
document_assembler = DocumentAssembler() \
.setInputCol("prompt") \
.setOutputCol("caption_document")
image_assembler = ImageAssembler() \
.setInputCol("image") \
.setOutputCol("image_assembler")
medicalVisionLLM = (
MedicalVisionLLM.pretrained("jsl_meds_vlm_reasoning_8b_q4_v1", "en", "clinical/models")
.setInputCols(["caption_document", "image_assembler"])
.setOutputCol("completions")
.setNCtx(8*4096)
.setNPredict(-1)
.setTemperature(0.1)
)
pipeline = Pipeline().setStages([
document_assembler,
image_assembler,
medicalVisionLLM
])
model = pipeline.fit(input_df)
result = model.transform(input_df)
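Once the pipeline has run, the generated completions can be inspected by selecting the annotation results from the output DataFrame (a minimal usage sketch):
# Show the generated JSON output for each image/prompt pair
result.select("completions.result").show(truncate=False)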
Result:
{
"patient": {
"name": "Ms RUKHSANA SHAHEEN",
"age": "56 yrs",
"sex": "Female",
"hospital_no": "MH005990453",
"episode_no": "030000528270",
"episode_date": "02/07/2021 08:31AM"
},
"diagnoses": [
"systemic lupus erythematosus",
"scleroderma overlap",
"interstitial lung disease"
],
"symptoms": [
"tightness of skin of the fists",
"ulcers on the pulp of the fingers"
],
"treatment": [
{
"med": "Linezolid",
"dose": "600 mg",
"freq": "twice a day for 5 Days"
},
{
"med": "Clopidogrel",
"dose": "75 mg",
"freq": "once a day after meals"
},
{
"med": "Amlodipine",
"dose": "5 mg",
"freq": "once a day"
},
{
"med": "Domperidone",
"dose": "10 mg",
"freq": "twice a day before meals"
},
{
"med": "Omeprazole",
"dose": "20 Mg",
"freq": "Twice a Day before Meal"
},
{
"med": "Bosentan",
"dose": "62.5 mg",
"freq": "twice a day after meals"
},
{
"med": "Sildenafil Citrate",
"dose": "0.5 mg",
"freq": "twice a day after meals"
},
{
"med": "Prednisolone",
"dose": "5 mg",
"freq": "once a day after breakfast"
},
{
"med": "Mycophenolate mofetil",
"dose": "500 mg 2 tablets",
"freq": "twice a day"
},
{
"med": "L-methylfolate calcium",
"dose": "400 µg 1 tablet",
"freq": "once a day"
},
{
"med": "ciprofloxacin",
"dose": "250 mg",
"freq": "twice a day"
}
]
}
Overall, this release strengthens our Medical Vision LLM portfolio by enabling robust, production-ready multimodal analysis for real-world clinical documents and images.
Scalable and Flexible CDA (HL7) De-identification Support for XML Clinical Documents
We’re introducing native de-identification support for HL7 CDA (Clinical Document Architecture) reports in XML format within Healthcare NLP. This module enables secure, large-scale processing of CDA R2 documents by combining structure-aware XML de-identification with NLP-based free-text anonymization, all delivered through a single, scalable Spark transformer.
Unlike flat text approaches, CDA de-identification operates directly on the XML document structure, allowing precise targeting of sensitive fields using XPath-like path expressions, while preserving the integrity and schema of the original clinical document.
Key Capabilities
- Path-based CDA de-identification: Target and de-identify specific CDA elements using dot (`.`) or slash (`/`) notation, enabling fine-grained control over clinical fields such as patient demographics, authors, organizations, and identifiers.
- Attribute-level access: Seamlessly de-identify XML attributes (e.g. telecom values and identifiers) using intuitive attribute notation, without requiring manual XML parsing or preprocessing.
- Namespace-aware processing: Automatically handles the HL7 v3 namespace, reducing configuration overhead and simplifying integration with existing CDA workflows.
- Structured field obfuscation: Deterministically or randomly obfuscate structured fields such as names, dates, addresses, phone numbers, and identifiers while preserving XML validity and document structure.
- NLP-powered free-text de-identification: Apply Spark NLP de-identification pipelines to narrative sections (e.g. clinical notes and section text) embedded within CDA documents, ensuring comprehensive PHI coverage across both structured and unstructured content.
- Unified Spark execution model: Fully compatible with Spark pipelines and batch processing workflows, enabling de-identification at scale without separating structured XML processing from free-text NLP pipelines.
Designed for Real-World CDA Workflows
The CDA De-identification module allows organizations to process heterogeneous CDA documents—where structured XML fields and free-text clinical narratives coexist—using a single, consistent configuration model. By combining XPath-style rules with reusable NLP pipelines, teams can adapt de-identification behavior across document types without rewriting logic or maintaining parallel processing systems.
This approach delivers enterprise-grade performance, flexibility, and compliance, making it ideal for large healthcare data platforms, analytics pipelines, and downstream AI workloads that rely on CDA-formatted clinical data.
Example:
import sparknlp_jsl
from sparknlp_jsl.annotator import *
from sparknlp.pretrained import PretrainedPipeline
rules = {
"recordTarget.patientRole.patient.name.given": "first_name",
"recordTarget.patientRole.patient.name.family": "last_name",
"recordTarget.patientRole.addr.streetAddressLine": "Address",
"recordTarget.patientRole.telecom.value": "Phone",
"author.assignedAuthor.assignedPerson.name.given": "first_name",
"author.assignedAuthor.assignedPerson.name.family": "last_name",
}
deid_pipeline = PretrainedPipeline("clinical_deidentification_docwise_benchmark_optimized",
"en",
"clinical/models")
cda_deidentification = (
CdaDeIdentification()
.setInputCol("text")
.setOutputCol("deid")
.setMode("obfuscate")
.setMappingRules(rules)
.setFreeTextPaths([
"component.structuredBody.component.section.text"
])
.setPipeline(spark, deid_pipeline, "obfuscated")
.setDays(20)
.setSeed(88)
)
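# Load the CDA XML content to de-identify; the file path below is illustrative
with open("sample_cda.xml", "r", encoding="utf-8") as f:
    cda_document = f.read()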
result = cda_deidentification.deidentify(cda_document)
Original CDA:
<?xml version="1.0" encoding="UTF-8"?>
<ClinicalDocument xmlns="urn:hl7-org:v3"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:hl7-org:v3 CDA.xsd">
<!-- ===================== AUTHOR ===================== -->
<author>
<time value="20240101113000"/>
<assignedAuthor>
<id root="2.16.840.1.113883.4.6" extension="111223333"/>
<assignedPerson>
<name>
<given>Emily</given>
<family>Clark</family>
</name>
</assignedPerson>
<representedOrganization>
<id root="2.16.840.1.113883.19.5" extension="ORG001"/>
<name>Good Health Clinic</name>
</representedOrganization>
</assignedAuthor>
</author>
<!-- ===================== BODY ===================== -->
<component>
<structuredBody>
<component>
<section>
<code code="10164-2" codeSystem="2.16.840.1.113883.6.1" displayName="History of Present Illness"/>
<title>History of Present Illness</title>
<text>
Patient John Doe presented with chest pain and shortness of breath.
He lives at 456 Main Street, Istanbul. Contact number is +905551112233.
</text>
</section>
</component>
</structuredBody>
</component>
</ClinicalDocument>
De-identified CDA:
<?xml version="1.0" encoding="UTF-8" standalone="no"?><ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:hl7-org:v3 CDA.xsd">
<!-- ===================== AUTHOR ===================== -->
<author>
<time value="20240101113000"/>
<assignedAuthor>
<id extension="111223333" root="2.16.840.1.113883.4.6"/>
<assignedPerson>
<name>
<given>Mayford</given>
<family>Rock</family>
</name>
</assignedPerson>
<representedOrganization>
<id extension="ORG001" root="2.16.840.1.113883.19.5"/>
<name>Good Health Clinic</name>
</representedOrganization>
</assignedAuthor>
</author>
<!-- ===================== BODY ===================== -->
<component>
<structuredBody>
<component>
<section>
<code code="10164-2" codeSystem="2.16.840.1.113883.6.1" displayName="History of Present Illness"/>
<title>History of Present Illness</title>
<text>Patient Valerie Ates presented with chest pain and shortness of breath.
He lives at Pr-2 Ponce By Pass, 13218 Brook Lane Drive. Contact number is +127773334455.
</text>
</section>
</component>
</structuredBody>
</component>
</ClinicalDocument>
De-identification MCP Server (Preview) for Clinical Text De-identification via External AI Agents and IDE Clients
We are introducing a new Model Context Protocol (MCP) server that enables Spark NLP Healthcare clinical de-identification to be accessed directly from external AI agents and IDE clients. This preview release exposes de-identification pipelines through a standardized MCP interface, enabling seamless integration with modern AI tooling and developer workflows.
Key Features The following capabilities are included in the preview release:
- Enables clinical text de-identification via external AI agents and IDE clients using MCP
- Provides a standardized MCP interface for invoking Spark NLP Healthcare de-identification pipelines
- Supports configurable output modes:
- Masked
- Obfuscated
- Masked and Obfuscated
- Delivered as a containerized, standalone service suitable for both local and remote deployments
- Designed for consistent integration across MCP-compatible clients
Reference Implementation
A reference implementation is available in the Spark NLP Workshop repository:
- Repository: https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/agents/mcp_servers/deidentification
- Access the relevant blog here: https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/agents/mcp_servers/deidentification/blog/deid-blog-mcp-server.md
Usage Overview
- Register the MCP Server in Your Client
In an MCP-compatible client (for example, Cursor, VS Code Copilot / Agent, or Claude CLI), register a new MCP server with the following configuration:
  - URL: `http://localhost:8001/mcp`
  - Transport: `streamable-http`
This registration pattern is consistent across MCP clients and requires only the server URL and transport type.
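For illustration only, a registration entry might look like the following sketch, expressed here as a Python dict; the server key name is hypothetical, and the exact configuration file format depends on your client (for example an mcp.json file):
# Hypothetical MCP server registration entry (adapt to your client's config format)
mcp_server_entry = {
    "mcpServers": {
        "healthcare-deidentification": {          # server name is illustrative
            "url": "http://localhost:8001/mcp",   # server URL from the registration step above
            "transport": "streamable-http"        # transport type from the registration step above
        }
    }
}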
- Send a Tool Request
Once connected, the client automatically discovers the tools exposed by the MCP server.
Example request:
Use the `deidentify_text` tool to de-identify this clinical note with the output mode set to `masked` (or `obfuscated` or `both`) and return the transformed text.
Upgraded Llama.cpp Backend with Expanded Compatibility for the Latest LLM Families
With Healthcare NLP 6.3.0, we deliver a major upgrade to the LLM backend based on a newer llama.cpp version, improving performance, stability, and compatibility for local and distributed LLM inference. This update enables smoother integration of the latest llama.cpp–compatible models into ingestion and enrichment pipelines, while strengthening support for large-scale, on-prem and cloud-based deployments.
🚀 Key Improvements
- Upgraded LLM Backend (llama.cpp): The llama.cpp backend has been modernized to benefit from upstream performance optimizations, improved memory handling, and enhanced stability. This results in more efficient and reliable local LLM inference, especially in distributed Spark environments.
- Expanded Model Compatibility: Healthcare NLP now supports newer LLM families such as Qwen3, allowing customers to adopt the latest-generation models with confidence. These models can be seamlessly loaded and used through MedicalLLM, MedicalVisionLLM, and LLMLoader, enabling both text and vision use cases within the same unified framework (see the loading sketch below).
Impact
These enhancements provide faster startup times, improved inference stability, and greater flexibility when working with modern LLMs. Teams can more easily integrate the latest models into their NLP pipelines, build richer multimodal workflows, and scale local LLM inference without increasing operational complexity.
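As an example of loading one of the llama.cpp-compatible models locally, the snippet below uses LLMLoader with a pretrained Healthcare NLP model. This is a minimal sketch: the model name and prompt are illustrative, and any supported quantized model from the Models Hub can be loaded the same way.
from sparknlp_jsl.llm import LLMLoader

# Load a quantized medical LLM (model name is illustrative) and run a single prompt
llm_loader = LLMLoader(spark).pretrained("jsl_medm_q8_v3", "en", "clinical/models")
response = llm_loader.generate("Summarize the key drug-drug interactions of warfarin.")
print(response)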
Memory-Optimized MedicalNer Training for Large-Scale Datasets
MedicalNer training is enhanced with additional memory-focused optimizations that improve scalability when working with large datasets. These improvements reduce peak memory usage and help training run more reliably on cloud clusters, enabling larger corpus training without requiring proportionally larger executors.
Key training parameters include:
- `setEnableMemoryOptimizer(True)`: Already available in previous versions, this setting activates memory optimization techniques during training to lower memory consumption.
- `setOptimizePartitioning(True)`: Optionally repartitions the dataset before training to improve data distribution and performance, especially for large or skewed datasets.
- `setPrefetchBatches(n)`: Sets the number of batches to prefetch during training to improve data loading efficiency.
Together, these updates make MedicalNer training more efficient and better suited for large-scale production workflows.
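The sketch below shows how these options can be combined on a MedicalNerApproach trainer; the column names and remaining hyperparameters are illustrative:
from sparknlp_jsl.annotator import MedicalNerApproach

ner_trainer = (
    MedicalNerApproach()
    .setInputCols(["sentence", "token", "embeddings"])
    .setLabelColumn("label")
    .setOutputCol("ner")
    .setMaxEpochs(10)
    .setBatchSize(64)
    # Memory-focused options described above
    .setEnableMemoryOptimizer(True)   # lower peak memory usage during training
    .setOptimizePartitioning(True)    # repartition the training data before training
    .setPrefetchBatches(4)            # prefetch batches to improve data loading efficiency
)
# ner_model = ner_trainer.fit(training_data)  # training_data: annotated DataFrame with embeddings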
Clinical De-Identification at Scale: Pipeline Design and Speed–Accuracy Benchmarks Across Infrastructures
To present a focused update on large-scale clinical de-identification benchmarks, emphasizing pipeline design, execution strategy, and infrastructure-aware performance, we benchmarked how different pipeline architectures - rule-augmented NER, hybrid NER + zero-shot, and zero-shot–centric approaches - behave under realistic Google Colab and Databricks–AWS deployments.
- Four complementary datasets are used to evaluate clinical de-identification pipelines from different perspectives:
- Dataset 1: Paper Dataset - Expert-Annotated
- Dataset 2: Curated Surrogate Clinical Dataset
- Dataset 3: Document-Level Clinical De-Identification Context Dataset
- Dataset 4: Large-Scale Aggregated Clinical NER and De-Identification Dataset
- Clinical De-identification – Most Up-to-Date Pipelines
  - Test Environment:
    - GPU Setup: Google Colab A100 GPU, 48 Spark partitions
    - CPU Setup: Google Colab CPU High-RAM, 32 Spark partitions
  - Datasets Used:
    - Dataset 1 and Dataset 2 for the accuracy benchmark
    - Dataset 3 for the speed benchmark
| pipeline | GPU wall time | CPU wall time | Paper precision | Paper recall | Paper F1-score | Surrogate precision | Surrogate recall | Surrogate F1-score | pipeline content |
|---|---|---|---|---|---|---|---|---|---|
| clinical_deidentification_docwise_benchmark_optimized | 5 min 15 sec | 8 min 55 sec | 0.93 | 0.93 | 0.93 | 0.92 | 0.96 | 0.94 | 21 rule-based annotators 4 NER |
| clinical_deidentification_docwise_benchmark_medium | 4 min 38 sec | 32 min 57 sec | 0.90 | 0.97 | 0.93 | 0.87 | 0.96 | 0.91 | 21 rule-based annotators 3 NER + 1 Zero-shot (medium) |
| clinical_deidentification_docwise_benchmark_medium_v2 | 3 min 42 sec | 37 min 56 sec | 0.91 | 0.96 | 0.93 | 0.86 | 0.93 | 0.90 | 21 rule-based annotators 2 NER + Zero-shot Chunker |
| clinical_deidentification_docwise_zeroshot_medium | 26.7 sec | 27 min | 0.92 | 0.94 | 0.93 | 0.86 | 0.90 | 0.88 | 21 rule-based annotators Zero-shot Chunker (medium) |
| clinical_deidentification_docwise_SingleStage_zeroshot_medium | 33.1 sec | 26 min 41 sec | 0.92 | 0.91 | 0.92 | 0.87 | 0.88 | 0.88 | Zero-shot Chunker (medium) |
| clinical_deidentification_docwise_benchmark_large | 4 min 51 sec | 2 h 10 min | 0.90 | 0.97 | 0.94 | 0.88 | 0.96 | 0.92 | 21 rule-based annotators 3 NER + 1 Zero-shot (large) |
| clinical_deidentification_docwise_benchmark_large_v2 | 3 min 46 sec | 1 h 32 min | 0.92 | 0.98 | 0.95 | 0.87 | 0.94 | 0.91 | 21 rule-based annotators 2 NER + Zero-shot Chunker |
| clinical_deidentification_docwise_zeroshot_large | 43.8 sec | 1 h 18 min | 0.93 | 0.97 | 0.95 | 0.87 | 0.93 | 0.90 | 21 rule-based annotators Zero-shot Chunker (large) |
| clinical_deidentification_docwise_SingleStage_zeroshot_large | 41.1 sec | 1 h 15 min | 0.93 | 0.95 | 0.94 | 0.88 | 0.92 | 0.90 | Zero-shot Chunker (large) |
This table reports end-to-end runtime and token-level precision, recall, and F1-score for the most up-to-date clinical de-identification pipelines.
These benchmark results also provide a clear foundation for understanding how modern clinical de-identification systems behave under realistic infrastructure and execution constraints.
- Deidentification Pipelines Speed Comparison on Databricks-AWS
  - Test Environment:
    - GPU Setup: Databricks - Worker Type: g4dn.2xlarge [T4], 32 GB Memory, 1 GPU, 8 Workers
    - CPU Setup: Databricks - Worker Type: m5d.2xlarge, 32 GB Memory, 8 Cores, 8 Workers
  - Dataset Used: Dataset 4 for the speed benchmark
  - CPU Runtime Comparison of Large, Medium and Optimized Pipelines
| Model | Infrastructure | Runtime | Batch Size |
|---|---|---|---|
| clinical_deidentification_docwise_benchmark_large_en | CPU | 9h 23m 54s | 32 |
| clinical_deidentification_docwise_benchmark_medium_en | CPU | 3h 7m 19s | 32 |
| clinical_deidentification_docwise_benchmark_optimized_en | CPU | 26m 6s | 32 |
- CPU & GPU Runtime Comparison of Medium Pipeline
| Model | Infrastructure | Runtime | Batch Size |
|---|---|---|---|
| clinical_deidentification_docwise_benchmark_medium_en | GPU | 1h 2m 35s | 8 |
| clinical_deidentification_docwise_benchmark_medium_en | CPU | 3h 7m 19s | 32 |
The findings emphasize that pipeline optimization yields greater performance gains than hardware scaling alone, while GPU resources provide additional, complementary speedups when applied to appropriately balanced pipeline configurations.
- Pretrained Zero-Shot Named Entity Recognition (NER) Deidentification Subentity Speed Comparison on Databricks-AWS
  - Test Environment:
    - GPU Setup: Databricks - Worker Type: g4dn.2xlarge [T4], 32 GB Memory, 1 GPU, 8 Workers
    - CPU Setup: Databricks - Worker Type: m5d.2xlarge, 32 GB Memory, 8 Cores, 8 Workers
  - Dataset Used: Dataset 4 for the speed benchmark
  - Zero-shot Medium Model CPU & GPU Runtime Comparison
| Model | Infrastructure | Runtime | Batch Size |
|---|---|---|---|
| zeroshot_ner_deid_subentity_merged_medium_en | CPU | 2h 47m 24s | 32 |
| zeroshot_ner_deid_subentity_merged_medium_en | GPU | 6m 26s | 32 |
- Zero-shot Medium Model Batch Size Comparison via GPU Cluster
| Model | Infrastructure | Runtime | Batch Size |
|---|---|---|---|
| zeroshot_ner_deid_subentity_merged_medium_en | GPU | 6m 26s | 32 |
| zeroshot_ner_deid_subentity_merged_medium_en | GPU | 12m 3s | 8 |
- Zero-shot Medium & Large Models GPU Runtime Comparison
| Model | Infrastructure | Runtime | Batch Size |
|---|---|---|---|
| zeroshot_ner_deid_subentity_merged_medium_en | GPU | 12m 3s | 8 |
| zeroshot_ner_deid_subentity_merged_large_en | GPU | 30m 23s | 8 |
The findings show that GPU usage is essential for production-scale runs, batch size optimization is critical for maximizing GPU efficiency, and model size should be selected based on the required balance between accuracy and runtime performance.
Comparative Medical Vision LLM Benchmark Results Against Leading Foundation Models
To provide a clear and comparative assessment of our Medical Vision LLMs, we report performance across a diverse set of medical reasoning, safety, bias, and clinical knowledge benchmarks, alongside leading proprietary foundation models.
These benchmarks evaluate capabilities spanning medical calculation, clinical QA, dialog understanding, hallucination resistance, bias robustness, and academic medical knowledge.
| Benchmark | Gemini-2.5-Pro | Sonnet-4 | JSL-MedicalVLM-30B | JSL-MedicalVLM-7B |
|---|---|---|---|---|
| MedCalc | 15.0 | 14.0 | 29.0 | 24.0 |
| MedBullets | 40.0 | 42.0 | 47.0 | 46.0 |
| ACI-Bench | 81.9 | 82.4 | 83.01 | 84.0 |
| MedicationQA | 72.56 | 72.4 | 76.8 | 77.1 |
| MedDialog | 75.1 | 75.3 | 77.1 | 77.5 |
| PubMedQA | 75.0 | 72.0 | 77.0 | 81.0 |
| RaceBias | 70.0 | 70.0 | 93.0 | 78.0 |
| MedHallu | 90.0 | 90.0 | 90.0 | 92.0 |
| College Biology | 95.0 | 98.6 | 99.3 | 92.4 |
| Medical Genetics | 95.0 | 94.0 | 95.0 | 93.0 |
| Average | 70.96 | 71.07 | 76.72 | 74.5 |
Key takeaways
- JSL-MedicalVLM-30B demonstrates strong and consistent performance across clinical reasoning, academic medical knowledge, and bias-sensitive evaluations, with particularly high scores in RaceBias, College Biology, and Medical Genetics.
- JSL-MedicalVLM-7B remains highly competitive relative to larger models, outperforming proprietary alternatives on multiple benchmarks while offering a more efficient deployment footprint.
- Both JSL MedicalVLM models show robust resistance to hallucinations and strong clinical dialog understanding, supporting their use in real-world medical and multimodal healthcare applications.
These results validate the reliability and clinical readiness of JSL Medical VLMs across a broad spectrum of medical evaluation tasks.
Advanced Clinical One-Liner Pretrained Pipelines for PHI Detection and Extraction
We introduce specialized pretrained pipelines designed specifically for Protected Health Information (PHI) detection and de-identification in clinical documents. These pipelines leverage state-of-the-art Named Entity Recognition (NER) models to automatically identify and extract sensitive medical information, ensuring compliance with healthcare privacy regulations.
Our de-identification pipelines eliminate the complexity of building custom PHI detection systems from scratch. Healthcare organizations and researchers can now deploy robust privacy protection measures with simple one-liner implementations, streamlining the process of sanitizing clinical documents while maintaining high accuracy standards.
| Model Name | Description |
|---|---|
| clinical_deidentification_docwise_benchmark_large_v2 | Mask and Obfuscate DATE, LOCATION, PROFESSION, DOCTOR, EMAIL, PATIENT, URL, USERNAME, CITY, COUNTRY, DLN, HOSPITAL, IDNUM, MEDICALRECORD, STATE, STREET, ZIP, AGE, PHONE, ORGANIZATION, SSN, ACCOUNT, PLATE, VIN, LICENSE, IP entities. |
| clinical_deidentification_docwise_benchmark_medium_v2 | Mask and Obfuscate DATE, LOCATION, PROFESSION, DOCTOR, EMAIL, PATIENT, URL, USERNAME, CITY, COUNTRY, DLN, HOSPITAL, IDNUM, MEDICALRECORD, STATE, STREET, ZIP, AGE, PHONE, ORGANIZATION, SSN, ACCOUNT, PLATE, VIN, LICENSE, IP entities. |
| clinical_deidentification_docwise_benchmark_optimized_v2 | Mask and Obfuscate NAME, DATE, LOCATION, PROFESSION, DOCTOR, EMAIL, PATIENT, URL, USERNAME, CITY, COUNTRY, DLN, HOSPITAL, IDNUM, MEDICALRECORD, STATE, STREET, ZIP, AGE, PHONE, ORGANIZATION, SSN, ACCOUNT, PLATE, VIN, LICENSE, IP entities. |
| clinical_deidentification_docwise_zeroshot_large | Mask and Obfuscate DATE, PROFESSION, DOCTOR, EMAIL, PATIENT, URL, USERNAME, CITY, COUNTRY, DLN, HOSPITAL, IDNUM, MEDICALRECORD, STATE, STREET, ZIP, AGE, PHONE, ORGANIZATION, SSN, ACCOUNT, PLATE, VIN, LICENSE, IP entities. |
| clinical_deidentification_docwise_zeroshot_medium | Mask and Obfuscate DATE, PROFESSION, DOCTOR, EMAIL, PATIENT, URL, USERNAME, CITY, COUNTRY, DLN, HOSPITAL, IDNUM, MEDICALRECORD, STATE, STREET, ZIP, AGE, PHONE, ORGANIZATION, SSN, ACCOUNT, PLATE, VIN, LICENSE, IP entities. |
| clinical_deidentification_docwise_SingleStage_zeroshot_large | Mask and Obfuscate DOCTOR, PATIENT, AGE, DATE, HOSPITAL, CITY, STREET, STATE, COUNTRY, PHONE, IDNUM, EMAIL, ZIP, ORGANIZATION, PROFESSION, USERNAME entities. |
| clinical_deidentification_docwise_SingleStage_zeroshot_medium | Mask and Obfuscate DOCTOR, PATIENT, AGE, DATE, HOSPITAL, CITY, STREET, STATE, COUNTRY, PHONE, IDNUM, EMAIL, ZIP, ORGANIZATION, PROFESSION, USERNAME entities. |
Example:
deid_pipeline = PretrainedPipeline("clinical_deidentification_docwise_zeroshot_medium", "en", "clinical/models")
text = """Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024.
The patient’s medical record number is 56467890.
The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 ."""
deid_result = deid_pipeline.fullAnnotate(text)
Result:
| text | result |
|---|---|
| Dr. John Lee, from Royal Medical Clinic in Chicago, attended to the patient on 11/05/2024. The patient’s medical record number is 56467890. The patient, Emma Wilson, is 50 years old, her Contact number: 444-456-7890 . | Dr. Valerie Aho, from Mercy Hospital Aurora in Berea, attended to the patient on 05/07/2024. The patient’s medical record number is 78689012. The patient, Johnathon Bunde, is 55 years old, her Contact number: 666-678-9012 . |
8 New Entity Resolver Models for Mapping Clinical Terms to Standardized UMLS CUI and LOINC Codes
These Entity Resolver models map clinical entities to standardized UMLS Concept Unique Identifiers (CUI) using sbiobert_base_cased_mli and mpnet_embeddings_biolord_2023_c sentence embeddings. Trained on the 2025AB UMLS Metathesaurus releases, they cover diverse semantic categories including diseases, syndromes, clinical drugs, pharmacologic substances, antibiotics, symptoms, procedures, clinical findings, anatomical structures, medical devices, and injuries & poisoning.
In addition to UMLS CUI mapping, we also introduce a model for mapping medical entities to Logical Observation Identifiers Names and Codes (LOINC), facilitating standardized representation of laboratory and clinical observations.
| Model Name | Description |
|---|---|
| sbiobertresolve_umls_findings | Maps clinical findings to their corresponding UMLS CUI codes |
| sbiobertresolve_umls_clinical_drugs | Maps drug entities to UMLS CUI codes |
| sbiobertresolve_umls_disease_syndrome | Maps clinical entities (“Disease or Syndrome”) to UMLS CUI codes |
| sbiobertresolve_umls_drug_substance | Maps drugs and substances to UMLS CUI codes |
| sbiobertresolve_umls_general_concepts | Maps clinical entities and concepts to the following 4 UMLS CUI code categories: Disease, Symptom, Medication, and Procedure |
| sbiobertresolve_umls_major_concepts | Maps clinical entities and concepts to 4 major categories of UMLS CUI codes: Clinical Findings, Medical Devices, Anatomical Structures, and Injuries & Poisoning terms |
| biolordresolve_umls_general_concepts | Maps clinical entities to 4 UMLS CUI code categories using mpnet_embeddings_biolord_2023_c embeddings: Disease, Symptom, Medication, and Procedure |
| sbiobertresolve_loinc | Maps extracted medical entities to Logical Observation Identifiers Names and Codes (LOINC) |
Example:
...
ner_model = MedicalNerModel.pretrained("ner_posology_greedy", "en", "clinical/models")\
.setInputCols(["sentence", "token", "embeddings"])\
.setOutputCol("posology_ner")
ner_model_converter = NerConverterInternal()\
.setInputCols(["sentence", "token", "posology_ner"])\
.setOutputCol("posology_ner_chunk")\
.setWhiteList(["DRUG"])
chunk2doc = Chunk2Doc().setInputCols("posology_ner_chunk").setOutputCol("ner_chunk_doc")
sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli","en","clinical/models")\
.setInputCols(["ner_chunk_doc"])\
.setOutputCol("sbert_embeddings")\
.setCaseSensitive(False)
resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_umls_clinical_drugs","en", "clinical/models") \
.setInputCols(["sbert_embeddings"]) \
.setOutputCol("resolution")\
.setDistanceFunction("EUCLIDEAN")
pipeline = Pipeline(stages = [document_assembler, sentenceDetector, tokenizer, word_embeddings, ner_model, ner_model_converter, chunk2doc, sbert_embedder, resolver])
data = spark.createDataFrame([["""She was immediately given hydrogen peroxide 30 mg to treat the infection on her leg, and has been advised Neosporin Cream for 5 days. She has a history of taking magnesium hydroxide 100mg/1ml and metformin 1000 mg."""]]).toDF("text")
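To produce the resolutions shown below, the pipeline is fit and applied to this DataFrame (standard Spark ML usage):
result = pipeline.fit(data).transform(data)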
Result:
| ner_chunk | entity | umls_code | resolution | all_k_resolutions | all_k_results | all_k_distances | all_k_cosine_distances |
|---|---|---|---|---|---|---|---|
| hydrogen peroxide 30 mg | DRUG | C1126248 | hydrogen peroxide 30 mg/ml | hydrogen peroxide 30 mg/ml:::hydrogen peroxide solution 30%:::hydrogen peroxid… | C1126248:::C0304655:::C1605252:::C0304656:::C1154260:::C2242362:::C1724195:::… | 4.3731:::4.7154:::5.3302:::6.2122:::6.8675:::7.2770:::7.4682:::7.8312:::7.858… | 0.0323:::0.0369:::0.0483:::0.0649:::0.0807:::0.0890:::0.0957:::0.1018:::0.106… |
| Neosporin Cream | DRUG | C0132149 | neosporin cream | neosporin cream:::neomycin sulfate cream:::neosporin topical ointment:::nasep… | C0132149:::C4722788:::C0704071:::C0698988:::C1252084:::C3833898:::C0698810:::… | 0.0000:::7.0688:::7.3112:::7.3482:::7.3820:::7.4605:::7.7285:::7.8918:::7.908… | 0.0000:::0.0888:::0.0953:::0.0934:::0.0964:::0.0941:::0.1052:::0.1113:::0.108… |
| magnesium hydroxide 100mg/1ml | DRUG | C1134402 | magnesium hydroxide 100 mg | magnesium hydroxide 100 mg:::magnesium hydroxide 100 mg/ml:::magnesium sulph… | C1134402:::C1126785:::C4317023:::C4051486:::C4047137:::C1131100:::C1371187:::… | 4.9759:::5.1251:::5.8597:::6.5641:::6.5735:::6.8202:::6.8606:::6.9799:::7.007… | 0.0401:::0.0432:::0.0565:::0.0688:::0.0700:::0.0764:::0.0781:::0.0783:::0.079… |
| metformin 1000 mg | DRUG | C0987664 | metformin 1000 mg | metformin 1000 mg:::metformin hydrochloride 1000 mg:::metformin hcl 1000mg ta… | C0987664:::C2719784:::C0978482:::C2719786:::C4282269:::C2719794:::C4282270:::… | 0.0000:::5.2988:::5.3783:::5.9071:::6.1034:::6.3066:::6.6597:::6.6626:::6.782… | 0.0000:::0.0445:::0.0454:::0.0553:::0.0586:::0.0632:::0.0698:::0.0707:::0.072… |
Introduced 17 New Rule-Based Entity Extraction Models for Identifying Structured Codes and Patterns in Clinical Texts
These rule-based extraction models identify structured entities and patterns within clinical texts using ContextualParser and RegexMatcherInternal annotators. The models cover a variety of entity types including ICD-10 codes, URLs, vehicle identification numbers (VIN), and other domain-specific patterns, enabling accurate extraction without requiring labeled training data.
| Model Name | Description |
|---|---|
| icd10_parser | Extracts ICD-10 code entities |
| account_parser | Extracts account number entities |
| age_parser | Extracts age entities |
| date_regex_matcher | Extracts date entities |
| dln_parser | Extracts driver's license number entities |
| license_parser | Extracts license number entities |
| phone_parser | Extracts phone number entities |
| plate_parser | Extracts plate number entities |
| ssn_parser | Extracts Social Security number (SSN) entities |
| vin_parser | Extracts vehicle identification number (VIN) entities |
| cpt_parser | Extracts CPT code entities |
| email_regex_matcher | Extracts email entities |
| ip_regex_matcher | Extracts IP address entities |
| med_parser | Extracts medical record entities |
| specimen_parser | Extracts specimen entities |
| url_regex_matcher | Extracts URL entities |
| zip_regex_matcher | Extracts ZIP code entities |
Example:
dln_contextual_parser = ContextualParserModel.pretrained("dln_parser","en","clinical/models")\
.setInputCols(["sentence", "token"])\
.setOutputCol("chunk_dln")
# `parserPipeline` is assumed to include the standard document, sentence, and token stages plus the parser above
model = parserPipeline.fit(spark.createDataFrame([[""]]).toDF("text"))
sample_text ="""Name : Hendrickson, Ora, Record date: 2093-01-13, # 719435.
Dr. John Green, ID: 1231511863, IP 203.120.223.13.
He is a 60-year-old male was admitted to the Day Hospital for cystectomy on 01/13/93.
Patient's VIN : 1HGBH41JXMN109286, SSN #333-44-6666, Driver's license no: A334455B. Driver's license# 12345678. MY DL# B324567 CDL bs34df45
Phone (302) 786-5227, 0295 Keats Street, San Francisco, E-MAIL: smith@gmail.com."""
Result:
| chunk | begin | end | label |
|---|---|---|---|
| A334455B | 271 | 278 | DLN |
| 12345678 | 299 | 306 | DLN |
| B324567 | 316 | 322 | DLN |
| bs34df45 | 328 | 335 | DLN |
Introduced 7 New NER Models for PHI Deidentification and Drug Entity Extraction
This release introduces 7 new NER models for clinical entity extraction: three trained models and four zero-shot models. Two of the trained models focus on PHI (Protected Health Information) detection for deidentification: a generic model detecting broad entity types (DATE, NAME, LOCATION, ID, CONTACT, AGE, PROFESSION) and a subentity model with granular labels (PATIENT, DOCTOR, HOSPITAL, STREET, CITY, ZIP). The third extracts drug entities by combining dosage, strength, form, and route into a unified Drug label, while the four zero-shot models support custom PHI label sets for flexible de-identification.
| Model Name | Description |
|---|---|
| ner_deid_generic_nonMedical | Model detects PHI entities such as DATE, NAME, LOCATION, ID, CONTACT, AGE, PROFESSION |
| ner_deid_subentity_nonMedical | Model detects PHI entities with granular labels such as PATIENT, DOCTOR, HOSPITAL, STREET, CITY, ZIP |
| ner_drugs_large_v2 | Model combines dosage, strength, form, and route into a single entity: Drug |
| zeroshot_ner_deid_generic_nonMedical_large | The model is designed to support any set of entity labels, allowing users to adapt it to their specific use cases |
| zeroshot_ner_deid_generic_nonMedical_medium | The model is designed to support any set of entity labels, allowing users to adapt it to their specific use cases |
| zeroshot_ner_deid_subentity_nonMedical_large | The model is designed to support any set of entity labels, allowing users to adapt it to their specific use cases |
| zeroshot_ner_deid_subentity_nonMedical_medium | The model is designed to support any set of entity labels, allowing users to adapt it to their specific use cases |
Example:
ner_model = MedicalNerModel.pretrained("ner_deid_generic_nonMedical", "en", "clinical/models")\
.setInputCols(["sentence", "token", "embeddings"])\
.setOutputCol("ner")
text = """
Mr. James Wilson is a 65-year-old male who presented to the emergency department at Boston General Hospital on 10/25/2023.
He lives at 123 Oak Street, Springfield, IL 62704. He can be contacted at 555-0199.
His SSN is 999-00-1234. Dr. Gregory House is the attending physician.
"""
data = spark.createDataFrame([[text]]).toDF("text")
Result:
| chunk | begin | end | ner_label |
|---|---|---|---|
| James Wilson | 5 | 16 | NAME |
| 65-year-old | 23 | 33 | AGE |
| Boston General Hospital | 85 | 107 | LOCATION |
| 10/25/2023 | 112 | 121 | DATE |
| 123 Oak Street | 137 | 150 | LOCATION |
| Springfield | 153 | 163 | LOCATION |
| IL | 166 | 167 | LOCATION |
| 555-0199 | 199 | 206 | CONTACT |
| 999-00-1234 | 221 | 231 | ID |
| Gregory House | 238 | 250 | NAME |
8 New Contextual Assertion Models to Classify Clinical Entities by Assertion Status
We introduced 8 new contextual assertion models that classify assertion context for clinical mentions—helping distinguish whether a condition applies to the patient, is denied, historical, conditional, hypothetical, planned, or attributed to someone else.
| Model Name | Description |
|---|---|
| contextual_assertion_absent | Identifies medical conditions that are explicitly absent or denied in the patient |
| contextual_assertion_conditional | Identifies medical conditions that are conditional or dependent on certain circumstances |
| contextual_assertion_family | Identifies medical conditions that belong to family members rather than the patient |
| contextual_assertion_hypothetical | Identifies medical conditions mentioned in hypothetical or uncertain contexts |
| contextual_assertion_past | Identifies medical conditions that occurred in the patient’s past medical history |
| contextual_assertion_planned | Identifies medical procedures or treatments that are planned for the future |
| contextual_assertion_possible | Identifies medical conditions that are possible or suspected but not confirmed |
| contextual_assertion_someoneelse | Identifies medical conditions that belong to someone other than the patient (not family) |
Example:
contextual_assertion = ContextualAssertion\
.pretrained("contextual_assertion_absent", "en", "clinical/models")\
.setInputCols("sentence", "token", "ner_chunk")\
.setOutputCol("assertion_absent")
text = """The patient denies any chest pain, shortness of breath, or fever. No history of diabetes or hypertension."""
data = spark.createDataFrame([[text]]).toDF('text')
Result:
| ner_chunk | begin | end | ner_label | result |
|---|---|---|---|---|
| any chest pain | 19 | 32 | PROBLEM | absent |
| shortness of breath | 35 | 53 | PROBLEM | absent |
| fever | 59 | 63 | PROBLEM | absent |
| diabetes | 80 | 87 | PROBLEM | absent |
| hypertension | 92 | 103 | PROBLEM | absent |
Added new ONNX-based multilingual clinical NER models for Italian, Spanish, and Romanian, covering disease, procedure, medication, and symptom entity extraction
This release adds several new ONNX-exported NER models for multilingual clinical text mining, enabling faster and more efficient inference while extracting key medical entities from real-world clinical notes and records. The models cover Italian, Spanish, and Romanian medical text and support structured extraction of diseases, procedures, medications, and symptoms using BIO tagging.
| Model Name | Description |
|---|---|
| roberta_disease_ner_onnx | RoBERTa-based token classification model for identifying disease mentions in text |
| roberta_med_ner_onnx | RoBERTa-based token classification model trained to identify medication mentions in clinical and biomedical text |
| roberta_procedure_ner_onnx | RoBERTa-based token classification model for extracting procedure mentions from text |
| roberta_symptom_ner_onnx | RoBERTa-based token classification model for extracting symptom mentions from text |
| vetclinical_bert_onnx | Trained on large-scale real-world veterinary medical records, this model captures animal-health terminology and note structure to support accurate disease/syndrome classification, information extraction, and clinical text analysis |
| bert_token_classifier_disease_ner_it_onnx | BERT-based NER model that identifies disease mentions in Italian medical text |
| bert_token_classifier_medical_ner_it_onnx | BERT-based NER model that detects medication names in Italian medical text |
| bert_token_classifier_procedure_ner_it_onnx | BERT-based NER model that identifies medical procedures in Italian medical text |
| roberta_disease_ner_es_onnx | RoBERTa-based NER model that identifies disease mentions in Spanish medical text |
| roberta_procedure_ner_es_onnx | RoBERTa-based NER model that identifies medical procedures in Spanish medical text |
| roberta_symptom_ner_es_onnx | RoBERTa-based NER model that identifies symptom mentions in Spanish medical text |
| xlmroberta_medical_ner_ro_onnx | XLM-RoBERTa-based NER model that identifies medication mentions in Romanian medical text |
Example:
tokenClassifier = MedicalBertForTokenClassifier \
.pretrained("bert_token_classifier_procedure_ner_it_onnx", "it", "clinical/models") \
.setInputCols(["document", "token"]) \
.setOutputCol("ner")
data = spark.createDataFrame([["Il paziente è stato sottoposto a risonanza magnetica e biopsia epatica."]]).toDF("text")
Result:
| text | entity |
|---|---|
| risonanza magnetica | PROCEDURE |
| biopsia epatica | PROCEDURE |
New Blog Posts & Technical Deep Dives
- From Clinical Text to Knowledge Graphs with John Snow Labs Healthcare NLP: This blog post shows how to build an end-to-end clinical knowledge graph from unstructured medical text using the John Snow Labs Healthcare NLP library. We start by extracting key clinical entities about medications with Named Entity Recognition, then connect them using Relation Extraction to capture medically meaningful links. Finally, we convert these structured relationships into a knowledge graph that makes complex clinical narratives easier to search, analyze, and interpret.
- Clinical De-Identification at Scale: Pipeline Design and Speed–Accuracy Trade-offs Across Infrastructures: This blog post presents a focused update on large-scale clinical de-identification benchmarks, emphasizing pipeline design, execution strategy, and infrastructure-aware performance. Rather than treating accuracy as an isolated metric, we analyze how different pipeline architectures - rule-augmented NER, hybrid NER + zero-shot, and zero-shot–centric approaches - behave under realistic Google Colab and Databricks–AWS deployments.
Various core improvements, bug fixes, enhanced overall robustness and reliability of Healthcare NLP
- Improved PipelineTracer coverage by adding support for PretrainedZeroShotNERChunker (plus ChunkFilterer/CER/CEF) and correctly tracing AssertionMerger replaceDict behavior.
- Added replaceDict to AssertionMerger to allow custom replacement of assertion labels (see the sketch after this list).
- PipelineOutputParser improvements to support mappings output in clinical_deidentification pipelines
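For the new replaceDict option on AssertionMerger, a minimal configuration sketch might look like the following; the input column names and the label mapping are illustrative, and the setter name is assumed to follow the usual setReplaceDict convention:
from sparknlp_jsl.annotator import AssertionMerger

# Merge two assertion columns and remap selected labels in the merged output
assertion_merger = AssertionMerger()\
    .setInputCols(["assertion_dl", "contextual_assertion"])\
    .setOutputCol("assertion_merged")\
    .setReplaceDict({"Absent": "negative", "Past": "historical"})  # custom label replacements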
Updated Notebooks and Demonstrations for Making Healthcare NLP Easier to Navigate and Understand
- New CDA DeIdentification Notebook
We Have Added And Updated A Substantial Number Of New Clinical Models And Pipelines, Further Solidifying Our Offering In The Healthcare Domain.
jsl_meds_vlm_7b_q4_v1_en, jsl_meds_vlm_7b_q8_v1_en, jsl_meds_vlm_7b_q16_v1_en, jsl_meds_vlm_4b_q4_v1_en, jsl_meds_vlm_4b_q8_v1_en, jsl_meds_vlm_4b_q16_v1_en, jsl_meds_vlm_reasoning_8b_q4_v1_en, jsl_meds_vlm_reasoning_8b_q8_v1_en, jsl_meds_vlm_reasoning_8b_q16_v1_en, clinical_deidentification_docwise_benchmark_large_v2, clinical_deidentification_docwise_benchmark_medium_v2, clinical_deidentification_docwise_benchmark_optimized_v2, clinical_deidentification_docwise_zeroshot_large, clinical_deidentification_docwise_zeroshot_medium, clinical_deidentification_docwise_SingleStage_zeroshot_large, clinical_deidentification_docwise_SingleStage_zeroshot_medium, sbiobertresolve_umls_findings, sbiobertresolve_umls_clinical_drugs, sbiobertresolve_umls_disease_syndrome, sbiobertresolve_umls_drug_substance, sbiobertresolve_umls_general_concepts, sbiobertresolve_umls_major_concepts, biolordresolve_umls_general_concepts, sbiobertresolve_loinc, icd10_parser, account_parser, age_parser, date_regex_matcher, dln_parser, license_parser, phone_parser, plate_parser, ssn_parser, vin_parser, cpt_parser, email_regex_matcher, ip_regex_matcher, med_parser, specimen_parser, url_regex_matcher, zip_regex_matcher, ner_deid_generic_nonMedical, ner_deid_subentity_nonMedical, ner_drugs_large_v2, zeroshot_ner_deid_generic_nonMedical_large, zeroshot_ner_deid_generic_nonMedical_medium, zeroshot_ner_deid_subentity_nonMedical_large, zeroshot_ner_deid_subentity_nonMedical_medium, roberta_disease_ner_onnx, roberta_med_ner_onnx, roberta_procedure_ner_onnx, roberta_symptom_ner_onnx, vetclinical_bert_onnx, bert_token_classifier_disease_ner_it_onnx, bert_token_classifier_medical_ner_it_onnx, bert_token_classifier_procedure_ner_it_onnx, roberta_disease_ner_es_onnx, roberta_procedure_ner_es_onnx, roberta_symptom_ner_es_onnx, xlmroberta_medical_ner_ro_onnx, contextual_assertion_absent, contextual_assertion_conditional, contextual_assertion_family, contextual_assertion_hypothetical, contextual_assertion_past, contextual_assertion_planned, contextual_assertion_possible, contextual_assertion_someoneelse
For all Healthcare NLP models, please check: Models Hub Page