Numind Medical NER LLM v1 (jsl_meds_ner_deid_generic_2b_q8_v1)

Description

This medical LLM is trained to extract medical entities from clinical notes and return them in structured JSON format. It supports entity types such as AGE, CITY, DRUG, PATIENT, and PROBLEM.
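The model returns its entities as a JSON string, so downstream code usually has to parse that string before using it. The sketch below is a hypothetical helper (`parse_entities` and `EXPECTED_LABELS` are illustrative names, not part of the Healthcare NLP API). It assumes the completion is a flat JSON object mapping labels to lists of strings, as in the Results section of this card. It tolerates an optional Markdown code fence around the JSON and fills in an empty list for any expected label the model omitted.

```python
import json
import re

# Labels taken from the sample output on this card; the full set of labels the
# model can emit may differ (assumption).
EXPECTED_LABELS = ["AGE", "CONTACT", "DATE", "ID", "LOCATION", "NAME", "PROFESSION"]

# Matches an optional Markdown code fence (three backticks, optional "json" tag).
_FENCE = re.compile(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", re.DOTALL)


def parse_entities(raw_output: str, labels=EXPECTED_LABELS) -> dict:
    """Parse the model's JSON completion into {label: [strings]}.

    Hypothetical helper: strips a surrounding code fence if present, starts
    every expected label with an empty list, then copies in whatever labels
    the model actually returned (including unexpected ones).
    """
    text = raw_output.strip()
    match = _FENCE.match(text)
    if match:
        text = match.group(1)
    data = json.loads(text)  # raises json.JSONDecodeError on invalid output
    result = {label: [] for label in labels}
    for label, values in data.items():
        result[label] = list(values)
    return result
```

For example, `parse_entities(output)` could be applied to the `output` string printed by the pipelines under "How to use". Failing loudly on invalid JSON is deliberate: even at a low temperature, generated text is not guaranteed to be well-formed.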


How to use

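# Python (Healthcare NLP)
# Assumes an active Spark session named `spark` with Healthcare NLP loaded,
# e.g. one returned by sparknlp_jsl.start() with your license secret.
from sparknlp_jsl.annotator import MedicalLLM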
from sparknlp.base import DocumentAssembler
from pyspark.ml import Pipeline

prompt = """Extract all medical entities from the clinical note below and return them in JSON format according to the template.

#### Template:
entities
  ]
}}

#### Clinical Note:
On March 15, 2024, 58-year-old male patient John Smith (medical record number 12345678) was admitted to Memorial Hospital in New York, NY under the care of Dr. Sarah Johnson with chest pain, cough and shortness of breath. He was diagnosed with stage IV non-small cell lung cancer with metastases to the liver and bones. Treatment included osimertinib 80 mg daily, carboplatin and pemetrexed chemotherapy, and pembrolizumab immunotherapy. After three months, imaging showed a 30% reduction in tumor size, his symptoms improved, and follow-up is scheduled for July 22, 2024.


#### Instructions:
- Extract all entities exactly as they appear in the text
- Return only valid JSON format
- Use empty lists for categories with no entities found
- Do not add explanations, only return the JSON

#### Output:
"""

data = spark.createDataFrame([[prompt]]).toDF("text")

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

medical_llm = MedicalLLM.pretrained("jsl_meds_ner_deid_generic_2b_q8_v1", "en", "clinical/models")\
    .setInputCols("document")\
    .setOutputCol("completions")\
    .setBatchSize(1)\
    .setNPredict(3000)\
    .setUseChatTemplate(True)\
    .setTemperature(0.1)\
    .setTopK(40)\
    .setTopP(0.9)

pipeline = Pipeline(stages=[
    document_assembler,
    medical_llm
])

model = pipeline.fit(data)
results = model.transform(data)

output = results.select("completions").collect()[0].completions[0].result
print(output)
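Each example on this card hard-codes the same prompt layout: a task sentence, then "#### Template:", "#### Clinical Note:", "#### Instructions:" and "#### Output:" sections. When you run many notes, it can help to assemble that layout programmatically. The following is a minimal sketch assuming you want to reproduce the card's layout. `build_prompt` is a hypothetical helper, and the JSON template is passed in by the caller because the card's template text is not fully reproduced here.

```python
# Instruction lines copied from the prompt used in the examples on this card.
INSTRUCTIONS = [
    "Extract all entities exactly as they appear in the text",
    "Return only valid JSON format",
    "Use empty lists for categories with no entities found",
    "Do not add explanations, only return the JSON",
]

HEADER = ("Extract all medical entities from the clinical note below and "
          "return them in JSON format according to the template.")


def build_prompt(note: str, template: str) -> str:
    """Assemble a prompt with the same section layout as the example above.

    Hypothetical helper: `template` is the JSON template you want the model
    to fill; the prompt ends after the "#### Output:" line, as on this card.
    """
    instructions = "\n".join(f"- {line}" for line in INSTRUCTIONS)
    return (
        f"{HEADER}\n\n"
        f"#### Template:\n{template.strip()}\n\n"
        f"#### Clinical Note:\n{note.strip()}\n\n"
        f"#### Instructions:\n{instructions}\n\n"
        f"#### Output:\n"
    )
```

With a list of notes, the resulting strings can be loaded one per row, e.g. `spark.createDataFrame([[build_prompt(n, template)] for n in notes]).toDF("text")`, and passed through the same pipeline.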

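# Python (johnsnowlabs library)
# Assumes nlp.start() has already been called with a valid Healthcare license.
from johnsnowlabs import nlp, medical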

prompt = """Extract all medical entities from the clinical note below and return them in JSON format according to the template.

#### Template:
entities
  ]
}}

#### Clinical Note:
On March 15, 2024, 58-year-old male patient John Smith (medical record number 12345678) was admitted to Memorial Hospital in New York, NY under the care of Dr. Sarah Johnson with chest pain, cough and shortness of breath. He was diagnosed with stage IV non-small cell lung cancer with metastases to the liver and bones. Treatment included osimertinib 80 mg daily, carboplatin and pemetrexed chemotherapy, and pembrolizumab immunotherapy. After three months, imaging showed a 30% reduction in tumor size, his symptoms improved, and follow-up is scheduled for July 22, 2024.


#### Instructions:
- Extract all entities exactly as they appear in the text
- Return only valid JSON format
- Use empty lists for categories with no entities found
- Do not add explanations, only return the JSON

#### Output:
"""

data = nlp.SparkSession.builder.getOrCreate().createDataFrame([[prompt]]).toDF("text")

document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

medical_llm = medical.MedicalLLM.pretrained("jsl_meds_ner_deid_generic_2b_q8_v1", "en", "clinical/models")\
    .setInputCols("document")\
    .setOutputCol("completions")\
    .setBatchSize(1)\
    .setNPredict(3000)\
    .setUseChatTemplate(True)\
    .setTemperature(0.1)\
    .setTopK(40)\
    .setTopP(0.9)

pipeline = nlp.Pipeline().setStages([
    document_assembler,
    medical_llm
])

model = pipeline.fit(data)
results = model.transform(data)

output = results.select("completions").collect()[0].completions[0].result
print(output)
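Both Python examples set the sampling parameters `setTemperature(0.1)`, `setTopK(40)` and `setTopP(0.9)`. To show what these three knobs conventionally do, the sketch below applies them to a toy next-token distribution. This is a generic illustration of temperature, top-k, and nucleus (top-p) sampling, not the MedicalLLM implementation. The underlying inference engine may apply these filters in a different order or with additional steps, and `sample_next_token` is a hypothetical name.

```python
import math
import random


def sample_next_token(logits, temperature=0.1, top_k=40, top_p=0.9, rng=random):
    """Generic temperature + top-k + top-p sampling over {token: logit}.

    Illustrative only; not how MedicalLLM is implemented internally.
    """
    # 1. Temperature: dividing logits by a small value (e.g. 0.1) sharpens
    #    the distribution toward the highest-scoring token.
    scaled = {tok: lg / temperature for tok, lg in logits.items()}
    # 2. Top-k: keep only the k highest-scoring tokens.
    ranked = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    best = ranked[0][1]
    exps = [(tok, math.exp(s - best)) for tok, s in ranked]
    total = sum(e for _, e in exps)
    probs = [(tok, e / total) for tok, e in exps]
    # 3. Top-p: keep the smallest prefix whose cumulative probability >= top_p.
    kept, cum = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cum += p
        if cum >= top_p:
            break
    # Draw one token from the kept set, proportionally to its probability.
    r = rng.random() * sum(p for _, p in kept)
    for tok, p in kept:
        r -= p
        if r <= 0:
            return tok
    return kept[-1][0]
```

With a temperature as low as 0.1, the top token usually absorbs almost all of the probability mass, so generation is close to greedy. That behavior suits a task whose output should be deterministic, well-formed JSON.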

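// Scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotators._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Row
import org.apache.spark.ml.Pipeline
// Assumes an active SparkSession named `spark` (e.g. in spark-shell); needed for Seq(...).toDF
import spark.implicits._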

val prompt = """Extract all medical entities from the clinical note below and return them in JSON format according to the template.

#### Template:
entities
  ]
}}

#### Clinical Note:
On March 15, 2024, 58-year-old male patient John Smith, a retired engineer, (medical record number 12345678) was admitted to Memorial Hospital in New York, NY under the care of Dr. Sarah Johnson with chest pain, cough and shortness of breath. He was diagnosed with stage IV non-small cell lung cancer with metastases to the liver and bones. Treatment included osimertinib 80 mg daily, carboplatin and pemetrexed chemotherapy, and pembrolizumab immunotherapy. After three months, imaging showed a 30% reduction in tumor size, his symptoms improved, and follow-up is scheduled for July 22, 2024.

#### Instructions:
- Extract all entities exactly as they appear in the text
- Return only valid JSON format
- Use empty lists for categories with no entities found
- Do not add explanations, only return the JSON

#### Output:
"""

val data = Seq(prompt).toDF("text")

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val medicalLLM = MedicalLLM
  .pretrained("jsl_meds_ner_deid_generic_2b_q8_v1", "en", "clinical/models")
  .setInputCols(Array("document"))
  .setOutputCol("completions")
  .setBatchSize(1)
  .setNPredict(3000)
  .setUseChatTemplate(true)
  .setTemperature(0.1)
  .setTopK(40)
  .setTopP(0.9)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  medicalLLM
))

val model = pipeline.fit(data)
val result = model.transform(data)

val output = result.select("completions").collect()(0).getAs[Seq[Row]]("completions")(0).getAs[String]("result")
println(output)

Results

{
    "AGE": ["58-year-old"],
    "CONTACT": ["12345678"],
    "DATE": ["March 15, 2024", "July 22, 2024"],
    "ID": ["12345678"],
    "LOCATION": ["Memorial Hospital", "NewYork", "NY"],
    "NAME": ["John Smith", "Sarah Johnson"],
    "PROFESSION": ["engineer"]
}
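The prompt instructs the model to extract entities exactly as they appear in the text. Because of that, the returned strings can usually be mapped back to character offsets in the note, for example to replace them with placeholders. The sketch below shows one way to do this. `locate_entities` and `mask_note` are hypothetical helpers. Matching is exact and case-sensitive, so a string the model did not copy verbatim is simply not found. Short values such as "NY" may also match inside longer words, and when two labels share the same span (like CONTACT and ID above), the alphabetically first label wins.

```python
import re


def locate_entities(note: str, entities: dict) -> list:
    """Return sorted (start, end, label, text) tuples for every exact
    occurrence of each extracted string in the note. Hypothetical helper."""
    spans = []
    for label, values in entities.items():
        for value in set(values):
            if not value:
                continue  # an empty pattern would match at every position
            for m in re.finditer(re.escape(value), note):
                spans.append((m.start(), m.end(), label, value))
    return sorted(spans)


def mask_note(note: str, entities: dict) -> str:
    """Replace each located span with a <LABEL> placeholder, skipping spans
    that overlap one already replaced (earlier, then longer, spans win)."""
    spans = locate_entities(note, entities)
    # Stable sort: ties keep the (start, end, label) order from locate_entities.
    spans.sort(key=lambda s: (s[0], -(s[1] - s[0])))
    out, cursor = [], 0
    for start, end, label, _ in spans:
        if start < cursor:
            continue  # overlaps a span that was already masked
        out.append(note[cursor:start])
        out.append(f"<{label}>")
        cursor = end
    out.append(note[cursor:])
    return "".join(out)
```

A dictionary returned by the parsing step can be passed directly as `entities`. Any label whose strings are not found in the note is left unmasked, which is worth checking before treating the result as de-identified.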

Model Information

Model Name: jsl_meds_ner_deid_generic_2b_q8_v1
Compatibility: Healthcare NLP 6.2.2+
License: Licensed
Edition: Official
Input Labels: [document]
Output Labels: [completions]
Language: en
Size: 2.5 GB