Description
This LLM is trained to perform question answering (Q&A), summarization, retrieval-augmented generation (RAG), and chat.
How to use
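# Python. Assumes an active SparkSession named `spark` started with the licensed Healthcare NLP library.
# Import paths below are the usual ones (assumed); adjust them if your installed version differs.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp_jsl.annotator import MedicalLLM

# Convert the raw "text" column into Spark NLP document annotations
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Load the pretrained model; predict up to 100 tokens at temperature 0
medical_llm = MedicalLLM.pretrained("jsl_meds_q4_v3", "en", "clinical/models")\
    .setInputCols("document")\
    .setOutputCol("completions")\
    .setBatchSize(1)\
    .setNPredict(100)\
    .setUseChatTemplate(True)\
    .setTemperature(0)

pipeline = Pipeline(
    stages = [
        document_assembler,
        medical_llm
    ])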
prompt = """
Based on the following text, what age group is most susceptible to breast cancer?
## Text:
The exact cause of breast cancer is unknown. However, several risk factors can increase your likelihood of developing breast cancer, such as:
- A personal or family history of breast cancer
- A genetic mutation, such as BRCA1 or BRCA2
- Exposure to radiation
- Age (most commonly occurring in women over 50)
- Early onset of menstruation or late menopause
- Obesity
- Hormonal factors, such as taking hormone replacement therapy
"""
data = spark.createDataFrame([[prompt]]).toDF("text")
results = pipeline.fit(data).transform(data)
results.select("completions").show(truncate=False)
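The prompt above puts the question first and the source passage under a `## Text:` marker. The following is a minimal sketch of a helper that assembles prompts in that same layout, which may be convenient when several question and passage pairs are sent through one pipeline. `build_prompt` is a hypothetical name introduced here for illustration; it is not part of Healthcare NLP.

```python
def build_prompt(question: str, context: str) -> str:
    """Assemble a prompt in the question-then-text layout used above.

    Hypothetical helper for illustration; not part of Healthcare NLP.
    """
    return f"\n{question.strip()}\n## Text:\n{context.strip()}\n"


prompts = [
    build_prompt(
        "Based on the following text, what age group is most susceptible to breast cancer?",
        "- Age (most commonly occurring in women over 50)",
    ),
]

# One prompt per row, matching the single "text" column read by DocumentAssembler.
rows = [[p] for p in prompts]
```

With an active session, `spark.createDataFrame(rows).toDF("text")` would then produce the input DataFrame in the same way as the example above.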
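// Scala. Assumes an active SparkSession named `spark`; MedicalLLM is provided by the licensed Healthcare NLP library.
import com.johnsnowlabs.nlp.DocumentAssembler
import org.apache.spark.ml.Pipeline

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val medical_llm = MedicalLLM.pretrained("jsl_meds_q4_v3", "en", "clinical/models")
  .setInputCols("document")
  .setOutputCol("completions")
  .setBatchSize(1)
  .setNPredict(100)
  .setUseChatTemplate(true) // Scala boolean literals are lowercase
  .setTemperature(0)

val pipeline = new Pipeline().setStages(Array(
  document_assembler,
  medical_llm
))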
val prompt = """
Based on the following text, what age group is most susceptible to breast cancer?
## Text:
The exact cause of breast cancer is unknown. However, several risk factors can increase your likelihood of developing breast cancer, such as:
- A personal or family history of breast cancer
- A genetic mutation, such as BRCA1 or BRCA2
- Exposure to radiation
- Age (most commonly occurring in women over 50)
- Early onset of menstruation or late menopause
- Obesity
- Hormonal factors, such as taking hormone replacement therapy
"""
import spark.implicits._ // required for Seq(...).toDF

val data = Seq(prompt).toDF("text")
val results = pipeline.fit(data).transform(data)
results.select("completions").show(truncate = false)
Results
Based on the provided text, the age group most susceptible to breast cancer is women over 50 years old. This is explicitly mentioned as the most common occurrence age for breast cancer. While other factors like genetic mutations, family history, and hormonal factors also contribute to the risk, the text specifically highlights age as a significant risk factor. It is important to note that while age is a risk factor, breast cancer can still occur in younger women, and awareness and preventive measures should be considered across all age groups.
Model Information
| Field         | Value                 |
|---------------|-----------------------|
| Model Name    | jsl_meds_q4_v3        |
| Compatibility | Healthcare NLP 5.5.0+ |
| License       | Licensed              |
| Edition       | Official              |
| Language      | en                    |
| Size          | 2.4 GB                |
Benchmarking
We generated a total of 400 questions, 100 from each category. These questions were labeled and reviewed by 3 physician annotators. In the tables below, the % columns give the preference rate, reported as a fraction of 1.
See here for more benchmark information.
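To make the preference rate concrete: each column in the tables below splits the judgments among four options (JSL-MedS, GPT4o, Neutral, None), so the shares in a column add up to 1. The following is a minimal sketch of that calculation, assuming every judgment is recorded as exactly one of those four labels. How the three annotators' labels were actually aggregated is not described here, and the votes in the example are made up for illustration, not benchmark data.

```python
from collections import Counter

# Row labels taken from the benchmark tables.
OPTIONS = ["JSL-MedS", "GPT4o", "Neutral", "None"]


def preference_rates(labels):
    """Return the share of judgments assigned to each option, rounded to 2 decimals."""
    if not labels:
        raise ValueError("at least one judgment is required")
    counts = Counter(labels)
    total = len(labels)
    return {option: round(counts[option] / total, 2) for option in OPTIONS}


# Made-up judgments for illustration only (10 votes for one criterion).
labels = ["JSL-MedS"] * 5 + ["GPT4o"] * 3 + ["Neutral"] + ["None"]
rates = preference_rates(labels)

for option, rate in rates.items():
    print(f"{option:<10} {rate:.2f}")
```

Because each share is rounded independently, a real column can total 0.99 or 1.01 rather than exactly 1.00.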
## Overall
| Model | Factuality % | Clinical Relevancy % | Conciseness % |
|------------|--------------|----------------------|---------------|
| JSL-MedS | 0.24 | 0.25 | 0.38 |
| GPT4o | 0.19 | 0.26 | 0.27 |
| Neutral | 0.43 | 0.36 | 0.18 |
| None | 0.14 | 0.13 | 0.17 |
| Total | 1.00 | 1.00 | 1.00 |
## Summary
| Model | Factuality % | Clinical Relevancy % | Conciseness % |
|------------|--------------|----------------------|---------------|
| JSL-MedS | 0.47 | 0.48 | 0.42 |
| GPT4o | 0.25 | 0.25 | 0.25 |
| Neutral | 0.22 | 0.22 | 0.25 |
| None | 0.07 | 0.05 | 0.08 |
| Total | 1.00 | 1.00 | 1.00 |
## QA
| Model | Factuality % | Clinical Relevancy % | Conciseness % |
|------------|--------------|----------------------|---------------|
| JSL-MedS | 0.35 | 0.36 | 0.42 |
| GPT4o | 0.24 | 0.24 | 0.29 |
| Neutral | 0.33 | 0.33 | 0.18 |
| None | 0.09 | 0.07 | 0.11 |
| Total | 1.00 | 1.00 | 1.00 |
## BioMedical
| Model | Factuality % | Clinical Relevancy % | Conciseness % |
|------------|--------------|----------------------|---------------|
| JSL-MedS | 0.33 | 0.24 | 0.57 |
| GPT4o | 0.12 | 0.08 | 0.16 |
| Neutral | 0.45 | 0.57 | 0.16 |
| None | 0.10 | 0.10 | 0.10 |
| Total | 1.00 | 1.00 | 1.00 |
## OpenEnded
| Model | Factuality % | Clinical Relevancy % | Conciseness % |
|------------|--------------|----------------------|---------------|
| JSL-MedS | 0.35 | 0.30 | 0.39 |
| GPT4o | 0.30 | 0.33 | 0.41 |
| Neutral | 0.19 | 0.20 | 0.02 |
| None | 0.17 | 0.17 | 0.19 |
| Total | 1.00 | 1.00 | 1.00 |
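One way to read the tables above side by side is to look only at the judgments that picked one of the two models. The following is a minimal sketch that takes the Factuality values reported above and computes, per category, the share of those decisive judgments that favored JSL-MedS. Excluding the Neutral and None rows is an analysis choice made for this illustration, not part of the original benchmark.

```python
# Factuality preference rates copied from the tables above: (JSL-MedS, GPT4o).
factuality = {
    "Overall": (0.24, 0.19),
    "Summary": (0.47, 0.25),
    "QA": (0.35, 0.24),
    "BioMedical": (0.33, 0.12),
    "OpenEnded": (0.35, 0.30),
}


def decisive_share(jsl: float, gpt: float) -> float:
    """Share of JSL-MedS among judgments that preferred either model."""
    return jsl / (jsl + gpt)


head_to_head = {
    category: decisive_share(jsl, gpt)
    for category, (jsl, gpt) in factuality.items()
}

for category, share in head_to_head.items():
    print(f"{category:<11} {share:.2f}")
```

A value above 0.5 means that, among annotator judgments that chose a model, JSL-MedS was chosen more often than GPT4o on factuality for that category.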