Description
This LLM is trained to perform question answering (Q&A), summarization, retrieval-augmented generation (RAG), and chat.
How to use
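from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
# MedicalLLM ships with the licensed Healthcare NLP package.
# This import path is an assumption and may differ between versions.
from sparknlp_jsl.llm import MedicalLLM

# Wrap the raw "text" column as Spark NLP documents.
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Load the pretrained model and set the generation options.
medical_llm = MedicalLLM.pretrained("jsl_meds_q8_v2", "en", "clinical/models")\
    .setInputCols("document")\
    .setOutputCol("completions")\
    .setBatchSize(1)\
    .setNPredict(100)\
    .setUseChatTemplate(True)\
    .setTemperature(0)

pipeline = Pipeline(
    stages=[
        document_assembler,
        medical_llm
    ])
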
prompt = """
Based on the following text, what age group is most susceptible to breast cancer?
## Text:
The exact cause of breast cancer is unknown. However, several risk factors can increase your likelihood of developing breast cancer, such as:
- A personal or family history of breast cancer
- A genetic mutation, such as BRCA1 or BRCA2
- Exposure to radiation
- Age (most commonly occurring in women over 50)
- Early onset of menstruation or late menopause
- Obesity
- Hormonal factors, such as taking hormone replacement therapy
"""
data = spark.createDataFrame([[prompt]]).toDF("text")
results = pipeline.fit(data).transform(data)
results.select("completions").show(truncate=False)
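
// Scala version of the same pipeline
import com.johnsnowlabs.nlp.DocumentAssembler
import org.apache.spark.ml.Pipeline
// MedicalLLM is provided by the licensed Healthcare NLP library; import it per your installation.

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val medical_llm = MedicalLLM.pretrained("jsl_meds_q8_v2", "en", "clinical/models")
  .setInputCols("document")
  .setOutputCol("completions")
  .setBatchSize(1)
  .setNPredict(100)
  .setUseChatTemplate(true)
  .setTemperature(0)

val pipeline = new Pipeline().setStages(Array(
  document_assembler,
  medical_llm
))
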
val prompt = """
Based on the following text, what age group is most susceptible to breast cancer?
## Text:
The exact cause of breast cancer is unknown. However, several risk factors can increase your likelihood of developing breast cancer, such as:
- A personal or family history of breast cancer
- A genetic mutation, such as BRCA1 or BRCA2
- Exposure to radiation
- Age (most commonly occurring in women over 50)
- Early onset of menstruation or late menopause
- Obesity
- Hormonal factors, such as taking hormone replacement therapy
"""
import spark.implicits._  // required for Seq(...).toDF

val data = Seq(prompt).toDF("text")
val results = pipeline.fit(data).transform(data)
results.select("completions").show(truncate = false)
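
The prompt in the example above follows a simple layout: the question first, then a `## Text:` header, then the source passage. The Python sketch below reproduces that layout for other question and passage pairs. `build_prompt` is a hypothetical helper name used for illustration; it is not part of Healthcare NLP, and the model may work equally well with other layouts.

```python
def build_prompt(question: str, context: str) -> str:
    """Put the question first, then the passage under a '## Text:' header."""
    return f"\n{question.strip()}\n## Text:\n{context.strip()}\n"


question = "Based on the following text, what age group is most susceptible to breast cancer?"
# Shortened passage taken from the example prompt.
context = (
    "The exact cause of breast cancer is unknown. However, several risk factors "
    "can increase your likelihood of developing breast cancer, such as:\n"
    "- Age (most commonly occurring in women over 50)\n"
    "- Obesity"
)

prompt = build_prompt(question, context)
print(prompt)
```

The resulting string can be placed in the `text` column in the same way as the hand-written prompt in the example.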
Results
Based on the provided text, the age group most susceptible to breast cancer is women over the age of 50. This is explicitly mentioned in the text, indicating that breast cancer is most commonly occurring in this age group. It is important to note that while age is a significant risk factor, other factors such as genetic mutations, family history, and hormonal factors also contribute to the likelihood of developing breast cancer. Regular screenings and awareness of risk factors are crucial for early detection and effective management of breast cancer.
Model Information
| Field          | Value                 |
|----------------|-----------------------|
| Model Name     | jsl_meds_q8_v2        |
| Compatibility  | Healthcare NLP 5.5.0+ |
| License        | Licensed              |
| Edition        | Official              |
| Language       | en                    |
| Size           | 3.9 GB                |
Benchmarking
We generated a total of 400 questions, 100 from each category. The questions were labeled and reviewed by 3 physician annotators. In the tables, % indicates the preference rate, reported as a fraction of 1.00.
Please see more benchmark information here.
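
The source does not spell out how the preference rates are computed. A minimal sketch of one plausible reading, assuming each annotator judgment names exactly one of the four options that appear as table rows, and that a rate is that option's share of all judgments. The votes below are made up for illustration and are not the benchmark data; `preference_rates` is a hypothetical helper.

```python
from collections import Counter

# Row labels used in the benchmark tables.
OPTIONS = ("JSL-MedS", "GPT4o", "Neutral", "None")


def preference_rates(votes):
    """Return each option's share of the votes, rounded to two decimals."""
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    total = len(votes)
    return {option: round(counts[option] / total, 2) for option in OPTIONS}


# Made-up votes for illustration only.
votes = ["JSL-MedS"] * 5 + ["GPT4o"] * 3 + ["Neutral"] * 1 + ["None"] * 1
rates = preference_rates(votes)
print(rates)
```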
## Overall
| Model | Factuality % | Clinical Relevancy % | Conciseness % |
|------------|--------------|----------------------|---------------|
| JSL-MedS | 0.24 | 0.25 | 0.38 |
| GPT4o | 0.19 | 0.26 | 0.27 |
| Neutral | 0.43 | 0.36 | 0.18 |
| None | 0.14 | 0.13 | 0.17 |
| Total | 1.00 | 1.00 | 1.00 |
## Summary
| Model | Factuality % | Clinical Relevancy % | Conciseness % |
|------------|--------------|----------------------|---------------|
| JSL-MedS | 0.47 | 0.48 | 0.42 |
| GPT4o | 0.25 | 0.25 | 0.25 |
| Neutral | 0.22 | 0.22 | 0.25 |
| None | 0.07 | 0.05 | 0.08 |
| Total | 1.00 | 1.00 | 1.00 |
## QA
| Model | Factuality % | Clinical Relevancy % | Conciseness % |
|------------|--------------|----------------------|---------------|
| JSL-MedS | 0.35 | 0.36 | 0.42 |
| GPT4o | 0.24 | 0.24 | 0.29 |
| Neutral | 0.33 | 0.33 | 0.18 |
| None | 0.09 | 0.07 | 0.11 |
| Total | 1.00 | 1.00 | 1.00 |
## BioMedical
| Model | Factuality % | Clinical Relevancy % | Conciseness % |
|------------|--------------|----------------------|---------------|
| JSL-MedS | 0.33 | 0.24 | 0.57 |
| GPT4o | 0.12 | 0.08 | 0.16 |
| Neutral | 0.45 | 0.57 | 0.16 |
| None | 0.10 | 0.10 | 0.10 |
| Total | 1.00 | 1.00 | 1.00 |
## OpenEnded
| Model | Factuality % | Clinical Relevancy % | Conciseness % |
|------------|--------------|----------------------|---------------|
| JSL-MedS | 0.35 | 0.30 | 0.39 |
| GPT4o | 0.30 | 0.33 | 0.41 |
| Neutral | 0.19 | 0.20 | 0.02 |
| None | 0.17 | 0.17 | 0.19 |
| Total | 1.00 | 1.00 | 1.00 |
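
Because the rates are rounded to two decimals, a column does not always add up to exactly 1.00 even though its Total row shows 1.00. The sketch below, using values copied from the Summary table above, sums each column and flags anything that is off by more than 0.01. The tolerance is an assumption chosen for illustration.

```python
# Values copied from the "Summary" table.
summary = {
    "Factuality": {"JSL-MedS": 0.47, "GPT4o": 0.25, "Neutral": 0.22, "None": 0.07},
    "Clinical Relevancy": {"JSL-MedS": 0.48, "GPT4o": 0.25, "Neutral": 0.22, "None": 0.05},
    "Conciseness": {"JSL-MedS": 0.42, "GPT4o": 0.25, "Neutral": 0.25, "None": 0.08},
}


def column_totals(table):
    """Sum the rates in each column, rounded to two decimals."""
    return {metric: round(sum(rates.values()), 2) for metric, rates in table.items()}


totals = column_totals(summary)
for metric, total in totals.items():
    # Assumed tolerance: 0.01 of slack for two-decimal rounding.
    within_rounding = round(abs(total - 1.0), 2) <= 0.01
    print(f"{metric}: {total:.2f} ({'ok' if within_rounding else 'check'})")
```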