Medical Text Summarization

Description

This model is a modified version of Flan-T5 (LLM) based summarization model that is finetuned with additional data curated by John Snow Labs. This model is further optimized by augmenting the training methodology, and dataset. It can generate summaries from clinical notes up to 512 tokens given the input text (max 1024 tokens)

Live Demo Open in Colab Download Copy S3 URI

How to use

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("documents")

med_summarizer  = MedicalSummarizer()\
    .pretrained("summarizer_generic_jsl", "en", "clinical/models")\
    .setInputCols("documents")\
    .setOutputCol("summary")\
    .setMaxNewTokens(100)\
    .setMaxTextLength(1024)
    
pipeline = Pipeline(stages=[document_assembler, med_summarizer])

text = """Patient with hypertension, syncope, and spinal stenosis - for recheck.
(Medical Transcription Sample Report)
SUBJECTIVE:
The patient is a 78-year-old female who returns for recheck. She has hypertension. She denies difficulty with chest pain, palpations, orthopnea, nocturnal dyspnea, or edema.
PAST MEDICAL HISTORY / SURGERY / HOSPITALIZATIONS:
Reviewed and unchanged from the dictation on 12/03/2003.
MEDICATIONS:
Atenolol 50 mg daily, Premarin 0.625 mg daily, calcium with vitamin D two to three pills daily, multivitamin daily, aspirin as needed, and TriViFlor 25 mg two pills daily. She also has Elocon cream 0.1% and Synalar cream 0.01% that she uses as needed for rash.
ALLERGIES:..."""

data = spark.createDataFrame([[text]]).toDF("text")

pipeline.fit(data).transform(data)
val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("documents")

val med_summarizer  = MedicalSummarizer()
    .pretrained("summarizer_generic_jsl", "en", "clinical/models")
    .setInputCols("documents")
    .setOutputCol("summary")
    .setMaxNewTokens(100)

val pipeline = new Pipeline().setStages(Array(document_assembler, med_summarizer))

val text = """Patient with hypertension, syncope, and spinal stenosis - for recheck.
(Medical Transcription Sample Report)
SUBJECTIVE:
The patient is a 78-year-old female who returns for recheck. She has hypertension. She denies difficulty with chest pain, palpations, orthopnea, nocturnal dyspnea, or edema.
PAST MEDICAL HISTORY / SURGERY / HOSPITALIZATIONS:
Reviewed and unchanged from the dictation on 12/03/2003.
MEDICATIONS:
Atenolol 50 mg daily, Premarin 0.625 mg daily, calcium with vitamin D two to three pills daily, multivitamin daily, aspirin as needed, and TriViFlor 25 mg two pills daily. She also has Elocon cream 0.1% and Synalar cream 0.01% that she uses as needed for rash.
ALLERGIES:..."""

val data = Seq(text).toDS.toDF("text")

val result = pipeline.fit(data).transform(data)

Results

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|result                                                                                                                                                                                                                                                                                                                                   |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[recheck. A 78-year-old female patient returns for recheck due to hypertension, syncope, and spinal stenosis. She has a history of heart failure, myocardial infarction, lymphoma, and asthma. She has been prescribed Atenolol, Premarin, calcium with vitamin D, multivitamin, aspirin, and TriViFlor. She has also been prescribed El]|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Model Information

Model Name: summarizer_generic_jsl
Compatibility: Healthcare NLP 4.3.2+
License: Licensed
Edition: Official
Language: en
Size: 920.0 MB

Benchmarking

Benchmark on Samsum Dataset

 model_name model_size rouge bleu bertscore_precision bertscore_recall: bertscore_f1
philschmid/flan-t5-base-samsum  240M 0.2734 0.1813 0.8938 0.9133 0.9034
linydub/bart-large-samsum 500M  0.3060 0.2168 0.8961 0.9065 0.9013
philschmid/bart-large-cnn-samsum  500M 0.3794 0.1262 0.8599 0.9153 0.8867
transformersbook/pegasus-samsum 570M  0.3049 0.1543 0.8942 0.9183 0.9061
summarizer_generic_jsl 240M  0.2703 0.1932 0.8944 0.9161 0.9051