Financial FLAN-T5 Summarization (Base)

Description

This model is intended for summarizing financial text. It is based on FLAN-T5 (base size), an instruction-finetuned language model from Google researchers that builds on the T5 text-to-text architecture. FLAN-T5 is trained on a large, diverse collection of texts and can generate concise summaries of articles, documents, and other text inputs.

References:

@article{flant5_paper,
  title={Scaling instruction-finetuned language models},
  author={Chung, Hyung Won and Hou, Le and Longpre, Shayne and Zoph, Barret and Tay, Yi and Fedus, William and Li, Eric and Wang, Xuezhi and Dehghani, Mostafa and Brahma, Siddhartha and others},
  journal={arXiv preprint arXiv:2210.11416},
  year={2022}
}

@article{t5_paper,
  title={Exploring the limits of transfer learning with a unified text-to-text transformer},
  author={Raffel, Colin and Shazeer, Noam and Roberts, Adam and Lee, Katherine and Narang, Sharan and Matena, Michael and Zhou, Yanqi and Li, Wei and Liu, Peter J},
  journal={The Journal of Machine Learning Research},
  volume={21},
  number={1},
  pages={5485--5551},
  year={2020},
  publisher={JMLR.org}
}

How to use

from johnsnowlabs import nlp, finance

# Start a Spark session (assumes Finance NLP license credentials are already configured)
spark = nlp.start()

# Convert the raw "text" column into document annotations
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("documents")

# Load the pretrained financial summarization model
flant5 = finance.Summarizer().pretrained('finsum_flant5_base','en','finance/models')\
    .setInputCols(["documents"])\
    .setOutputCol("summary")

pipeline = nlp.Pipeline(stages=[document_assembler, flant5])

# One row per document to summarize
data = spark.createDataFrame([
  [1, "Based on the financial data provided, it appears that the company has experienced steady growth in revenue over the past few years. However, this growth has been offset by increasing expenses, particularly in the areas of research and development, and marketing and advertising. As a result, the company's profit margins have remained relatively stable, but have not shown significant improvement."]
]).toDF('id', 'text')

results = pipeline.fit(data).transform(data)

# Each row of "summary.result" holds an array of generated summary strings
results.select("summary.result").show(truncate=False)

Results

+------------------------------------------------------------------------------------------------------------+
|result                                                                                                      |
+------------------------------------------------------------------------------------------------------------+
|[The company's revenue has been growing but the company's expenses have been offset by increasing revenues.]|
+------------------------------------------------------------------------------------------------------------+
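
The `result` column above is an array of strings per row, so downstream code usually needs plain strings. The following is a minimal sketch of two small helpers, one for preparing several documents as `('id', 'text')` rows and one for flattening the collected summaries. The names `build_rows` and `flatten_summaries` are hypothetical and not part of Finance NLP; the Spark calls in the trailing comments assume the `spark` session and `pipeline` from "How to use" are already defined.

```python
from typing import Iterable, List, Sequence


def build_rows(texts: Iterable[str], start_id: int = 1) -> List[list]:
    """Pair each text with a running id, matching the ('id', 'text') columns."""
    return [[i, text] for i, text in enumerate(texts, start=start_id)]


def flatten_summaries(result_column: Iterable[Sequence[str]]) -> List[str]:
    """Join each row's array of summary strings into one string per row."""
    return [" ".join(parts).strip() for parts in result_column]


# Plain-Python demonstration; no Spark session is needed for this part.
rows = build_rows(["First filing excerpt.", "Second filing excerpt."])
print(rows)

# With the pipeline above (assumed to be already defined), usage might look like:
#   data = spark.createDataFrame(rows).toDF("id", "text")
#   collected = pipeline.fit(data).transform(data).select("summary.result").collect()
#   summaries = flatten_summaries(row["result"] for row in collected)
```

Keeping the helpers free of Spark dependencies means they can be checked on ordinary lists before being applied to collected rows.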

Model Information

Model Name: finsum_flant5_base
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Language: en
Size: 920.9 MB