T5 Question Generation (Small)


This model is a text generation model, originally trained on the SQuAD dataset and then fine-tuned by the AllenAI team, that generates questions from text. Its strength is that it can generate questions even from very few tokens: for example, given only a subject and a verb ("Amazon should provide"), it returns a question similar to "What should Amazon provide?".

The generated questions can also be fed to Question Answering models as the first parameter (the question), with a longer paragraph supplied as the context. This way, you:

  • First, generate questions on the fly
  • Second, look for an answer in the text.
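The two steps above can be sketched in plain Python. This is only an illustration of the data flow, not the Spark NLP API: `generate_question` and `answer_question` are hypothetical stand-ins for the question generation model and a Question Answering model, and the stub functions return placeholder strings rather than real model output.

```python
from typing import Callable, Dict, List


def questions_then_answers(
    prompts: List[str],
    context: str,
    generate_question: Callable[[str], str],
    answer_question: Callable[[str, str], str],
) -> List[Dict[str, str]]:
    """Generate one question per prompt, then answer it against the context."""
    rows = []
    for prompt in prompts:
        # Step 1: generate the question on the fly
        question = generate_question(prompt)
        # Step 2: pass the question first and the longer paragraph as context
        answer = answer_question(question, context)
        rows.append({"prompt": prompt, "question": question, "answer": answer})
    return rows


# Hypothetical stubs, used only to show how data moves between the two models
def fake_generator(prompt: str) -> str:
    return f"<question for: {prompt}>"


def fake_qa(question: str, context: str) -> str:
    return f"<answer found in {len(context)} characters of context>"


rows = questions_then_answers(
    ["EMV will pay"], "some longer paragraph", fake_generator, fake_qa
)
```

In a real pipeline, the two callables would wrap the model calls; the point here is only that the generated question becomes the first argument of the answering step.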

Moreover, the input to this model can even be a concatenation of entities extracted by NER (for example, EMV - ORG, will provide - ACTION).
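As a minimal sketch of building such an input, the helper below joins NER chunks into one string. It assumes that only the chunk text is concatenated and the labels are dropped, which the text above does not specify; `prompt_from_entities` is a hypothetical helper, not part of any library.

```python
from typing import List, Tuple


def prompt_from_entities(entities: List[Tuple[str, str]]) -> str:
    """Join NER chunk texts, in their original order, into a single input string."""
    # Assumption: labels (ORG, ACTION, ...) are not included in the model input.
    return " ".join(text.strip() for text, _label in entities if text.strip())


entities = [("EMV", "ORG"), ("will provide", "ACTION")]
prompt = prompt_from_entities(entities)
print(prompt)  # EMV will provide
```

The resulting string can then be used as the "text" value passed to the pipeline.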

How to use

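from sparknlp.base import DocumentAssembler
from sparknlp.annotator import T5Transformer
from pyspark.ml import Pipeline

# Assumes an active Spark NLP session named `spark` (e.g. from sparknlp.start()).
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

# Column names follow the Input/Output Labels listed under Model Information.
t5 = T5Transformer.pretrained("t5_question_generation_small") \
    .setInputCols(["documents"]) \
    .setOutputCol("summaries")

data_df = spark.createDataFrame([["EMV will pay"]]).toDF("text")

pipeline = Pipeline().setStages([document_assembler, t5])
results = pipeline.fit(data_df).transform(data_df)

# Print the generated question(s)
results.select("summaries.result").show(truncate=False)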

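import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.seq2seq.T5Transformer
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for toDF; assumes an active session `spark`

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("documents")

val t5 = T5Transformer.pretrained("t5_question_generation_small")
  .setInputCols(Array("documents"))
  .setOutputCol("summaries")

val pipeline = new Pipeline().setStages(Array(documentAssembler, t5))

val data = Seq("EMV will pay").toDF("text")

val result = pipeline.fit(data).transform(data)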



+--------------------+
|result              |
+--------------------+
|[What will EMV pay?]|
+--------------------+

Model Information

Model Name: t5_question_generation_small
Compatibility: Spark NLP 4.0.0+
License: Open Source
Edition: Official
Input Labels: [documents]
Output Labels: [summaries]
Language: en
Size: 148.0 MB