T5 Question Generation (Small)

Description

This is a text generation model, originally trained on the SQuAD dataset and then fine-tuned by the AllenAI team to generate questions from text. Its strength is that it can generate a question from very few tokens: given only a subject and a verb (for example, "Amazon should provide"), it returns a question similar to "What Amazon should provide?".

The model can also be used to feed question answering models: the generated question is passed as the first parameter (question), and a longer paragraph is supplied as the context. This way, you:

  • First, generate questions on the fly.
  • Second, look for the answers in the text.
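
The two steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the model authors' pipeline: the helper names `build_qa_rows` and `build_qa_pipeline`, the column names, the made-up context sentence, and the choice of `BertForQuestionAnswering` with its default pretrained model are all illustrative. Any Spark NLP question answering annotator could take its place.

```python
from typing import List


def build_qa_rows(generated_questions: List[str], context: str) -> List[List[str]]:
    """Pair every non-empty generated question with the same context paragraph."""
    rows = []
    for question in generated_questions:
        question = question.strip()
        if question:
            rows.append([question, context])
    return rows


def build_qa_pipeline():
    """Hypothetical second stage; requires Spark NLP 4.0.0+ to be installed."""
    # Imported lazily so build_qa_rows stays usable without Spark.
    from pyspark.ml import Pipeline
    from sparknlp.annotator import BertForQuestionAnswering
    from sparknlp.base import MultiDocumentAssembler

    # Turns the "question" and "context" text columns into document annotations.
    assembler = MultiDocumentAssembler() \
        .setInputCols(["question", "context"]) \
        .setOutputCols(["document_question", "document_context"])

    # Assumption: the default pretrained QA model; swap in any other one.
    qa = BertForQuestionAnswering.pretrained() \
        .setInputCols(["document_question", "document_context"]) \
        .setOutputCol("answer")

    return Pipeline().setStages([assembler, qa])


# Illustrative input: one generated question and an invented context sentence.
rows = build_qa_rows(
    ["What will EMV pay?"],
    "EMV will pay the agreed fee within 30 days of the invoice date.",
)
print(rows)
```

With an active Spark session, `spark.createDataFrame(rows).toDF("question", "context")` would produce the DataFrame that `build_qa_pipeline().fit(df).transform(df)` expects.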

The input can also be a concatenation of entities extracted by an NER model, for example EMV (ORG) and will provide (ACTION).
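
One way to build such an input is to join the text of the NER chunks in their original order. The sketch below is plain Python: `entities_to_prompt` is a hypothetical helper, the `(chunk, label)` tuple layout is an assumption about how the entities were collected, and dropping the labels (so only the chunk text reaches the model) is likewise an assumption rather than documented behavior.

```python
from typing import List, Tuple


def entities_to_prompt(entities: List[Tuple[str, str]]) -> str:
    """Join entity chunks, in order, into a short input text.

    Assumption: labels are ignored and only the chunk text is kept.
    """
    chunks = [chunk.strip() for chunk, _label in entities if chunk.strip()]
    return " ".join(chunks)


prompt = entities_to_prompt([("EMV", "ORG"), ("will provide", "ACTION")])
print(prompt)  # EMV will provide
```

The resulting string can be used as the `text` column of the input DataFrame.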

How to use

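# Python
import sparknlp
from pyspark.ml import Pipeline
from sparknlp.annotator import T5Transformer
from sparknlp.base import DocumentAssembler

# Start (or reuse) a Spark session with Spark NLP loaded
spark = sparknlp.start()

# Convert the raw "text" column into document annotations
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

# Empty task prefix: the input text is passed to the model as-is
t5 = T5Transformer.pretrained("t5_question_generation_small") \
    .setTask("") \
    .setMaxOutputLength(200) \
    .setInputCols(["documents"]) \
    .setOutputCol("question")

data_df = spark.createDataFrame([["EMV will pay"]]).toDF("text")

pipeline = Pipeline().setStages([document_assembler, t5])
results = pipeline.fit(data_df).transform(data_df)

results.select("question.result").show(truncate=False)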
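
// Scala
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.seq2seq.T5Transformer
import org.apache.spark.ml.Pipeline
import spark.implicits._ // assumes an active SparkSession named `spark`

// Convert the raw "text" column into document annotations
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("documents")

// Empty task prefix: the input text is passed to the model as-is
val t5 = T5Transformer.pretrained("t5_question_generation_small")
  .setTask("")
  .setMaxOutputLength(200)
  .setInputCols("documents")
  .setOutputCol("question")

val pipeline = new Pipeline().setStages(Array(documentAssembler, t5))

val data = Seq("EMV will pay").toDF("text")

val result = pipeline.fit(data).transform(data)

result.select("question.result").show(false)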

Results

+--------------------+
|result              |
+--------------------+
|[What will EMV pay?]|
+--------------------+

Model Information

Model Name: t5_question_generation_small
Compatibility: Spark NLP 4.0.0+
License: Open Source
Edition: Official
Input Labels: [documents]
Output Labels: [question]
Language: en
Size: 148.0 MB

References

SQuAD 2.0