Google T5 (Small) for Closed Book Question Answering

Description

The model was pre-trained with T5’s denoising objective on C4, further pre-trained with REALM’s salient span masking objective on Wikipedia, and finally fine-tuned on Natural Questions (NQ).

Note: The model was fine-tuned on 100% of the train splits of Natural Questions (NQ) for 10k steps.

Other community checkpoints: here

Paper: How Much Knowledge Can You Pack Into the Parameters of a Language Model?


How to use

Set one of the following task prefixes via setTask, or prepend it inline to your input text:

  • nq question:
  • trivia question:
  • question:
  • nq:
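If you embed the task inline instead of calling setTask, every input row must start with the chosen prefix followed by a colon. A minimal sketch of that formatting (the helper name is my own, not part of Spark NLP):

```python
def with_task_prefix(question: str, task: str = "nq") -> str:
    """Prepend a T5 task prefix (e.g. 'nq:') to a raw question.

    Hypothetical helper for illustration; Spark NLP's setTask("nq:")
    does the same prepending inside the pipeline.
    """
    prefix = task.rstrip(":") + ":"
    return f"{prefix} {question.strip()}"

rows = [
    with_task_prefix("Who wrote On the Origin of Species?"),
    with_task_prefix("What is the capital of France?", "trivia question"),
]
# rows[0] == "nq: Who wrote On the Origin of Species?"
```

Either approach works, but do not combine them: setting the task on the transformer *and* prefixing the text would prepend the prefix twice.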
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import T5Transformer

# Example input; the "nq:" prefix is added by setTask below,
# so the raw question goes in as-is.
data_df = spark.createDataFrame(
    [["Who is the president of the United States?"]]
).toDF("text")

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

t5 = T5Transformer.pretrained("google_t5_small_ssm_nq") \
    .setTask("nq:") \
    .setMaxOutputLength(200) \
    .setInputCols(["documents"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([document_assembler, t5])
results = pipeline.fit(data_df).transform(data_df)

results.select("answer.result").show(truncate=False)

import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.seq2seq.T5Transformer
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("documents")

val t5 = T5Transformer
    .pretrained("google_t5_small_ssm_nq")
    .setTask("nq:")
    .setMaxOutputLength(200)
    .setInputCols(Array("documents"))
    .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(documentAssembler, t5))

val model = pipeline.fit(dataDf)
val results = model.transform(dataDf)

results.select("answer.result").show(truncate = false)

Model Information

Model Name: google_t5_small_ssm_nq
Compatibility: Spark NLP 2.7.1+
Edition: Official
Input Labels: [sentence]
Output Labels: [t5]
Language: en

Data Source

Pre-training data: C4 (T5’s denoising objective), then Wikipedia (REALM’s salient span masking objective). Fine-tuning data: Natural Questions (NQ).