Google's T5 for closed book question answering

Description

This is a text-to-text model trained by Google on the Colossal Clean Crawled Corpus (C4), a cleaned version of Common Crawl’s web crawl data. It was then further pre-trained on Wikipedia with salient span masking (the "ssm" in the model name) and finally fine-tuned on the Natural Questions (NQ) dataset. The model can answer free-text questions, such as “Which is the capital of France?”, without relying on any context or external resources.

Predicted Entities

[DOCUMENT]

How to use

# Python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, T5Transformer
from pyspark.ml import Pipeline

# Start (or reuse) a Spark session with Spark NLP available
spark = sparknlp.start()

data = spark.createDataFrame([
    [1, "Which is the capital of France? Who was the first president of USA?"],
    [1, "Which is the capital of Bulgaria ?"],
    [2, "Who is Donald Trump?"]]).toDF("id", "text")

# Wrap the raw text column in Spark NLP document annotations
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

# Split each document into sentences, so each question is handled separately
sentence_detector = SentenceDetectorDLModel \
    .pretrained() \
    .setInputCols(["documents"]) \
    .setOutputCol("questions")

# Generate a closed-book answer for each question
t5 = T5Transformer \
    .pretrained("google_t5_small_ssm_nq") \
    .setInputCols(["questions"]) \
    .setOutputCol("answers")

pipeline = Pipeline().setStages([document_assembler, sentence_detector, t5])
results = pipeline.fit(data).transform(data)

results.select("questions.result", "answers.result").show(truncate=False)

// Scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.util.io.ResourceHelper
import org.apache.spark.ml.Pipeline

val testData = ResourceHelper.spark.createDataFrame(Seq(
  (1, "Which is the capital of France? Who was the first president of USA?"),
  (1, "Which is the capital of Bulgaria ?"),
  (2, "Who is Donald Trump?")
)).toDF("id", "text")

// Wrap the raw text column in Spark NLP document annotations
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("documents")

// Split each document into sentences, so each question is handled separately
val sentenceDetector = SentenceDetectorDLModel
  .pretrained()
  .setInputCols(Array("documents"))
  .setOutputCol("questions")

// Generate a closed-book answer for each question
val t5 = T5Transformer
  .pretrained("google_t5_small_ssm_nq")
  .setInputCols(Array("questions"))
  .setOutputCol("answers")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, t5))

val model = pipeline.fit(testData)
val results = model.transform(testData)

results.select("questions.result", "answers.result").show(truncate = false)

Results

+----------------------------------------------------------------------+--------------------------+
|result                                                                |result                    |
+----------------------------------------------------------------------+--------------------------+
|[Which is the capital of France?, Who was the first president of USA?]|[Paris, George Washington]|
|[Which is the capital of Bulgaria ?]                                  |[Sofia]                   |
|[Who is Donald Trump?]                                                |[a United States citizen] |
+----------------------------------------------------------------------+--------------------------+
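
Both result columns are arrays, and in the output above the answer at each position corresponds to the question at the same position. The following is a minimal sketch of pairing them up in plain Python, assuming the rows have already been collected to the driver (for example with results.select("questions.result", "answers.result").collect()). The helper name pair_questions_with_answers is hypothetical and not part of Spark NLP, and the position alignment is an assumption based on the table above.

```python
from typing import Iterable, List, Sequence, Tuple


def pair_questions_with_answers(
    rows: Iterable[Tuple[Sequence[str], Sequence[str]]],
) -> List[Tuple[str, str]]:
    """Flatten (questions, answers) rows into (question, answer) pairs.

    Assumes the two sequences in each row are position-aligned, as in the
    Results table. Rows of unequal length raise a ValueError rather than
    being silently truncated.
    """
    pairs: List[Tuple[str, str]] = []
    for questions, answers in rows:
        if len(questions) != len(answers):
            raise ValueError("questions and answers differ in length")
        pairs.extend(zip(questions, answers))
    return pairs


# Rows copied from the Results table
rows = [
    (["Which is the capital of France?", "Who was the first president of USA?"],
     ["Paris", "George Washington"]),
    (["Which is the capital of Bulgaria ?"], ["Sofia"]),
    (["Who is Donald Trump?"], ["a United States citizen"]),
]

for question, answer in pair_questions_with_answers(rows):
    print(f"{question} -> {answer}")
```

Raising on a length mismatch is a deliberate choice in this sketch: a silent zip would hide cases where a question produced no answer annotation.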

Model Information

Model Name: google_t5_small_ssm_nq
Compatibility: Spark NLP 2.7.0+
Edition: Official
Language: en
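
The compatibility requirement can be checked before loading the model. The following is a rough sketch of such a check in plain Python; the helper names parse_version and meets_minimum are hypothetical, and the installed version string is assumed to come from sparknlp.version(). Pre-release suffixes such as "-rc1" are simply ignored here, which is a simplification.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.7.0' into (2, 7, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "2.7.0") -> bool:
    """Return True if `installed` is at least `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '2.7' compares equal to '2.7.0'
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# With Spark NLP installed, the argument would be sparknlp.version()
print(meets_minimum("2.7.1"))
print(meets_minimum("2.6.5"))
```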

Data Source

C4, Wikipedia, NQ