TREC(6) Question Classifier

Description

Classify open-domain, fact-based questions into one of the following broad semantic categories: Abbreviation, Description, Entities, Human Beings, Locations or Numeric Values.

Classified Labels

ABBR, DESC, NUM, ENTY, LOC, HUM.
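
When displaying predictions, the label codes can be mapped back to the category names listed in the description. A minimal sketch, assuming a hypothetical lookup table `TREC6_LABELS` and helper `describe_label` (neither is part of the model or of Spark NLP):

```python
# Hypothetical lookup table: label code -> category name from the description.
TREC6_LABELS = {
    "ABBR": "Abbreviation",
    "DESC": "Description",
    "ENTY": "Entities",
    "HUM": "Human Beings",
    "LOC": "Locations",
    "NUM": "Numeric Values",
}


def describe_label(code: str) -> str:
    """Return the readable category for a label code.

    Surrounding whitespace and letter case are ignored; codes that are not
    in the table are returned unchanged.
    """
    return TREC6_LABELS.get(code.strip().upper(), code)


print(describe_label("NUM"))  # Numeric Values
```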


How to use

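import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import UniversalSentenceEncoder, ClassifierDLModel

# Start (or reuse) a Spark session with Spark NLP loaded.
spark = sparknlp.start()

# Turn the raw "text" column into document annotations.
documentAssembler = DocumentAssembler()\
  .setInputCol("text")\
  .setOutputCol("document")
# Embed each document with the Universal Sentence Encoder.
use = UniversalSentenceEncoder.pretrained(lang="en") \
  .setInputCols(["document"])\
  .setOutputCol("sentence_embeddings")
# Classify the question from its sentence embedding.
document_classifier = ClassifierDLModel.pretrained('classifierdl_use_trec6', 'en') \
  .setInputCols(["document", "sentence_embeddings"]) \
  .setOutputCol("class")

nlp_pipeline = Pipeline(stages=[documentAssembler, use, document_classifier])
# All stages are pretrained, so fitting on an empty DataFrame is enough.
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))

annotations = light_pipeline.fullAnnotate('When did the construction of stone circles begin in the UK?')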

import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._  // assumes an active SparkSession named `spark`

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
// "tfhub_use" is the encoder listed under Upstream Dependencies.
val use = UniversalSentenceEncoder.pretrained("tfhub_use", "en")
  .setInputCols(Array("document"))
  .setOutputCol("sentence_embeddings")
val document_classifier = ClassifierDLModel.pretrained("classifierdl_use_trec6", "en")
  .setInputCols(Array("document", "sentence_embeddings"))
  .setOutputCol("class")
val pipeline = new Pipeline().setStages(Array(documentAssembler, use, document_classifier))

val data = Seq("When did the construction of stone circles begin in the UK?").toDS.toDF("text")
val result = pipeline.fit(data).transform(data)

import nlu

text = ["""When did the construction of stone circles begin in the UK?"""]
trec6_df = nlu.load('en.classify.trec6.use').predict(text, output_level='document')
trec6_df[["document", "trec6"]]

Results

+------------------------------------------------------------------------------------------------+------------+
|document                                                                                        |class       |
+------------------------------------------------------------------------------------------------+------------+
|When did the construction of stone circles begin in the UK?                                     | NUM        |
+------------------------------------------------------------------------------------------------+------------+
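
In the Python example, `fullAnnotate` returns one dictionary per input text, keyed by output column, where each value is a list of annotation objects exposing a `result` attribute. The sketch below shows one way to pull the label out of that structure. `predicted_labels` is a hypothetical helper, and the stand-in objects only mimic the assumed shape of the output so that the snippet runs without a Spark session.

```python
from types import SimpleNamespace


def predicted_labels(annotations, output_col="class"):
    """Collect the `result` of every annotation in `output_col`.

    Returns one list of labels per input text.
    """
    return [[ann.result for ann in row.get(output_col, [])] for row in annotations]


# Stand-in for the value returned by light_pipeline.fullAnnotate(...);
# real annotation objects carry more fields (begin, end, metadata, ...).
fake_annotations = [{"class": [SimpleNamespace(result="NUM", metadata={})]}]

print(predicted_labels(fake_annotations))  # [['NUM']]
```

With the real pipeline, the same helper would be called as `predicted_labels(annotations)`.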

Model Information

+-------------------------+---------------------------------+
| Model Name              | classifierdl_use_trec6          |
| Model Class             | ClassifierDLModel               |
| Spark Compatibility     | 2.5.0                           |
| Spark NLP Compatibility | 2.4                             |
| License                 | open source                     |
| Edition                 | public                          |
| Input Labels            | [document, sentence_embeddings] |
| Output Labels           | [class]                         |
| Language                | en                              |
| Upstream Dependencies   | tfhub_use                       |
+-------------------------+---------------------------------+

Data Source

This model is trained on the 6-class version of the TREC dataset: http://search.r-project.org/library/textdata/html/dataset_trec.html

Benchmarking

              precision    recall  f1-score   support

        ABBR       0.00      0.00      0.00        26
        DESC       0.89      0.96      0.92       343
        ENTY       0.86      0.86      0.86       391
         HUM       0.91      0.90      0.91       366
         LOC       0.88      0.91      0.89       233
         NUM       0.94      0.94      0.94       274

    accuracy                           0.89      1633
   macro avg       0.75      0.76      0.75      1633
weighted avg       0.88      0.89      0.89      1633
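
The `macro avg` row is the unweighted mean of the per-class scores, while `weighted avg` weights each class by its support. That is why the 0.00 scores on the 26 ABBR examples pull the macro average down far more than the weighted one. A small sketch that recomputes both rows from the rounded per-class values in the table above, so the last digit can differ slightly from the reported figures. The names `report`, `macro_avg` and `weighted_avg` are illustrative.

```python
# Per-class (precision, recall, f1-score, support), copied from the table above.
report = {
    "ABBR": (0.00, 0.00, 0.00, 26),
    "DESC": (0.89, 0.96, 0.92, 343),
    "ENTY": (0.86, 0.86, 0.86, 391),
    "HUM": (0.91, 0.90, 0.91, 366),
    "LOC": (0.88, 0.91, 0.89, 233),
    "NUM": (0.94, 0.94, 0.94, 274),
}


def macro_avg(idx):
    """Unweighted mean of one metric column over all classes."""
    return sum(row[idx] for row in report.values()) / len(report)


def weighted_avg(idx):
    """Mean of one metric column, weighted by each class's support."""
    total = sum(row[3] for row in report.values())
    return sum(row[idx] * row[3] for row in report.values()) / total


for name, idx in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(f"{name:9s} macro={macro_avg(idx):.2f} weighted={weighted_avg(idx):.2f}")
```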