

Spark NLP Evaluation

This module provides tools to evaluate the accuracy of annotators and to visualize the parameters used in training. It reports metrics specific to each annotator, along with its training time. Results are displayed on the console or in an MLflow tracking UI. A single import is enough to start using the eval module.

  • Check how to set up the MLflow UI.
  • See the eval folder if you want to check specific running examples.


# Python
from sparknlp_jsl.eval import *

// Scala
import com.johnsnowlabs.nlp.eval._

Evaluating Norvig Spell Checker

You can evaluate this spell checker either by training an annotator or by using a pretrained model.

  • spark: Spark session.
  • trainFile: A corpus of documents with correctly spelled words.
  • testFile: A corpus of documents with misspelled words.
  • groundTruthFile: The same corpus used in testFile, but with correctly spelled words.

Train File Example:

Any document of your choice with correctly spelled words.

Test File Example:

My siter go to Munich.

Ground Truth File Example:

My sister goes to Munich.
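
To make the relationship between the test file and the ground truth file concrete, here is a minimal sketch in plain Python of a token-level comparison. It assumes that accuracy means the share of whitespace-separated tokens that match the ground truth position by position. The function name `token_accuracy` is hypothetical, and this is not the module's implementation.

```python
def token_accuracy(corrected: str, ground_truth: str) -> float:
    """Share of ground-truth tokens matched position by position (illustrative only)."""
    corrected_tokens = corrected.split()
    truth_tokens = ground_truth.split()
    if not truth_tokens:
        return 0.0
    # zip stops at the shorter sequence, so missing tokens count as errors.
    matches = sum(c == t for c, t in zip(corrected_tokens, truth_tokens))
    return matches / len(truth_tokens)


# Score the uncorrected test sentence against the ground truth sentence.
# "siter" and "go" differ from "sister" and "goes".
baseline = token_accuracy("My siter go to Munich.", "My sister goes to Munich.")
```

A spell checker that corrects both words would raise this score from the uncorrected baseline to 1.0 under the same assumption.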

Example for annotator:

# Python
spell = NorvigSweetingApproach() \
        .setInputCols(["token"]) \
        .setOutputCol("checked")

norvigSpellEvaluation = NorvigSpellEvaluation(spark, test_file, ground_truth_file)
norvigSpellEvaluation.computeAccuracyAnnotator(train_file, spell)

// Scala
val spell = new NorvigSweetingApproach()

val norvigSpellEvaluation = new NorvigSpellEvaluation(spark, testFile, groundTruthFile)
norvigSpellEvaluation.computeAccuracyAnnotator(trainFile, spell)

Example for pretrained model:

# Python
spell = NorvigSweetingModel.pretrained()

norvigSpellEvaluation = NorvigSpellEvaluation(spark, test_file, ground_truth_file)

// Scala
val spell = NorvigSweetingModel.pretrained()

val norvigSpellEvaluation = new NorvigSpellEvaluation(spark, testFile, groundTruthFile)

Evaluating Symmetric Spell Checker

You can evaluate this spell checker either by training an annotator or by using a pretrained model.

  • spark: Spark session.
  • trainFile: A corpus of documents with correctly spelled words.
  • testFile: A corpus of documents with misspelled words.
  • groundTruthFile: The same corpus used in testFile, but with correctly spelled words.

Train File Example:

Any document of your choice with correctly spelled words.

Test File Example:

My siter go to Munich.

Ground Truth File Example:

My sister goes to Munich.

Example for annotator:

# Python
spell = SymmetricDeleteApproach() \
        .setInputCols(["token"]) \
        .setOutputCol("checked")

symSpellEvaluation = SymSpellEvaluation(spark, test_file, ground_truth_file)
symSpellEvaluation.computeAccuracyAnnotator(train_file, spell)

// Scala
val spell = new SymmetricDeleteApproach()

val symSpellEvaluation = new SymSpellEvaluation(spark, testFile, groundTruthFile)
symSpellEvaluation.computeAccuracyAnnotator(trainFile, spell)

Example for pretrained model:

# Python
spell = SymmetricDeleteModel.pretrained()

# Use SymSpellEvaluation here, not NorvigSpellEvaluation
symSpellEvaluation = SymSpellEvaluation(spark, test_file, ground_truth_file)

// Scala
val spell = SymmetricDeleteModel.pretrained()

val symSpellEvaluation = new SymSpellEvaluation(spark, testFile, groundTruthFile)

Evaluating NER DL

You can evaluate NER DL either by training an annotator or by using a pretrained model.

  • spark: Spark session.
  • trainFile: Files with labeled NER entities for training.
  • testFile: Files with labeled NER entities for testing. These files are used to evaluate the model: the model predicts on their text, and their labels serve as the ground truth.
  • tagLevel: The granularity of tagging when measuring accuracy on entities. Set it to “IOB” to report the inside and beginning tags separately, or to an empty string to ignore them. For example, set “IOB” to display accuracy for I-PER and B-PER, or an empty string to display accuracy for PER only.
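
The effect of the tag level can be sketched in a few lines of plain Python. This is only an illustration of the behavior described above, assuming labels carry a "B-" or "I-" prefix. The helper `apply_tag_level` is hypothetical and not part of the eval module.

```python
def apply_tag_level(label: str, tag_level: str) -> str:
    """Keep the IOB prefix when tag_level is "IOB"; otherwise strip it."""
    if tag_level == "IOB":
        return label
    # Drop a leading "B-" or "I-"; labels such as "O" are left unchanged.
    if label[:2] in ("B-", "I-"):
        return label[2:]
    return label


labels = ["B-PER", "I-PER", "O", "B-LOC"]
with_iob = [apply_tag_level(label, "IOB") for label in labels]
without_iob = [apply_tag_level(label, "") for label in labels]
```

With "IOB" the labels stay as they are, so B-PER and I-PER are measured separately. With an empty string both collapse to PER and are measured together.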


Example for annotator:

# Python
embeddings = WordEmbeddings() \
            .setInputCols(["document", "token"]) \
            .setOutputCol("embeddings") \
            .setEmbeddingsSource("glove.6B.100d.txt", 100, "TEXT")

ner_approach = NerDLApproach() \
      .setInputCols(["document", "token", "embeddings"]) \
      .setLabelColumn("label") \
      .setOutputCol("ner") \
      .setMaxEpochs(10)

nerDLEvaluation = NerDLEvaluation(spark, test_file, tag_level)
nerDLEvaluation.computeAccuracyAnnotator(train_file, ner_approach, embeddings)

// Scala
val embeddings = new WordEmbeddings()
      .setInputCols("sentence", "token")
      .setEmbeddingsSource("glove.6B.100d.txt", 100, WordEmbeddingsFormat.TEXT)

val nerApproach = new NerDLApproach()
  .setInputCols(Array("sentence", "token", "embeddings"))

val nerDLEvaluation = new NerDLEvaluation(spark, testFile, tagLevel)
nerDLEvaluation.computeAccuracyAnnotator(trainFile, nerApproach, embeddings)

Example for pretrained model:

# Python
ner_dl = NerDLModel.pretrained()

nerDlEvaluation = NerDLEvaluation(spark, test_file, tag_level)

// Scala
val nerDl = NerDLModel.pretrained()

val nerDlEvaluation = new NerDLEvaluation(spark, testFile, tagLevel)

Evaluating NER CRF

You can evaluate NER CRF either by training an annotator or by using a pretrained model.

  • spark: Spark session.
  • trainFile: Files with labeled NER entities for training.
  • testFile: Files with labeled NER entities for testing. These files are used to evaluate the model: the model predicts on their text, and their labels serve as the ground truth.
  • format: The granularity of tagging when measuring accuracy on entities (some of the examples below name this argument tag_level or tagLevel). Set it to “IOB” to report the inside and beginning tags separately, or to an empty string to ignore them. For example, set “IOB” to display accuracy for I-PER and B-PER, or an empty string to display accuracy for PER only.
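
Because the test file supplies both the text to predict on and the labels used as ground truth, a per-label breakdown can be computed by comparing the two label sequences. The sketch below is plain Python and assumes that accuracy for a label means the fraction of tokens carrying that gold label that were predicted correctly. The function `per_label_accuracy` and the sample sequences are hypothetical, and the module's actual metrics may differ.

```python
from collections import Counter


def per_label_accuracy(gold, predicted):
    """For each gold label, the fraction of tokens whose prediction matches it."""
    totals = Counter(gold)
    correct = Counter(g for g, p in zip(gold, predicted) if g == p)
    return {label: correct[label] / totals[label] for label in totals}


# Hypothetical gold labels (from a test file) and model predictions.
gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]
predicted = ["B-PER", "O", "O", "B-LOC", "O"]
scores = per_label_accuracy(gold, predicted)
```

In this sample the only mistake is on the I-PER token, so I-PER scores zero while every other label scores one.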


Example for annotator:

# Python
embeddings = WordEmbeddings() \
            .setInputCols(["document", "token"]) \
            .setOutputCol("embeddings") \
            .setEmbeddingsSource("glove.6B.100d.txt", 100, "TEXT")

ner_approach = NerCrfApproach() \
      .setInputCols(["document", "token", "pos", "embeddings"]) \
      .setLabelColumn("label") \
      .setOutputCol("ner") \
      .setMaxEpochs(10)

nerCrfEvaluation = NerCrfEvaluation(spark, test_file, tag_level)
nerCrfEvaluation.computeAccuracyAnnotator(train_file, ner_approach, embeddings)

// Scala
val embeddings = new WordEmbeddings()
      .setInputCols("sentence", "token")
      .setEmbeddingsSource("./glove.6B.100d.txt", 100, WordEmbeddingsFormat.TEXT)

val nerTagger = new NerCrfApproach()
  .setInputCols(Array("sentence", "token", "pos", "embeddings"))

val nerCrfEvaluation = new NerCrfEvaluation(spark, testFile, format)
nerCrfEvaluation.computeAccuracyAnnotator(trainFile, nerTagger, embeddings)

Example for pretrained model:

# Python
ner_crf = NerCrfModel.pretrained()

nerCrfEvaluation = NerCrfEvaluation(spark, test_file, tag_level)

// Scala
val nerCrf = NerCrfModel.pretrained()

val nerCrfEvaluation = new NerCrfEvaluation(spark, testFile, tagLevel)

Evaluating POS Tagger

You can evaluate POS either by training an annotator or by using a pretrained model.

  • spark: Spark session.
  • trainFile: A labeled POS file (see an example here).
  • testFile: A CoNLL-U format file.
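
For readers unfamiliar with CoNLL-U, the sketch below shows in plain Python how the word form and universal POS tag can be read from such a file. It relies only on the standard CoNLL-U layout: ten tab-separated columns per token, with FORM in the second column and UPOS in the fourth, comment lines starting with "#", and blank lines between sentences. The function `read_conllu_tags` and the sample text are hypothetical and do not reflect how the eval module loads the file.

```python
def read_conllu_tags(text: str):
    """Return one list of (form, upos) pairs per sentence."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        columns = line.split("\t")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if not columns[0].isdigit():
            continue
        current.append((columns[1], columns[3]))
    if current:
        sentences.append(current)
    return sentences


sample = (
    "# text = My sister goes to Munich.\n"
    "1\tMy\tmy\tPRON\t_\t_\t2\tnmod:poss\t_\t_\n"
    "2\tsister\tsister\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "3\tgoes\tgo\tVERB\t_\t_\t0\troot\t_\t_\n"
    "4\tto\tto\tADP\t_\t_\t5\tcase\t_\t_\n"
    "5\tMunich\tMunich\tPROPN\t_\t_\t3\tobl\t_\t_\n"
    "6\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
)
tagged = read_conllu_tags(sample)
```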

Example for annotator:

# Python
pos_tagger = PerceptronApproach() \
             .setInputCols(["document", "token"]) \
             .setOutputCol("pos")

posEvaluation = POSEvaluation(spark, test_file)
posEvaluation.computeAccuracyAnnotator(train_file, pos_tagger)

// Scala
val posTagger = new PerceptronApproach()
      .setInputCols(Array("document", "token"))

val posEvaluation = new POSEvaluation(spark, testFile)
posEvaluation.computeAccuracyAnnotator(trainFile, posTagger)