Pipelines and Models

 

Pretrained Pipelines

| Pipeline | Name | English |
|---|---|---|
| Explain Document ML | explain_document_ml | Download |
| Explain Document DL | explain_document_dl | Download |
| Entity Recognizer DL | entity_recognizer_dl | Download |

Pretrained Models

English

| Model | Name | English |
|---|---|---|
| LemmatizerModel (Lemmatizer) | lemma_antbnc | Download |
| PerceptronModel (POS) | pos_anc | Download |
| NerCRFModel (NER with GloVe) | ner_crf | Download |
| NerDLModel (NER with GloVe) | ner_dl | Download |
| WordEmbeddings (GloVe) | glove_100d | Download |
| WordEmbeddings (BERT) | bert_uncased | Download |
| NerDLModel (NER with BERT) | ner_dl_bert | Download |
| DeepSentenceDetector | ner_dl_sentence | Download |
| ContextSpellCheckerModel (Spell Checker) | spellcheck_dl | Download |
| SymmetricDeleteModel (Spell Checker) | spellcheck_sd | Download |
| NorvigSweetingModel (Spell Checker) | spellcheck_norvig | Download |
| ViveknSentimentModel (Sentiment) | sentiment_vivekn | Download |
| DependencyParser (Dependency) | dependency_conllu | Download |
| TypedDependencyParser (Dependency) | dependency_typed_conllu | Download |

Italian

| Model | Name | Italian |
|---|---|---|
| LemmatizerModel (Lemmatizer) | lemma_dxc | Download |
| SentimentDetector (Sentiment) | sentiment_dxc | Download |

French

| Model | Name | French |
|---|---|---|
| PerceptronModel (POS UD) | pos_ud_gsd | Download |
| LemmatizerModel (Lemmatizer) | lemma | Download |

How to use Models and Pipelines

Online

To use a Spark NLP pretrained pipeline, call PretrainedPipeline with the pipeline's name and its language (the default is en):

from sparknlp.pretrained import PretrainedPipeline
pipeline = PretrainedPipeline('explain_document_dl', lang='en')

The same in Scala:

import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline
val pipeline = PretrainedPipeline("explain_document_dl", lang="en")

You can follow the same approach to use Spark NLP pretrained models:

from sparknlp.annotator import NerDLModel

# Load the NER model trained with the deep learning approach on GloVe word embeddings
ner_dl = NerDLModel.pretrained('ner_dl')
# Load the NER model trained with the deep learning approach on BERT word embeddings
ner_bert = NerDLModel.pretrained('ner_dl_bert')

The default language is English, so for other languages you must set the language explicitly:

import com.johnsnowlabs.nlp.annotator._

// Load the French POS tagger model trained on Universal Dependencies
val french_pos = PerceptronModel.pretrained("pos_ud_gsd", lang="fr")
// Load the Italian LemmatizerModel
val italian_lemma = LemmatizerModel.pretrained("lemma_dxc", lang="it")
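The language passed to pretrained must match the model: pos_ud_gsd is French, lemma_dxc is Italian, and so on. One way to keep this straight in Python is a small lookup built from the model tables above. This is a minimal, hypothetical sketch: MODEL_LANGS and pretrained_args are illustrative names, not part of the Spark NLP API.

```python
from typing import Dict, Tuple

# Hypothetical lookup built from the model tables above; not part of Spark NLP.
MODEL_LANGS: Dict[str, str] = {
    # English
    "lemma_antbnc": "en",
    "pos_anc": "en",
    "ner_crf": "en",
    "ner_dl": "en",
    "glove_100d": "en",
    "bert_uncased": "en",
    "ner_dl_bert": "en",
    "ner_dl_sentence": "en",
    "spellcheck_dl": "en",
    "spellcheck_sd": "en",
    "spellcheck_norvig": "en",
    "sentiment_vivekn": "en",
    "dependency_conllu": "en",
    "dependency_typed_conllu": "en",
    # Italian
    "lemma_dxc": "it",
    "sentiment_dxc": "it",
    # French
    "pos_ud_gsd": "fr",
    "lemma": "fr",
}


def pretrained_args(name: str) -> Tuple[str, str]:
    """Return the (name, lang) pair to request a listed model with."""
    try:
        return name, MODEL_LANGS[name]
    except KeyError:
        # Fail early with a clear message instead of a failed download.
        raise ValueError(f"unknown pretrained model: {name!r}") from None
```

Assuming the Python pretrained methods accept the model name and language as their first two arguments, as the Scala calls above do, the pair could be unpacked into a call such as PerceptronModel.pretrained(*pretrained_args("pos_ud_gsd")).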

Offline

If your environment cannot fetch pretrained pipelines or models online (for example, because it is air-gapped), you can download them manually and use them offline.

After downloading and extracting an offline model or pipeline, here is how you can use it inside your code (the path can also point to shared storage, such as HDFS, in a cluster):

  • Loading a PerceptronModel annotator for use inside a Spark NLP pipeline
import com.johnsnowlabs.nlp.annotator._

val french_pos = PerceptronModel.load("/tmp/pos_ud_gsd_fr_2.0.2_2.4_1556531457346/")
      .setInputCols("document", "token")
      .setOutputCol("pos")
  • Loading an offline pipeline
import org.apache.spark.ml.PipelineModel

val advancedPipeline = PipelineModel.load("/tmp/explain_document_dl_en_2.0.2_2.4_1556530585689/")
// Use the loaded pipeline for prediction; predictionDF must contain a "text" column
advancedPipeline.transform(predictionDF)
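The extracted folder names in these examples appear to follow the pattern <name>_<lang>_<Spark NLP version>_<Spark version>_<timestamp>. Assuming that convention holds, a short Python sketch like the following can check which model, language, and versions a downloaded folder contains before you load it. The parse_offline_folder helper and the OfflineResource type are hypothetical, not part of Spark NLP.

```python
import os
from typing import NamedTuple


class OfflineResource(NamedTuple):
    """Parts of an extracted model/pipeline folder name (assumed layout)."""
    name: str
    lang: str
    nlp_version: str
    spark_version: str
    timestamp_ms: int


def parse_offline_folder(path: str) -> OfflineResource:
    """Split a folder like pos_ud_gsd_fr_2.0.2_2.4_1556531457346 into its parts.

    Assumes <name>_<lang>_<nlp version>_<spark version>_<timestamp>, inferred
    from the example paths. Model names contain underscores themselves, so the
    split is done from the right, at most four times.
    """
    folder = os.path.basename(path.rstrip("/"))
    parts = folder.rsplit("_", 4)
    if len(parts) != 5:
        raise ValueError(f"unexpected folder name: {folder!r}")
    name, lang, nlp_version, spark_version, ts = parts
    return OfflineResource(name, lang, nlp_version, spark_version, int(ts))
```

Splitting from the right with rsplit keeps underscores inside names such as pos_ud_gsd or explain_document_dl intact, and stripping the trailing slash first lets the helper accept the paths exactly as they are written in the load calls.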