Models

 

Pretrained Models

English

Model                                     Name                     en
LemmatizerModel (Lemmatizer)              lemma_antbnc             Download
PerceptronModel (POS)                     pos_anc                  Download
NerCRFModel (NER with GloVe)              ner_crf                  Download
NerDLModel (NER with GloVe)               ner_dl                   Download
NerDLModel (NER with GloVe)               ner_dl_contrib           Download
NerDLModel (NER with BERT)                ner_dl_bert              Download
NerDLModel (NER with BERT)                ner_dl_bert_contrib      Download
WordEmbeddings (GloVe)                    glove_100d               Download
WordEmbeddings (GloVe)                    glove_840B_300           Download
WordEmbeddings (GloVe)                    glove_6B_300             Download
WordEmbeddings (BERT)                     bert_uncased             Download
DeepSentenceDetector                      ner_dl_sentence          Download
ContextSpellCheckerModel (Spell Checker)  spellcheck_dl            Download
SymmetricDeleteModel (Spell Checker)      spellcheck_sd            Download
NorvigSweetingModel (Spell Checker)       spellcheck_norvig        Download
ViveknSentimentModel (Sentiment)          sentiment_vivekn         Download
DependencyParser (Dependency)             dependency_conllu        Download
TypedDependencyParser (Dependency)        dependency_typed_conllu  Download

French

Model                                         Name        fr
LemmatizerModel (Lemmatizer)                  lemma       Download
PerceptronModel (POS UD)                      pos_ud_gsd  Download
NerDLModel (glove_6B_300 and glove_840B_300)  ner_dl      Download

Feature  Description
Lemma    Trained by Lemmatizer annotator on lemmatization-lists by Michal Měchura
POS      Trained by PerceptronApproach annotator on the Universal Dependencies
NER      Trained by NerDLApproach annotator with BiLSTM-CNN on the WikiNER corpus and supports the identification of PER, LOC, ORG and MISC entities

German

Model                                         Name        de
LemmatizerModel (Lemmatizer)                  lemma       de
PerceptronModel (POS UD)                      pos_ud_hdt  de
NerDLModel (glove_6B_300 and glove_840B_300)  ner_dl      de

Feature  Description
Lemma    Trained by Lemmatizer annotator on lemmatization-lists by Michal Měchura
POS      Trained by PerceptronApproach annotator on the Universal Dependencies
NER      Trained by NerDLApproach annotator with BiLSTM-CNN on the WikiNER corpus and supports the identification of PER, LOC, ORG and MISC entities

Italian

Model                                         Name           it
LemmatizerModel (Lemmatizer)                  lemma_dxc      Download
SentimentDetector (Sentiment)                 sentiment_dxc  Download
PerceptronModel (POS UD)                      pos_ud_isdt    Download
NerDLModel (glove_6B_300 and glove_840B_300)  ner_dl         Download

Feature  Description
Lemma    Trained by Lemmatizer annotator on DXC Technology dataset
POS      Trained by PerceptronApproach annotator on the Universal Dependencies
NER      Trained by NerDLApproach annotator with BiLSTM-CNN on the WikiNER corpus and supports the identification of PER, LOC, ORG and MISC entities

Multi-language

Model                  Name              xx
WordEmbeddings (BERT)  bert_multi_cased  Download

How to use Pretrained Models

Online

You can follow this approach to use Spark NLP pretrained models:

from sparknlp.annotator import NerDLModel

# Load the NER model trained with the deep learning approach and GloVe word embeddings
ner_dl = NerDLModel.pretrained('ner_dl')
# Load the NER model trained with the deep learning approach and BERT word embeddings
ner_bert = NerDLModel.pretrained('ner_dl_bert')

The default language is en, so for other languages you must set the language explicitly:

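import com.johnsnowlabs.nlp.annotators.LemmatizerModel
import com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronModel

// Load the French POS tagger model trained on Universal Dependencies
val french_pos = PerceptronModel.pretrained("pos_ud_gsd", lang = "fr")
// Load the Italian LemmatizerModel
val italian_lemma = LemmatizerModel.pretrained("lemma_dxc", lang = "it")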
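
The pairing of model name and language code can be sketched in Python as follows. The `POS_MODELS` dictionary and the `pos_model_name` and `load_pos_tagger` helpers are hypothetical names introduced only for illustration; the model names are the POS entries from the tables above. `load_pos_tagger` assumes that the `sparknlp` Python package is installed, that a Spark session is running, and that the model can be downloaded.

```python
# Hypothetical lookup table: language code -> POS model name,
# copied from the pretrained model tables on this page.
POS_MODELS = {
    "en": "pos_anc",
    "fr": "pos_ud_gsd",
    "de": "pos_ud_hdt",
    "it": "pos_ud_isdt",
}


def pos_model_name(lang="en"):
    """Return the POS model name listed for `lang` (default: en)."""
    try:
        return POS_MODELS[lang]
    except KeyError:
        raise ValueError("no POS model listed for language: " + lang)


def load_pos_tagger(lang="en"):
    """Download the pretrained POS tagger for `lang`.

    Assumes spark-nlp is installed and a Spark session is running.
    """
    # Imported lazily so the lookup above works without Spark NLP.
    from sparknlp.annotator import PerceptronModel

    # Name first, language code second, both passed positionally.
    return PerceptronModel.pretrained(pos_model_name(lang), lang)
```

Calling `load_pos_tagger("fr")` would then request the same model as the French example above, while `load_pos_tagger()` falls back to the English default.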

Offline

If your environment cannot use the online pipelines or models (for example, because it is air-gapped), you can download them directly for offline use.

After downloading and extracting the offline models or pipelines, you can use them inside your code as follows (the path can point to shared storage such as HDFS in a cluster):

  • Loading PerceptronModel annotator model inside Spark NLP Pipeline
val french_pos = PerceptronModel.load("/tmp/pos_ud_gsd_fr_2.0.2_2.4_1556531457346/")
      .setInputCols("document", "token")
      .setOutputCol("pos")
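
A rough Python counterpart is sketched below. It first checks the extracted folder name and then loads the model. The folder layout `<name>_<lang>_<library version>_<Spark version>_<timestamp>` is an assumption inferred from the example path above, and `describe_model_folder` and `load_offline_pos` are hypothetical helper names. `load_offline_pos` assumes that the `sparknlp` Python package is installed and that a Spark session is running.

```python
import re

# Assumed layout: <name>_<lang>_<library version>_<Spark version>_<timestamp>
_FOLDER = re.compile(
    r"^(?P<name>.+)_(?P<lang>[a-z]{2})_(?P<lib>\d+(?:\.\d+)+)"
    r"_(?P<spark>\d+(?:\.\d+)+)_(?P<timestamp>\d+)$"
)


def describe_model_folder(path):
    """Split an extracted model folder name into its assumed parts."""
    folder = path.rstrip("/").split("/")[-1]
    match = _FOLDER.match(folder)
    if match is None:
        raise ValueError("unexpected model folder name: " + folder)
    return match.groupdict()


def load_offline_pos(path, expected_lang):
    """Load a PerceptronModel from `path` after checking its language."""
    info = describe_model_folder(path)
    if info["lang"] != expected_lang:
        raise ValueError("model language is " + info["lang"])

    # Imported lazily so the folder check works without Spark NLP.
    from sparknlp.annotator import PerceptronModel

    return (
        PerceptronModel.load(path)
        .setInputCols(["document", "token"])
        .setOutputCol("pos")
    )
```

For the path used above, `describe_model_folder` reports the name `pos_ud_gsd` and the language `fr`, so `load_offline_pos(path, "fr")` proceeds to load the model, whereas any other expected language raises an error before Spark NLP is touched.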