Models

Pretrained Models

English

| Model | Name | en |
|---|---|---|
| LemmatizerModel (Lemmatizer) | lemma_antbnc | Download |
| PerceptronModel (POS) | pos_anc | Download |
| NerCRFModel (NER with GloVe) | ner_crf | Download |
| NerDLModel (NER with GloVe) | ner_dl | Download |
| NerDLModel (NER with GloVe) | ner_dl_contrib | Download |
| NerDLModel (NER with BERT) | ner_dl_bert | Download |
| NerDLModel (NER with BERT) | ner_dl_bert_contrib | Download |
| NerDLModel (OntoNotes with GloVe 100d) | onto_100 | Download |
| NerDLModel (OntoNotes with GloVe 300d) | onto_300 | Download |
| WordEmbeddings (GloVe) | glove_100d | Download |
| WordEmbeddings (BERT) | bert_uncased | Download |
| DeepSentenceDetector | ner_dl_sentence | Download |
| ContextSpellCheckerModel (Spell Checker) | spellcheck_dl | Download |
| SymmetricDeleteModel (Spell Checker) | spellcheck_sd | Download |
| NorvigSweetingModel (Spell Checker) | spellcheck_norvig | Download |
| ViveknSentimentModel (Sentiment) | sentiment_vivekn | Download |
| DependencyParser (Dependency) | dependency_conllu | Download |
| TypedDependencyParser (Dependency) | dependency_typed_conllu | Download |

French

| Model | Name | fr |
|---|---|---|
| LemmatizerModel (Lemmatizer) | lemma | Download |
| PerceptronModel (POS UD) | pos_ud_gsd | Download |
| NerDLModel (glove_840B_300) | wikiner_840B_300 | Download |

| Feature | Description |
|---|---|
| Lemma | Trained by Lemmatizer annotator on lemmatization-lists by Michal Měchura |
| POS | Trained by PerceptronApproach annotator on the Universal Dependencies |
| NER | Trained by NerDLApproach annotator with Char CNN - BiLSTM and GloVe Embeddings on the WikiNER corpus and supports the identification of PER, LOC, ORG and MISC entities |

German

| Model | Name | de |
|---|---|---|
| LemmatizerModel (Lemmatizer) | lemma | de |
| PerceptronModel (POS UD) | pos_ud_hdt | de |
| NerDLModel (glove_840B_300) | wikiner_840B_300 | de |

| Feature | Description |
|---|---|
| Lemma | Trained by Lemmatizer annotator on lemmatization-lists by Michal Měchura |
| POS | Trained by PerceptronApproach annotator on the Universal Dependencies |
| NER | Trained by NerDLApproach annotator with Char CNN - BiLSTM and GloVe Embeddings on the WikiNER corpus and supports the identification of PER, LOC, ORG and MISC entities |

Italian

| Model | Name | it |
|---|---|---|
| LemmatizerModel (Lemmatizer) | lemma_dxc | Download |
| SentimentDetector (Sentiment) | sentiment_dxc | Download |
| PerceptronModel (POS UD) | pos_ud_isdt | Download |
| NerDLModel (glove_840B_300) | wikiner_840B_300 | Download |

| Feature | Description |
|---|---|
| Lemma | Trained by Lemmatizer annotator on DXC Technology dataset |
| POS | Trained by PerceptronApproach annotator on the Universal Dependencies |
| NER | Trained by NerDLApproach annotator with Char CNN - BiLSTM and GloVe Embeddings on the WikiNER corpus and supports the identification of PER, LOC, ORG and MISC entities |

Multi-language

| Model | Name | xx |
|---|---|---|
| WordEmbeddings (GloVe) | glove_840B_300 | Download |
| WordEmbeddings (GloVe) | glove_6B_300 | Download |
| WordEmbeddings (BERT) | bert_multi_cased | Download |

How to use Pretrained Models

Online

To use a Spark NLP pretrained model, call pretrained() on the model class with the model name from the tables above:

# NerDLModel lives in the sparknlp.annotator module
from sparknlp.annotator import NerDLModel

# load NER model trained by deep learning approach and GloVe word embeddings
ner_dl = NerDLModel.pretrained('ner_dl')
# load NER model trained by deep learning approach and BERT word embeddings
ner_bert = NerDLModel.pretrained('ner_dl_bert')

The default language is en, so for other languages you must set the language explicitly:

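import com.johnsnowlabs.nlp.annotators.LemmatizerModel
import com.johnsnowlabs.nlp.annotators.pos.perceptron.PerceptronModel

// load French POS tagger model trained on Universal Dependencies
val french_pos = PerceptronModel.pretrained("pos_ud_gsd", lang = "fr")
// load Italian LemmatizerModel
val italian_lemma = LemmatizerModel.pretrained("lemma_dxc", lang = "it")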
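
Because the same model name can appear under several languages (for example, lemma and wikiner_840B_300), it can help to check a name and language pair before calling pretrained(). The following is a minimal sketch in plain Python: the CATALOG dictionary and the pretrained_args helper are hypothetical names introduced here for illustration, they are not part of Spark NLP, and the catalog holds only a few entries copied from the tables on this page.

```python
# Hypothetical lookup table transcribed from the tables on this page
# (a subset only); it is NOT provided by Spark NLP.
CATALOG = {
    "en": {"lemma_antbnc", "pos_anc", "ner_crf", "ner_dl", "ner_dl_bert"},
    "fr": {"lemma", "pos_ud_gsd", "wikiner_840B_300"},
    "de": {"lemma", "pos_ud_hdt", "wikiner_840B_300"},
    "it": {"lemma_dxc", "sentiment_dxc", "pos_ud_isdt", "wikiner_840B_300"},
    "xx": {"glove_840B_300", "glove_6B_300", "bert_multi_cased"},
}


def pretrained_args(name, lang="en"):
    """Return the (name, lang) pair to pass to pretrained().

    Mirrors the documented default language "en" and raises ValueError
    when the pair is not in the hypothetical CATALOG above.
    """
    if name not in CATALOG.get(lang, set()):
        raise ValueError(f"{name!r} is not listed for language {lang!r}")
    return name, lang


print(pretrained_args("ner_dl"))                 # ('ner_dl', 'en')
print(pretrained_args("pos_ud_gsd", lang="fr"))  # ('pos_ud_gsd', 'fr')
```

Calling pretrained_args("pos_ud_gsd") without lang="fr" raises ValueError, which is the mistake the explicit language setting is meant to prevent.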

Offline

If you have trouble using online pipelines or models in your environment (for example, because it is air-gapped), you can download them directly for offline use.

After downloading and extracting the offline models or pipelines, you can load them inside your code as follows (the path can also point to shared storage such as HDFS in a cluster):

  • Loading a PerceptronModel annotator inside a Spark NLP pipeline
val french_pos = PerceptronModel.load("/tmp/pos_ud_gsd_fr_2.0.2_2.4_1556531457346/")
      .setInputCols("document", "token")
      .setOutputCol("pos")
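
The extracted folder name in the path above appears to encode the model name, language, and versions. As a rough sketch, the folder name can be split into those parts in plain Python. The layout name_lang_libraryVersion_sparkVersion_timestamp is an assumption inferred from this single example path, not a documented guarantee, and parse_model_dir is a hypothetical helper introduced here for illustration.

```python
from pathlib import PurePosixPath


def parse_model_dir(path):
    """Split an extracted model folder name into its parts.

    ASSUMPTION: the folder is named
    <name>_<lang>_<library version>_<Spark version>_<timestamp>,
    as inferred from the example path on this page.
    """
    folder = PurePosixPath(path).name  # trailing slash is ignored
    # Split from the right so underscores inside the model name survive.
    name, lang, lib_version, spark_version, timestamp = folder.rsplit("_", 4)
    return {
        "name": name,
        "lang": lang,
        "library_version": lib_version,
        "spark_version": spark_version,
        "timestamp": timestamp,
    }


info = parse_model_dir("/tmp/pos_ud_gsd_fr_2.0.2_2.4_1556531457346/")
print(info["name"], info["lang"])  # pos_ud_gsd fr
```

A check like this can confirm that a downloaded folder matches the model name and language you meant to load before passing the path to load().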