Models

Pretrained Models

Pretrained models have moved to their own dedicated repository. For the up-to-date list, see: https://github.com/JohnSnowLabs/spark-nlp-models

How to use Pretrained Models

Online

To download and use a Spark NLP pretrained model, call the annotator's pretrained() method with the model name:

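import sparknlp
from sparknlp.annotator import NerDLModel
spark = sparknlp.start()  # start a Spark session with Spark NLP
# load NER model trained by deep learning approach and GloVe word embeddings
ner_dl = NerDLModel.pretrained('ner_dl')
# load NER model trained by deep learning approach and BERT word embeddings
ner_bert = NerDLModel.pretrained('ner_dl_bert')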

The default language is en (English), so for other languages you must also pass the lang parameter:

import com.johnsnowlabs.nlp.annotator._

// load French POS tagger model trained on Universal Dependencies
val french_pos = PerceptronModel.pretrained("pos_ud_gsd", lang="fr")
// load Italian LemmatizerModel
val italian_lemma = LemmatizerModel.pretrained("lemma_dxc", lang="it")

Offline

If your environment cannot download pipelines or models at runtime (for example, an air-gapped cluster), you can download them yourself and use them offline.

After downloading and extracting an offline model or pipeline, load it from its path inside your code (the path can also point to shared storage such as HDFS in a cluster):

  • Loading a PerceptronModel annotator inside a Spark NLP pipeline:
import com.johnsnowlabs.nlp.annotator._

val french_pos = PerceptronModel.load("/tmp/pos_ud_gsd_fr_2.0.2_2.4_1556531457346/")
      .setInputCols("document", "token")
      .setOutputCol("pos")
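When you keep several extracted models on local disk or HDFS, it can help to recover the model name and language from the folder name, for example to decide which pretrained(name, lang) model a folder replaces. The sketch below assumes the folder name follows the pattern visible in the example path above, name_lang_sparknlpVersion_sparkVersion_timestamp; that layout is inferred from this single path, not from a documented specification. OfflineModelInfo and parse_offline_folder are hypothetical helpers, not part of Spark NLP.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import NamedTuple


class OfflineModelInfo(NamedTuple):
    name: str              # model name, as passed to pretrained()
    lang: str              # language code
    sparknlp_version: str  # Spark NLP version the model was built with
    spark_version: str     # Spark version the model was built with
    built_at: datetime     # build time decoded from the trailing timestamp


def parse_offline_folder(path: str) -> OfflineModelInfo:
    """Hypothetical helper: split an extracted model folder name into its parts.

    ASSUMPTION: the folder is named name_lang_sparknlpVersion_sparkVersion_timestampMs,
    as in /tmp/pos_ud_gsd_fr_2.0.2_2.4_1556531457346/. Model names can contain
    underscores themselves (pos_ud_gsd), so the name is split from the right.
    """
    folder = PurePosixPath(path.rstrip("/")).name
    parts = folder.rsplit("_", 4)
    if len(parts) != 5 or not parts[4].isdigit():
        raise ValueError(f"unexpected offline model folder name: {folder!r}")
    name, lang, nlp_ver, spark_ver, ts_ms = parts
    # ASSUMPTION: the last field is milliseconds since the Unix epoch.
    built_at = datetime.fromtimestamp(int(ts_ms) / 1000, tz=timezone.utc)
    return OfflineModelInfo(name, lang, nlp_ver, spark_ver, built_at)


info = parse_offline_folder("/tmp/pos_ud_gsd_fr_2.0.2_2.4_1556531457346/")
print(info.name, info.lang, info.sparknlp_version, info.spark_version)
# prints: pos_ud_gsd fr 2.0.2 2.4
```

Splitting from the right keeps multi-part names such as pos_ud_gsd intact. If your downloaded folders use a different layout, adjust the split or drop the helper.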

Public Models

Use the pretrained(name, lang) function with the name and language of a model listed in the tables below.
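The same model name can appear under several languages (pos_ud_gsd, lemma and wikiner_840B_300 all recur in the tables below), so the lang argument is what selects the right model. The sketch below hand-copies a few rows from those tables into a plain Python dictionary and lists the names available for one language. PUBLIC_MODELS and models_for are hypothetical helpers for illustration, not part of Spark NLP; the dictionary is a partial subset, and the codes de, es and ru are assumed to follow the same two-letter style as fr and it.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, partial copy of the public-model tables: (name, lang) -> annotator class.
PUBLIC_MODELS: Dict[Tuple[str, str], str] = {
    ("lemma_antbnc", "en"): "LemmatizerModel",
    ("pos_anc", "en"): "PerceptronModel",
    ("ner_dl", "en"): "NerDLModel",
    ("lemma", "fr"): "LemmatizerModel",
    ("pos_ud_gsd", "fr"): "PerceptronModel",
    ("wikiner_840B_300", "fr"): "NerDLModel",
    ("pos_ud_hdt", "de"): "PerceptronModel",
    ("wikiner_840B_300", "de"): "NerDLModel",
    ("lemma_dxc", "it"): "LemmatizerModel",
    ("pos_ud_gsd", "es"): "PerceptronModel",
    ("pos_ud_gsd", "ru"): "PerceptronModel",
}


def models_for(lang: str = "en", annotator: Optional[str] = None) -> List[str]:
    """Return the model names listed for a language, optionally filtered by class.

    lang defaults to "en", matching the default language of pretrained().
    """
    return sorted(
        name
        for (name, model_lang), cls in PUBLIC_MODELS.items()
        if model_lang == lang and (annotator is None or cls == annotator)
    )


print(models_for("fr"))
# prints: ['lemma', 'pos_ud_gsd', 'wikiner_840B_300']
```

A name picked this way is then passed together with its language, for example PerceptronModel.pretrained("pos_ud_gsd", lang="fr"), so that a name shared across languages resolves to the intended model.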

English - Models

Model | Name | Build | Offline
LemmatizerModel (Lemmatizer) | lemma_antbnc | 2.0.2 | Download
PerceptronModel (POS) | pos_anc | 2.0.2 | Download
NerCrfModel (NER with GloVe) | ner_crf | 2.4.0 | Download
NerDLModel (NER with GloVe) | ner_dl | 2.4.0 | Download
NerDLModel (OntoNotes with GloVe 100d) | onto_100 | 2.4.0 | Download
NerDLModel (OntoNotes with GloVe 300d) | onto_300 | 2.4.0 | Download
WordEmbeddings (GloVe) | glove_100d | 2.4.0 | Download
BertEmbeddings (base_uncased) | bert_base_uncased | 2.4.0 | Download
BertEmbeddings (base_cased) | bert_base_cased | 2.4.0 | Download
BertEmbeddings (large_uncased) | bert_large_uncased | 2.4.0 | Download
BertEmbeddings (large_cased) | bert_large_cased | 2.4.0 | Download
ElmoEmbeddings | elmo | 2.4.0 | Download
UniversalSentenceEncoder | tf_hub_use | 2.4.0 | Download
UniversalSentenceEncoder | tf_hub_use_lg | 2.4.0 | Download
NerDLModel | ner_dl_sentence | 2.4.0 | Download
SymmetricDeleteModel (Spell Checker) | spellcheck_sd | 2.0.2 | Download
NorvigSweetingModel (Spell Checker) | spellcheck_norvig | 2.0.2 | Download
ViveknSentimentModel (Sentiment) | sentiment_vivekn | 2.0.2 | Download
DependencyParser (Dependency) | dependency_conllu | 2.0.8 | Download
TypedDependencyParser (Dependency) | dependency_typed_conllu | 2.0.8 | Download

French - Models

Model | Name | Build | Offline
LemmatizerModel (Lemmatizer) | lemma | 2.0.2 | Download
PerceptronModel (POS UD) | pos_ud_gsd | 2.0.2 | Download
NerDLModel (glove_840B_300) | wikiner_840B_300 | 2.0.2 | Download

Feature | Description
Lemma | Trained by the Lemmatizer annotator on lemmatization-lists by Michal Měchura
POS | Trained by the PerceptronApproach annotator on Universal Dependencies
NER | Trained by the NerDLApproach annotator (Char CNNs - BiLSTM - CRF) with GloVe embeddings on the WikiNER corpus; identifies PER, LOC, ORG and MISC entities

German - Models

Model | Name | Build | Offline
LemmatizerModel (Lemmatizer) | lemma | 2.0.8 | Download
PerceptronModel (POS UD) | pos_ud_hdt | 2.0.8 | Download
NerDLModel (glove_840B_300) | wikiner_840B_300 | 2.4.0 | Download

Feature | Description
Lemma | Trained by the Lemmatizer annotator on lemmatization-lists by Michal Měchura
POS | Trained by the PerceptronApproach annotator on Universal Dependencies
NER | Trained by the NerDLApproach annotator (Char CNNs - BiLSTM - CRF) with GloVe embeddings on the WikiNER corpus; identifies PER, LOC, ORG and MISC entities

Italian - Models

Model | Name | Build | Offline
LemmatizerModel (Lemmatizer) | lemma_dxc | 2.0.2 | Download
ViveknSentimentModel (Sentiment) | sentiment_dxc | 2.0.2 | Download
PerceptronModel (POS UD) | pos_ud_isdt | 2.0.8 | Download
NerDLModel (glove_840B_300) | wikiner_840B_300 | 2.4.0 | Download

Feature | Description
Lemma | Trained by the Lemmatizer annotator on the DXC Technology dataset
POS | Trained by the PerceptronApproach annotator on Universal Dependencies
NER | Trained by the NerDLApproach annotator (Char CNNs - BiLSTM - CRF) with GloVe embeddings on the WikiNER corpus; identifies PER, LOC, ORG and MISC entities

Spanish - Models

Model | Name | Build | Offline
LemmatizerModel (Lemmatizer) | lemma | 2.4.0 | Download
PerceptronModel (POS UD) | pos_ud_gsd | 2.4.0 | Download
NerDLModel (glove_100d) | wikiner_6B_100 | 2.4.0 | Download
NerDLModel (glove_6B_300) | wikiner_6B_300 | 2.4.0 | Download
NerDLModel (glove_840B_300) | wikiner_840B_300 | 2.4.0 | Download

Feature | Description
Lemma | Trained by the Lemmatizer annotator on lemmatization-lists by Michal Měchura
POS | Trained by the PerceptronApproach annotator on Universal Dependencies
NER | Trained by the NerDLApproach annotator (Char CNNs - BiLSTM - CRF) with GloVe embeddings on the WikiNER corpus; identifies PER, LOC, ORG and MISC entities

Russian - Models

Model | Name | Build | Offline
LemmatizerModel (Lemmatizer) | lemma | 2.4.4 | Download
PerceptronModel (POS UD) | pos_ud_gsd | 2.4.4 | Download
NerDLModel (glove_100d) | wikiner_6B_100 | 2.4.4 | Download
NerDLModel (glove_6B_300) | wikiner_6B_300 | 2.4.4 | Download
NerDLModel (glove_840B_300) | wikiner_840B_300 | 2.4.4 | Download

Feature | Description
Lemma | Trained by the Lemmatizer annotator on Universal Dependencies
POS | Trained by the PerceptronApproach annotator on Universal Dependencies
NER | Trained by the NerDLApproach annotator (Char CNNs - BiLSTM - CRF) with GloVe embeddings on the WikiNER corpus; identifies PER, LOC, ORG and MISC entities

Multi-language

Model | Name | Build | Offline
WordEmbeddings (GloVe) | glove_840B_300 | 2.4.0 | Download
WordEmbeddings (GloVe) | glove_6B_300 | 2.4.0 | Download
BertEmbeddings (multi_cased) | bert_multi_cased | 2.4.0 | Download