Models

 

Pretrained Models

Pretrained models have moved to their own dedicated repository. For the up-to-date list, see: https://github.com/JohnSnowLabs/spark-nlp-models

How to use Pretrained Models

Online

To download and use a Spark NLP pretrained model, call the pretrained() function of the corresponding annotator model:

import sparknlp
from sparknlp.annotator import NerDLModel

spark = sparknlp.start()
# load NER model trained by deep learning approach and GloVe word embeddings
ner_dl = NerDLModel.pretrained('ner_dl')
# load NER model trained by deep learning approach and BERT word embeddings
ner_bert = NerDLModel.pretrained('ner_dl_bert')

The default language is en, so for other languages you must set the lang parameter:

import com.johnsnowlabs.nlp.annotator._

// load French POS tagger model trained on Universal Dependencies
val french_pos = PerceptronModel.pretrained("pos_ud_gsd", lang = "fr")
// load Italian LemmatizerModel
val italian_lemma = LemmatizerModel.pretrained("lemma_dxc", lang = "it")
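Because the language falls back to en, the same model name can point to different models depending on lang: pos_ud_gsd, for instance, appears in the French, Russian, and Spanish tables with different builds. The pure-Python sketch below mimics that lookup with a hypothetical CATALOG dict copied from a few rows of this page. It does not call Spark NLP, and lookup_build is an illustrative name, not part of the library.

```python
# Hypothetical catalog built from a few rows of the tables on this page.
# Keys are (model name, language code); values are the Build column.
CATALOG = {
    ("ner_dl", "en"): "2.4.3",
    ("pos_ud_gsd", "fr"): "2.0.2",
    ("pos_ud_gsd", "es"): "2.4.0",
    ("pos_ud_gsd", "ru"): "2.4.4",
    ("lemma_dxc", "it"): "2.0.2",
}


def lookup_build(name: str, lang: str = "en") -> str:
    """Return the build for (name, lang); lang defaults to 'en' like pretrained()."""
    try:
        return CATALOG[(name, lang)]
    except KeyError:
        # Same name, wrong (or default) language: nothing matches.
        raise KeyError(f"no model named {name!r} for lang={lang!r}") from None
```

Calling lookup_build("pos_ud_gsd") without a language fails, because there is no English pos_ud_gsd in this catalog, while lookup_build("pos_ud_gsd", "ru") returns "2.4.4".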

Offline

If you cannot use online pipelines or models in your environment (for example, because it is air-gapped), you can download them directly for offline use.

After downloading and extracting the offline models or pipelines, you can load them inside your code as shown here (the path can also point to shared storage, such as HDFS in a cluster):

  • Loading a PerceptronModel annotator inside a Spark NLP pipeline:
import com.johnsnowlabs.nlp.annotator.PerceptronModel

val french_pos = PerceptronModel.load("/tmp/pos_ud_gsd_fr_2.0.2_2.4_1556531457346/")
      .setInputCols("document", "token")
      .setOutputCol("pos")
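The folder name in the load() example appears to encode the model's metadata. Assuming it follows the pattern <name>_<lang>_<build>_<spark version>_<timestamp> (inferred from this one path; the fourth field is presumably the Apache Spark version), a hypothetical helper such as parse_offline_dir can recover those fields, for example to check the build before calling load(). It is plain Python and does not touch Spark NLP.

```python
from typing import NamedTuple


class OfflineModelInfo(NamedTuple):
    name: str
    lang: str
    build: str
    spark_version: str
    timestamp: int


def parse_offline_dir(path: str) -> OfflineModelInfo:
    """Hypothetical helper; assumes <name>_<lang>_<build>_<spark>_<timestamp>."""
    # Keep only the last path component, ignoring a trailing slash.
    base = path.rstrip("/").split("/")[-1]
    # Split from the right: model names such as pos_ud_gsd contain underscores.
    name, lang, build, spark_version, timestamp = base.rsplit("_", 4)
    return OfflineModelInfo(name, lang, build, spark_version, int(timestamp))
```

A name with fewer than five underscore-separated fields raises ValueError during unpacking, which is a reasonable signal that the folder does not follow the assumed layout.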

Public Models

To use a pretrained model for a specific annotator in your pipeline, call the pretrained(name, lang) function of the annotator listed in the Model column, passing the values from the Name and Lang columns.

Example of loading a pretrained BERT embeddings model and an NER model:

from sparknlp.annotator import BertEmbeddings, NerDLModel

bert = BertEmbeddings.pretrained(name='bert_base_cased', lang='en')

ner_bert = NerDLModel.pretrained(name='ner_dl_bert', lang='en')

NOTE: Build is the minimum Spark NLP version required to download or load the model. For instance, a model with build 2.4.0 can be used with Spark NLP 2.4.0 and any later release, but not with earlier releases.
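The Build rule in the NOTE reduces to a numeric version comparison. The sketch below assumes plain dotted versions such as 2.4.0 (no suffixes like -rc1); parse_version and is_compatible are hypothetical helper names, not Spark NLP functions.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.4' or '2.4.0' into (2, 4, 0) so versions compare numerically."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))  # pad short versions such as '2.4'
    return tuple(parts[:3])


def is_compatible(model_build: str, spark_nlp_version: str) -> bool:
    """True if a model with this build can be used with this Spark NLP release."""
    return parse_version(spark_nlp_version) >= parse_version(model_build)
```

For example, is_compatible("2.5.0", "2.4.4") is False, so the 2.5.0 models on this page need Spark NLP 2.5.0 or later. Comparing integer tuples rather than strings matters because, as strings, "2.10.0" sorts before "2.4.0".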

Dutch - Models

Model | Name | Build | Lang | Offline
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | nl | Download
PerceptronModel (POS UD) | pos_ud_alpino | 2.5.0 | nl | Download
NerDLModel (glove_100d) | wikiner_6B_100 | 2.5.0 | nl | Download
NerDLModel (glove_6B_300) | wikiner_6B_300 | 2.5.0 | nl | Download
NerDLModel (glove_840B_300) | wikiner_840B_300 | 2.5.0 | nl | Download

English - Models

Model | Name | Build | Lang | Offline
LemmatizerModel (Lemmatizer) | lemma_antbnc | 2.0.2 | en | Download
PerceptronModel (POS) | pos_anc | 2.0.2 | en | Download
PerceptronModel (POS UD) | pos_ud_ewt | 2.2.2 | en | Download
NerCrfModel (NER with GloVe) | ner_crf | 2.4.0 | en | Download
NerDLModel (NER with GloVe) | ner_dl | 2.4.3 | en | Download
NerDLModel (NER with BERT) | ner_dl_bert | 2.4.3 | en | Download
NerDLModel (OntoNotes with GloVe 100d) | onto_100 | 2.4.0 | en | Download
NerDLModel (OntoNotes with GloVe 300d) | onto_300 | 2.4.0 | en | Download
DeepSentenceDetector | ner_dl_sentence | 2.4.0 | en | Download
SymmetricDeleteModel (Spell Checker) | spellcheck_sd | 2.0.2 | en | Download
NorvigSweetingModel (Spell Checker) | spellcheck_norvig | 2.0.2 | en | Download
ViveknSentimentModel (Sentiment) | sentiment_vivekn | 2.0.2 | en | Download
DependencyParserModel (Dependency) | dependency_conllu | 2.0.8 | en | Download
TypedDependencyParserModel (Dependency) | dependency_typed_conllu | 2.0.8 | en | Download

Embeddings

Model | Name | Build | Lang | Offline
WordEmbeddings (GloVe) | glove_100d | 2.4.0 | en | Download
BertEmbeddings | bert_base_uncased | 2.4.0 | en | Download
BertEmbeddings | bert_base_cased | 2.4.0 | en | Download
BertEmbeddings | bert_large_uncased | 2.4.0 | en | Download
BertEmbeddings | bert_large_cased | 2.4.0 | en | Download
ElmoEmbeddings | elmo | 2.4.0 | en | Download
UniversalSentenceEncoder (USE) | tfhub_use | 2.4.0 | en | Download
UniversalSentenceEncoder (USE) | tfhub_use_lg | 2.4.0 | en | Download
AlbertEmbeddings | albert_base_uncased | 2.5.0 | en | Download
AlbertEmbeddings | albert_large_uncased | 2.5.0 | en | Download
AlbertEmbeddings | albert_xlarge_uncased | 2.5.0 | en | Download
AlbertEmbeddings | albert_xxlarge_uncased | 2.5.0 | en | Download
XlnetEmbeddings | xlnet_base_cased | 2.5.0 | en | Download
XlnetEmbeddings | xlnet_large_cased | 2.5.0 | en | Download

Classification

Model | Name | Build | Lang | Offline
ClassifierDLModel (with tfhub_use) | classifierdl_use_trec6 | 2.5.0 | en | Download
ClassifierDLModel (with tfhub_use) | classifierdl_use_trec50 | 2.5.0 | en | Download
SentimentDLModel (with tfhub_use) | sentimentdl_use_imdb | 2.5.0 | en | Download
SentimentDLModel (with tfhub_use) | sentimentdl_use_twitter | 2.5.0 | en | Download
SentimentDLModel (with glove_100d) | sentimentdl_glove_imdb | 2.5.0 | en | Download

French - Models

Model | Name | Build | Lang | Offline
LemmatizerModel (Lemmatizer) | lemma | 2.0.2 | fr | Download
PerceptronModel (POS UD) | pos_ud_gsd | 2.0.2 | fr | Download
NerDLModel (glove_840B_300) | wikiner_840B_300 | 2.0.2 | fr | Download

Feature | Description
Lemma | Trained by Lemmatizer annotator on lemmatization-lists by Michal Měchura
POS | Trained by PerceptronApproach annotator on the Universal Dependencies
NER | Trained by NerDLApproach annotator with Char CNNs - BiLSTM - CRF and GloVe Embeddings on the WikiNER corpus and supports the identification of PER, LOC, ORG and MISC entities

German - Models

Model | Name | Build | Lang | Offline
LemmatizerModel (Lemmatizer) | lemma | 2.0.8 | de | Download
PerceptronModel (POS UD) | pos_ud_hdt | 2.0.8 | de | Download
NerDLModel (glove_840B_300) | wikiner_840B_300 | 2.4.0 | de | Download

Feature | Description
Lemma | Trained by Lemmatizer annotator on lemmatization-lists by Michal Měchura
POS | Trained by PerceptronApproach annotator on the Universal Dependencies
NER | Trained by NerDLApproach annotator with Char CNNs - BiLSTM - CRF and GloVe Embeddings on the WikiNER corpus and supports the identification of PER, LOC, ORG and MISC entities

Italian - Models

Model | Name | Build | Lang | Offline
LemmatizerModel (Lemmatizer) | lemma_dxc | 2.0.2 | it | Download
ViveknSentimentModel (Sentiment) | sentiment_dxc | 2.0.2 | it | Download
PerceptronModel (POS UD) | pos_ud_isdt | 2.0.8 | it | Download
NerDLModel (glove_840B_300) | wikiner_840B_300 | 2.4.0 | it | Download

Feature | Description
Lemma | Trained by Lemmatizer annotator on DXC Technology dataset
POS | Trained by PerceptronApproach annotator on the Universal Dependencies
NER | Trained by NerDLApproach annotator with Char CNNs - BiLSTM - CRF and GloVe Embeddings on the WikiNER corpus and supports the identification of PER, LOC, ORG and MISC entities

Norwegian - Models

Model | Name | Build | Lang | Offline
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | nb | Download
PerceptronModel (POS UD) | pos_ud_nynorsk | 2.5.0 | nn | Download
PerceptronModel (POS UD) | pos_ud_bokmaal | 2.5.0 | nb | Download
NerDLModel (glove_100d) | norne_6B_100 | 2.5.0 | no | Download
NerDLModel (glove_6B_300) | norne_6B_300 | 2.5.0 | no | Download
NerDLModel (glove_840B_300) | norne_840B_300 | 2.5.0 | no | Download

Polish - Models

Model | Name | Build | Lang | Offline
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | pl | Download
PerceptronModel (POS UD) | pos_ud_lfg | 2.5.0 | pl | Download
NerDLModel (glove_100d) | wikiner_6B_100 | 2.5.0 | pl | Download
NerDLModel (glove_6B_300) | wikiner_6B_300 | 2.5.0 | pl | Download
NerDLModel (glove_840B_300) | wikiner_840B_300 | 2.5.0 | pl | Download

Portuguese - Models

Model | Name | Build | Lang | Offline
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | pt | Download
PerceptronModel (POS UD) | pos_ud_bosque | 2.5.0 | pt | Download
NerDLModel (glove_100d) | wikiner_6B_100 | 2.5.0 | pt | Download
NerDLModel (glove_6B_300) | wikiner_6B_300 | 2.5.0 | pt | Download
NerDLModel (glove_840B_300) | wikiner_840B_300 | 2.5.0 | pt | Download

Russian - Models

Model | Name | Build | Lang | Offline
LemmatizerModel (Lemmatizer) | lemma | 2.4.4 | ru | Download
PerceptronModel (POS UD) | pos_ud_gsd | 2.4.4 | ru | Download
NerDLModel (glove_100d) | wikiner_6B_100 | 2.4.4 | ru | Download
NerDLModel (glove_6B_300) | wikiner_6B_300 | 2.4.4 | ru | Download
NerDLModel (glove_840B_300) | wikiner_840B_300 | 2.4.4 | ru | Download

Feature | Description
Lemma | Trained by Lemmatizer annotator on the Universal Dependencies
POS | Trained by PerceptronApproach annotator on the Universal Dependencies
NER | Trained by NerDLApproach annotator with Char CNNs - BiLSTM - CRF and GloVe Embeddings on the WikiNER corpus and supports the identification of PER, LOC, ORG and MISC entities

Spanish - Models

Model | Name | Build | Lang | Offline
LemmatizerModel (Lemmatizer) | lemma | 2.4.0 | es | Download
PerceptronModel (POS UD) | pos_ud_gsd | 2.4.0 | es | Download
NerDLModel (glove_100d) | wikiner_6B_100 | 2.4.0 | es | Download
NerDLModel (glove_6B_300) | wikiner_6B_300 | 2.4.0 | es | Download
NerDLModel (glove_840B_300) | wikiner_840B_300 | 2.4.0 | es | Download

Feature | Description
Lemma | Trained by Lemmatizer annotator on lemmatization-lists by Michal Měchura
POS | Trained by PerceptronApproach annotator on the Universal Dependencies
NER | Trained by NerDLApproach annotator with Char CNNs - BiLSTM - CRF and GloVe Embeddings on the WikiNER corpus and supports the identification of PER, LOC, ORG and MISC entities

Bulgarian - Models

Model | Name | Build | Lang | Offline
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | bg | Download
PerceptronModel (POS UD) | pos_ud_btb | 2.5.0 | bg | Download

Czech - Models

Model | Name | Build | Lang | Offline
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | cs | Download
PerceptronModel (POS UD) | pos_ud_pdt | 2.5.0 | cs | Download

Greek - Models

Model | Name | Build | Lang | Offline
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | el | Download
PerceptronModel (POS UD) | pos_ud_gdt | 2.5.0 | el | Download

Finnish - Models

Model | Name | Build | Lang | Offline
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | fi | Download
PerceptronModel (POS UD) | pos_ud_tdt | 2.5.0 | fi | Download

Hungarian - Models

Model | Name | Build | Lang | Offline
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | hu | Download
PerceptronModel (POS UD) | pos_ud_szeged | 2.5.0 | hu | Download

Romanian - Models

Model | Name | Build | Lang | Offline
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | ro | Download
PerceptronModel (POS UD) | pos_ud_rrt | 2.5.0 | ro | Download

Slovak - Models

Model | Name | Build | Lang | Offline
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | sk | Download
PerceptronModel (POS UD) | pos_ud_snk | 2.5.0 | sk | Download

Swedish - Models

Model | Name | Build | Lang | Offline
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | sv | Download
PerceptronModel (POS UD) | pos_ud_tal | 2.5.0 | sv | Download

Turkish - Models

Model | Name | Build | Lang | Offline
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | tr | Download
PerceptronModel (POS UD) | pos_ud_imst | 2.5.0 | tr | Download

Ukrainian - Models

Model | Name | Build | Lang | Offline
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | uk | Download
PerceptronModel (POS UD) | pos_ud_iu | 2.5.0 | uk | Download

Multi-language

Model | Name | Build | Lang | Offline
WordEmbeddings (GloVe) | glove_840B_300 | 2.4.0 | xx | Download
WordEmbeddings (GloVe) | glove_6B_300 | 2.4.0 | xx | Download
BertEmbeddings (multi_cased) | bert_multi_cased | 2.4.0 | xx | Download
LanguageDetectorDL | ld_wiki_7 | 2.5.2 | xx | Download
LanguageDetectorDL | ld_wiki_20 | 2.5.2 | xx | Download

  • ld_wiki_7 (7 languages): Czech, German, English, Spanish, French, Italian, and Slovak
  • ld_wiki_20 (20 languages): Bulgarian, Czech, German, Greek, English, Spanish, Finnish, French, Croatian, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Swedish, Turkish, and Ukrainian
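When choosing between the two LanguageDetectorDL models, the first question is whether every language you expect in your data is on the model's list. The sketch below encodes the two lists with ISO 639-1 codes (the mapping from language names to codes is ours; this page does not state which labels the models actually emit) and picks the model with the fewest candidate languages that still covers the input. LD_MODELS and choose_detector are hypothetical names, and preferring the smaller list is a design choice for illustration, not a documented recommendation.

```python
from typing import Iterable

# ISO 639-1 codes for the languages listed on this page for each detector.
LD_MODELS = {
    "ld_wiki_7": frozenset({"cs", "de", "en", "es", "fr", "it", "sk"}),
    "ld_wiki_20": frozenset({
        "bg", "cs", "de", "el", "en", "es", "fi", "fr", "hr", "hu",
        "it", "no", "pl", "pt", "ro", "ru", "sk", "sv", "tr", "uk",
    }),
}


def choose_detector(expected_langs: Iterable[str]) -> str:
    """Return the listed detector with the fewest languages covering all inputs."""
    needed = set(expected_langs)
    candidates = [name for name, langs in LD_MODELS.items() if needed <= langs]
    if not candidates:
        raise ValueError(f"no listed detector covers: {sorted(needed)}")
    return min(candidates, key=lambda name: len(LD_MODELS[name]))
```

For example, choose_detector(["en", "fr"]) returns "ld_wiki_7", while adding Polish ("pl") forces "ld_wiki_20".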