Pretrained Models
The pretrained models have moved to their own dedicated repository. Please follow this link for the updated list: https://github.com/JohnSnowLabs/spark-nlp-models
How to use Pretrained Models
Online
You can follow this approach to use Spark NLP pretrained models:
from sparknlp.annotator import NerDLModel

# load the NER model trained with a deep learning approach and GloVe word embeddings
ner_dl = NerDLModel.pretrained('ner_dl')
# load the NER model trained with a deep learning approach and BERT word embeddings
ner_bert = NerDLModel.pretrained('ner_dl_bert')
The default language is en, so for other languages you should set the language explicitly:
// load the French POS tagger model trained on Universal Dependencies
val french_pos = PerceptronModel.pretrained("pos_ud_gsd", lang="fr")
// load the Italian LemmatizerModel
val italian_lemma = LemmatizerModel.pretrained("lemma_dxc", lang="it")
Offline
If you have any trouble using online pipelines or models in your environment (for example, because it is air-gapped), you can download them directly for offline use.
After downloading and extracting the offline models/pipelines, here is how you can use them inside your code (the path could be on shared storage such as HDFS in a cluster):
- Loading a PerceptronModel annotator model inside a Spark NLP Pipeline
val french_pos = PerceptronModel.load("/tmp/pos_ud_gsd_fr_2.0.2_2.4_1556531457346/")
  .setInputCols("document", "token")
  .setOutputCol("pos")
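The extracted folder name in the path above appears to encode the model name, the language, the Spark NLP build, a second version number, and a timestamp. The following is a minimal Python sketch that splits such a folder name into those parts. The field layout is an assumption inferred from this single example path, not a documented contract; in particular, reading `2.4` as a Spark version is a guess, and `parse_model_dir` and `OfflineModelInfo` are hypothetical helper names.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class OfflineModelInfo(NamedTuple):
    """Fields assumed to be encoded in an extracted model folder name."""
    name: str
    lang: str
    build: str
    spark_version: str  # assumption: the second version number is the Spark version
    timestamp: int


def parse_model_dir(path: str) -> OfflineModelInfo:
    """Split '<name>_<lang>_<build>_<spark>_<timestamp>' into its parts.

    The layout is inferred from one example path and may not hold for
    every downloadable model.
    """
    folder = PurePosixPath(path).name  # the trailing slash is ignored
    # Split from the right because the model name itself contains underscores.
    name, lang, build, spark_version, timestamp = folder.rsplit("_", 4)
    return OfflineModelInfo(name, lang, build, spark_version, int(timestamp))


info = parse_model_dir("/tmp/pos_ud_gsd_fr_2.0.2_2.4_1556531457346/")
print(info)
```

A helper like this can be handy for checking that the language and build of a downloaded folder match what a pipeline expects before calling `load`.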
Public Models
To use a pretrained model for a specific annotator in your pipeline, use the annotator listed in the Model column, followed by the pretrained(name, lang) function.
Example of loading a pretrained BERT model or NER model:
bert = BertEmbeddings.pretrained(name='bert_base_cased', lang='en')
ner_onto = NerDLModel.pretrained(name='ner_dl_bert', lang='en')
NOTE: Build is the minimum Spark NLP version that can download or load the model; the model works with that version and every later one. For instance, a model with build 2.4.0 can be used in release 2.4.0 and all later releases, but not in earlier ones.
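The build rule can be expressed as a simple version comparison. The sketch below is plain Python and does not call Spark NLP; `parse_version` and `is_model_compatible` are hypothetical helper names, and the sketch assumes that version strings consist only of dot-separated integers.

```python
def parse_version(version: str) -> tuple:
    """Turn '2.4.3' into (2, 4, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_model_compatible(model_build: str, installed_release: str) -> bool:
    """Return True if the installed release is at or above the model's build."""
    return parse_version(installed_release) >= parse_version(model_build)


print(is_model_compatible("2.4.0", "2.4.3"))  # installed release is newer
print(is_model_compatible("2.5.0", "2.4.5"))  # installed release is older
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "2.10.0" as older than "2.4.0".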
Dutch - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | nl | Download |
PerceptronModel (POS UD) | pos_ud_alpino | 2.5.0 | nl | Download |
NerDLModel (glove_100d) | wikiner_6B_100 | 2.5.0 | nl | Download |
NerDLModel (glove_6B_300) | wikiner_6B_300 | 2.5.0 | nl | Download |
NerDLModel (glove_840B_300) | wikiner_840B_300 | 2.5.0 | nl | Download |
English - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
LemmatizerModel (Lemmatizer) | lemma_antbnc | 2.0.2 | en | Download |
PerceptronModel (POS) | pos_anc | 2.0.2 | en | Download |
PerceptronModel (POS UD) | pos_ud_ewt | 2.2.2 | en | Download |
NerCrfModel (NER with GloVe) | ner_crf | 2.4.0 | en | Download |
NerDLModel (NER with GloVe) | ner_dl | 2.4.3 | en | Download |
NerDLModel (NER with BERT) | ner_dl_bert | 2.4.3 | en | Download |
NerDLModel (OntoNotes with GloVe 100d) | onto_100 | 2.4.0 | en | Download |
NerDLModel (OntoNotes with GloVe 300d) | onto_300 | 2.4.0 | en | Download |
SymmetricDeleteModel (Spell Checker) | spellcheck_sd | 2.0.2 | en | Download |
NorvigSweetingModel (Spell Checker) | spellcheck_norvig | 2.0.2 | en | Download |
ViveknSentimentModel (Sentiment) | sentiment_vivekn | 2.0.2 | en | Download |
DependencyParser (Dependency) | dependency_conllu | 2.0.8 | en | Download |
TypedDependencyParser (Dependency) | dependency_typed_conllu | 2.0.8 | en | Download |
Embeddings
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
WordEmbeddings (GloVe) | glove_100d | 2.4.0 | en | Download |
BertEmbeddings | bert_base_uncased | 2.4.0 | en | Download |
BertEmbeddings | bert_base_cased | 2.4.0 | en | Download |
BertEmbeddings | bert_large_uncased | 2.4.0 | en | Download |
BertEmbeddings | bert_large_cased | 2.4.0 | en | Download |
ElmoEmbeddings | elmo | 2.4.0 | en | Download |
UniversalSentenceEncoder (USE) | tfhub_use | 2.4.0 | en | Download |
UniversalSentenceEncoder (USE) | tfhub_use_lg | 2.4.0 | en | Download |
AlbertEmbeddings | albert_base_uncased | 2.5.0 | en | Download |
AlbertEmbeddings | albert_large_uncased | 2.5.0 | en | Download |
AlbertEmbeddings | albert_xlarge_uncased | 2.5.0 | en | Download |
AlbertEmbeddings | albert_xxlarge_uncased | 2.5.0 | en | Download |
XlnetEmbeddings | xlnet_base_cased | 2.5.0 | en | Download |
XlnetEmbeddings | xlnet_large_cased | 2.5.0 | en | Download |
Classification
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
ClassifierDL (with tfhub_use) | classifierdl_use_trec6 | 2.5.0 | en | Download |
ClassifierDL (with tfhub_use) | classifierdl_use_trec50 | 2.5.0 | en | Download |
SentimentDL (with tfhub_use) | sentimentdl_use_imdb | 2.5.0 | en | Download |
SentimentDL (with tfhub_use) | sentimentdl_use_twitter | 2.5.0 | en | Download |
SentimentDL (with glove_100d) | sentimentdl_glove_imdb | 2.5.0 | en | Download |
French - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
LemmatizerModel (Lemmatizer) | lemma | 2.0.2 | fr | Download |
PerceptronModel (POS UD) | pos_ud_gsd | 2.0.2 | fr | Download |
NerDLModel (glove_840B_300) | wikiner_840B_300 | 2.0.2 | fr | Download |
Feature | Description |
---|---|
Lemma | Trained by the Lemmatizer annotator on lemmatization-lists by Michal Měchura |
POS | Trained by the PerceptronApproach annotator on Universal Dependencies |
NER | Trained by the NerDLApproach annotator with Char CNNs - BiLSTM - CRF and GloVe embeddings on the WikiNER corpus; supports the identification of PER, LOC, ORG, and MISC entities |
German - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
LemmatizerModel (Lemmatizer) | lemma | 2.0.8 | de | Download |
PerceptronModel (POS UD) | pos_ud_hdt | 2.0.8 | de | Download |
NerDLModel (glove_840B_300) | wikiner_840B_300 | 2.4.0 | de | Download |
Feature | Description |
---|---|
Lemma | Trained by the Lemmatizer annotator on lemmatization-lists by Michal Měchura |
POS | Trained by the PerceptronApproach annotator on Universal Dependencies |
NER | Trained by the NerDLApproach annotator with Char CNNs - BiLSTM - CRF and GloVe embeddings on the WikiNER corpus; supports the identification of PER, LOC, ORG, and MISC entities |
Italian - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
LemmatizerModel (Lemmatizer) | lemma_dxc | 2.0.2 | it | Download |
ViveknSentimentAnalysis (Sentiment) | sentiment_dxc | 2.0.2 | it | Download |
PerceptronModel (POS UD) | pos_ud_isdt | 2.0.8 | it | Download |
NerDLModel (glove_840B_300) | wikiner_840B_300 | 2.4.0 | it | Download |
Feature | Description |
---|---|
Lemma | Trained by the Lemmatizer annotator on the DXC Technology dataset |
POS | Trained by the PerceptronApproach annotator on Universal Dependencies |
NER | Trained by the NerDLApproach annotator with Char CNNs - BiLSTM - CRF and GloVe embeddings on the WikiNER corpus; supports the identification of PER, LOC, ORG, and MISC entities |
Norwegian - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | nb | Download |
PerceptronModel (POS UD) | pos_ud_nynorsk | 2.5.0 | nn | Download |
PerceptronModel (POS UD) | pos_ud_bokmaal | 2.5.0 | nb | Download |
NerDLModel (glove_100d) | norne_6B_100 | 2.5.0 | no | Download |
NerDLModel (glove_6B_300) | norne_6B_300 | 2.5.0 | no | Download |
NerDLModel (glove_840B_300) | norne_840B_300 | 2.5.0 | no | Download |
Polish - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | pl | Download |
PerceptronModel (POS UD) | pos_ud_lfg | 2.5.0 | pl | Download |
NerDLModel (glove_100d) | wikiner_6B_100 | 2.5.0 | pl | Download |
NerDLModel (glove_6B_300) | wikiner_6B_300 | 2.5.0 | pl | Download |
NerDLModel (glove_840B_300) | wikiner_840B_300 | 2.5.0 | pl | Download |
Portuguese - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | pt | Download |
PerceptronModel (POS UD) | pos_ud_bosque | 2.5.0 | pt | Download |
NerDLModel (glove_100d) | wikiner_6B_100 | 2.5.0 | pt | Download |
NerDLModel (glove_6B_300) | wikiner_6B_300 | 2.5.0 | pt | Download |
NerDLModel (glove_840B_300) | wikiner_840B_300 | 2.5.0 | pt | Download |
Russian - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
LemmatizerModel (Lemmatizer) | lemma | 2.4.4 | ru | Download |
PerceptronModel (POS UD) | pos_ud_gsd | 2.4.4 | ru | Download |
NerDLModel (glove_100d) | wikiner_6B_100 | 2.4.4 | ru | Download |
NerDLModel (glove_6B_300) | wikiner_6B_300 | 2.4.4 | ru | Download |
NerDLModel (glove_840B_300) | wikiner_840B_300 | 2.4.4 | ru | Download |
Feature | Description |
---|---|
Lemma | Trained by the Lemmatizer annotator on Universal Dependencies |
POS | Trained by the PerceptronApproach annotator on Universal Dependencies |
NER | Trained by the NerDLApproach annotator with Char CNNs - BiLSTM - CRF and GloVe embeddings on the WikiNER corpus; supports the identification of PER, LOC, ORG, and MISC entities |
Spanish - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
LemmatizerModel (Lemmatizer) | lemma | 2.4.0 | es | Download |
PerceptronModel (POS UD) | pos_ud_gsd | 2.4.0 | es | Download |
NerDLModel (glove_100d) | wikiner_6B_100 | 2.4.0 | es | Download |
NerDLModel (glove_6B_300) | wikiner_6B_300 | 2.4.0 | es | Download |
NerDLModel (glove_840B_300) | wikiner_840B_300 | 2.4.0 | es | Download |
Feature | Description |
---|---|
Lemma | Trained by the Lemmatizer annotator on lemmatization-lists by Michal Měchura |
POS | Trained by the PerceptronApproach annotator on Universal Dependencies |
NER | Trained by the NerDLApproach annotator with Char CNNs - BiLSTM - CRF and GloVe embeddings on the WikiNER corpus; supports the identification of PER, LOC, ORG, and MISC entities |
Bulgarian - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | bg | Download |
PerceptronModel (POS UD) | pos_ud_btb | 2.5.0 | bg | Download |
Czech - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | cs | Download |
PerceptronModel (POS UD) | pos_ud_pdt | 2.5.0 | cs | Download |
Greek - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | el | Download |
PerceptronModel (POS UD) | pos_ud_gdt | 2.5.0 | el | Download |
Finnish - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | fi | Download |
PerceptronModel (POS UD) | pos_ud_tdt | 2.5.0 | fi | Download |
Hungarian - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | hu | Download |
PerceptronModel (POS UD) | pos_ud_szeged | 2.5.0 | hu | Download |
Romanian - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | ro | Download |
PerceptronModel (POS UD) | pos_ud_rrt | 2.5.0 | ro | Download |
Slovak - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | sk | Download |
PerceptronModel (POS UD) | pos_ud_snk | 2.5.0 | sk | Download |
Swedish - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | sv | Download |
PerceptronModel (POS UD) | pos_ud_tal | 2.5.0 | sv | Download |
Turkish - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | tr | Download |
PerceptronModel (POS UD) | pos_ud_imst | 2.5.0 | tr | Download |
Ukrainian - Models
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
LemmatizerModel (Lemmatizer) | lemma | 2.5.0 | uk | Download |
PerceptronModel (POS UD) | pos_ud_iu | 2.5.0 | uk | Download |
Multi-language
Model | Name | Build | Lang | Offline |
---|---|---|---|---|
WordEmbeddings (GloVe) | glove_840B_300 | 2.4.0 | xx | Download |
WordEmbeddings (GloVe) | glove_6B_300 | 2.4.0 | xx | Download |
BertEmbeddings (multi_cased) | bert_multi_cased | 2.4.0 | xx | Download |
LanguageDetectorDL | ld_wiki_7 | 2.5.2 | xx | Download |
LanguageDetectorDL | ld_wiki_20 | 2.5.2 | xx | Download |
- The model with 7 languages: Czech, German, English, Spanish, French, Italian, and Slovak
- The model with 20 languages: Bulgarian, Czech, German, Greek, English, Spanish, Finnish, French, Croatian, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Swedish, Turkish, and Ukrainian
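One way to choose between the two language detectors is to check which language list covers the languages you expect in your data. The sketch below is plain Python and does not call Spark NLP. It assumes that the 7-language list belongs to `ld_wiki_7` and the 20-language list to `ld_wiki_20`, which is inferred from the model names; `pick_language_detector` is a hypothetical helper.

```python
# Language lists copied from the two bullets above.
LD_WIKI_7 = frozenset({
    "Czech", "German", "English", "Spanish", "French", "Italian", "Slovak",
})
LD_WIKI_20 = LD_WIKI_7 | frozenset({
    "Bulgarian", "Greek", "Finnish", "Croatian", "Hungarian", "Norwegian",
    "Polish", "Portuguese", "Romanian", "Russian", "Swedish", "Turkish",
    "Ukrainian",
})


def pick_language_detector(expected_languages):
    """Return the smallest detector whose list covers every expected language.

    Returns None when at least one language is covered by neither list.
    """
    wanted = set(expected_languages)
    if wanted <= LD_WIKI_7:
        return "ld_wiki_7"
    if wanted <= LD_WIKI_20:
        return "ld_wiki_20"
    return None


print(pick_language_detector(["English", "French"]))
print(pick_language_detector(["English", "Polish"]))
```

The returned name could then be passed as the `name` argument with `lang='xx'`, matching the Lang column of the table above.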
Please follow this link for the updated list: https://github.com/JohnSnowLabs/spark-nlp-models