Models

 

Pretrained Models

Use the pretrained(name, lang) function to load any of the models listed below.

English

| Model | Name | en |
|---|---|---|
| LemmatizerModel (Lemmatizer) | lemma_antbnc | Download |
| PerceptronModel (POS) | pos_anc | Download |
| NerCRFModel (NER with GloVe) | ner_crf | Download |
| NerDLModel (NER with GloVe) | ner_dl | Download |
| NerDLModel (NER with GloVe) | ner_dl_contrib | Download |
| NerDLModel (NER with BERT) | ner_dl_bert_base_cased | Download |
| NerDLModel (OntoNotes with GloVe 100d) | onto_100 | Download |
| NerDLModel (OntoNotes with GloVe 300d) | onto_300 | Download |
| WordEmbeddings (GloVe) | glove_100d | Download |
| BertEmbeddings (base_uncased) | bert_base_uncased | Download |
| BertEmbeddings (base_cased) | bert_base_cased | Download |
| BertEmbeddings (large_uncased) | bert_large_uncased | Download |
| BertEmbeddings (large_cased) | bert_large_cased | Download |
| DeepSentenceDetector | ner_dl_sentence | Download |
| ContextSpellCheckerModel (Spell Checker) | spellcheck_dl | Download |
| SymmetricDeleteModel (Spell Checker) | spellcheck_sd | Download |
| NorvigSweetingModel (Spell Checker) | spellcheck_norvig | Download |
| ViveknSentimentModel (Sentiment) | sentiment_vivekn | Download |
| DependencyParser (Dependency) | dependency_conllu | Download |
| TypedDependencyParser (Dependency) | dependency_typed_conllu | Download |

English - Licensed Enterprise

These models require a third argument, loc, which gives the location of the models: pretrained(name, lang, loc).

| Model | Name | Language | loc |
|---|---|---|---|
| NerDLModel | ner_clinical | en | clinical/models |
| AssertionLogRegModel | assertion_ml | en | clinical/models |
| AssertionDLModel | assertion_dl | en | clinical/models |
| NerDLModel | deidentify_dl | en | clinical/models |
| DeIdentificationModel | deidentify_rb | en | clinical/models |
| WordEmbeddingsModel | embeddings_clinical | en | clinical/models |
| PerceptronModel | pos_clinical | en | clinical/models |
| EntityResolverModel | resolve_icd10 | en | clinical/models |
| EntityResolverModel | resolve_icd10cm_cl_em | en | clinical/models |
| EntityResolverModel | resolve_icd10pcs_cl_em | en | clinical/models |
| ContextSpellCheckerModel | context_spell_med | en | clinical/models |

French

| Model | Name | fr |
|---|---|---|
| LemmatizerModel (Lemmatizer) | lemma | Download |
| PerceptronModel (POS UD) | pos_ud_gsd | Download |
| NerDLModel (glove_840B_300) | wikiner_840B_300 | Download |

| Feature | Description |
|---|---|
| Lemma | Trained by Lemmatizer annotator on lemmatization-lists by Michal Měchura |
| POS | Trained by PerceptronApproach annotator on the Universal Dependencies |
| NER | Trained by NerDLApproach annotator with Char CNNs - BiLSTM - CRF and GloVe Embeddings on the WikiNER corpus and supports the identification of PER, LOC, ORG and MISC entities |

German

| Model | Name | de |
|---|---|---|
| LemmatizerModel (Lemmatizer) | lemma | de |
| PerceptronModel (POS UD) | pos_ud_hdt | de |
| NerDLModel (glove_840B_300) | wikiner_840B_300 | de |

| Feature | Description |
|---|---|
| Lemma | Trained by Lemmatizer annotator on lemmatization-lists by Michal Měchura |
| POS | Trained by PerceptronApproach annotator on the Universal Dependencies |
| NER | Trained by NerDLApproach annotator with Char CNNs - BiLSTM - CRF and GloVe Embeddings on the WikiNER corpus and supports the identification of PER, LOC, ORG and MISC entities |

Italian

| Model | Name | it |
|---|---|---|
| LemmatizerModel (Lemmatizer) | lemma_dxc | Download |
| SentimentDetector (Sentiment) | sentiment_dxc | Download |
| PerceptronModel (POS UD) | pos_ud_isdt | Download |
| NerDLModel (glove_840B_300) | wikiner_840B_300 | Download |

| Feature | Description |
|---|---|
| Lemma | Trained by Lemmatizer annotator on DXC Technology dataset |
| POS | Trained by PerceptronApproach annotator on the Universal Dependencies |
| NER | Trained by NerDLApproach annotator with Char CNNs - BiLSTM - CRF and GloVe Embeddings on the WikiNER corpus and supports the identification of PER, LOC, ORG and MISC entities |

Multi-language

| Model | Name | xx |
|---|---|---|
| WordEmbeddings (GloVe) | glove_840B_300 | Download |
| WordEmbeddings (GloVe) | glove_6B_300 | Download |
| BertEmbeddings (multi_cased) | bert_multi_cased | Download |

How to use Pretrained Models

Online

You can follow this approach to use Spark NLP pretrained models:

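from sparknlp.annotator import NerDLModel

# Load the NER model trained with the deep learning approach and GloVe word embeddings
ner_dl = NerDLModel.pretrained('ner_dl')
# Load the NER model trained with the deep learning approach and BERT word embeddings
# (name taken from the English models table above)
ner_bert = NerDLModel.pretrained('ner_dl_bert_base_cased')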

The default language is en, so for other languages you must set the language explicitly (Scala example):

// Load the French POS tagger model trained on Universal Dependencies
val french_pos = PerceptronModel.pretrained("pos_ud_gsd", lang="fr")
// Load the Italian LemmatizerModel
val italian_lemma = LemmatizerModel.pretrained("lemma_dxc", lang="it")
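
The argument rules described on this page (lang defaults to en, and licensed models additionally need a location) can be sketched in Python as a small wrapper. Note that load_pretrained is a hypothetical helper written for illustration and is not part of Spark NLP; it only assumes that the annotator class exposes pretrained(name, lang) and pretrained(name, lang, loc) as described above.

```python
def load_pretrained(annotator_cls, name, lang="en", loc=None):
    """Call annotator_cls.pretrained with the arguments the model needs.

    Hypothetical convenience wrapper (not a Spark NLP function):
    - lang defaults to "en", matching the default described on this page
    - loc is passed only when given, e.g. "clinical/models" for licensed models
    """
    if loc is None:
        return annotator_cls.pretrained(name, lang)
    return annotator_cls.pretrained(name, lang, loc)


# Usage sketch (assumes Spark NLP is installed and a Spark session is running):
# from sparknlp.annotator import PerceptronModel, NerDLModel
# french_pos = load_pretrained(PerceptronModel, "pos_ud_gsd", lang="fr")
# clinical_ner = load_pretrained(NerDLModel, "ner_clinical", loc="clinical/models")
```

Keeping loc optional means the same call shape works for both the open models and the licensed ones, with the model name, language, and location taken from the tables on this page.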

Offline

If you have any trouble using online pipelines or models in your environment (maybe it’s air-gapped), you can directly download them for offline use.

After downloading and extracting the offline models/pipelines, here is how you can use them inside your code (the path can point to shared storage such as HDFS in a cluster):

  • Loading PerceptronModel annotator model inside Spark NLP Pipeline
val french_pos = PerceptronModel.load("/tmp/pos_ud_gsd_fr_2.0.2_2.4_1556531457346/")
      .setInputCols("document", "token")
      .setOutputCol("pos")
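
When many extracted models sit side by side on shared storage, it can help to read the model name and language back out of the folder name. The following is a minimal Python sketch that assumes the folder follows the pattern name_lang_version_version_timestamp, as the single example path above suggests; the meaning of the two version fields (presumably the Spark NLP and Spark versions) is an assumption, and parse_model_dir is a hypothetical helper, not a Spark NLP function.

```python
from pathlib import PurePosixPath


def parse_model_dir(path):
    """Split an extracted model folder name into its parts.

    Assumed layout (inferred from the example path, not confirmed):
    <name>_<lang>_<version>_<version>_<timestamp>
    """
    folder = PurePosixPath(path.rstrip("/")).name
    # The last three underscore-separated fields are the versions and timestamp
    name_lang, first_version, second_version, timestamp = folder.rsplit("_", 3)
    # The language code is the last field of what remains
    name, lang = name_lang.rsplit("_", 1)
    return {
        "name": name,
        "lang": lang,
        "first_version": first_version,    # assumed: Spark NLP version
        "second_version": second_version,  # assumed: Spark version
        "timestamp": timestamp,
    }


info = parse_model_dir("/tmp/pos_ud_gsd_fr_2.0.2_2.4_1556531457346/")
print(info["name"], info["lang"])
```

Splitting from the right keeps model names that themselves contain underscores, such as pos_ud_gsd, intact.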