Pretrained Pipelines have moved to Models Hub. Please follow this link for the updated list of all models and pipelines: Models Hub
English
NOTE: noncontrib pipelines are compatible with Windows operating systems.
Pipelines | Name |
---|---|
Explain Document ML | explain_document_ml |
Explain Document DL | explain_document_dl |
Explain Document DL Win | explain_document_dl_noncontrib |
Explain Document DL Fast | explain_document_dl_fast |
Explain Document DL Fast Win | explain_document_dl_fast_noncontrib |
Recognize Entities DL | recognize_entities_dl |
Recognize Entities DL Win | recognize_entities_dl_noncontrib |
OntoNotes Entities Small | onto_recognize_entities_sm |
OntoNotes Entities Large | onto_recognize_entities_lg |
Match Datetime | match_datetime |
Match Pattern | match_pattern |
Match Chunk | match_chunks |
Match Phrases | match_phrases |
Clean Stop | clean_stop |
Clean Pattern | clean_pattern |
Clean Slang | clean_slang |
Check Spelling | check_spelling |
Analyze Sentiment | analyze_sentiment |
Analyze Sentiment DL (IMDB) | analyze_sentimentdl_use_imdb |
Analyze Sentiment DL (Twitter) | analyze_sentimentdl_use_twitter |
Dependency Parse | dependency_parse |
explain_document_ml
explain_document_dl
recognize_entities_dl
onto_recognize_entities_sm
Trained by the NerDLApproach annotator with Char CNNs - BiLSTM - CRF and GloVe Embeddings on the OntoNotes corpus; supports the identification of 18 entity types.
onto_recognize_entities_lg
Trained by the NerDLApproach annotator with Char CNNs - BiLSTM - CRF and GloVe Embeddings on the OntoNotes corpus; supports the identification of 18 entity types.
match_datetime
Uses the DateMatcher annotator with the date format yyyy/MM/dd.
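yyyy/MM/dd is a Java-style date pattern: a four-digit year, a two-digit month, and a two-digit day, separated by slashes. As a rough illustration only, the sketch below maps those three pattern letters to Python strftime codes and formats a date the same way. The helper names are hypothetical, and this is not how DateMatcher itself is implemented.

```python
from datetime import date

# Hypothetical, deliberately partial mapping: only the letters used in yyyy/MM/dd.
JAVA_TO_STRFTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d"}


def java_to_strftime(pattern):
    """Translate a Java-style date pattern (subset) into a Python strftime format."""
    for java_token, py_code in JAVA_TO_STRFTIME.items():
        pattern = pattern.replace(java_token, py_code)
    return pattern


def format_like_datematcher(d, pattern="yyyy/MM/dd"):
    """Render a date using the Java-style pattern (illustration only)."""
    return d.strftime(java_to_strftime(pattern))


print(format_like_datematcher(date(2019, 7, 4)))  # 2019/07/04
```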
match_pattern
Uses the RegexMatcher annotator to match phone numbers.
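The rule that match_pattern actually uses is not listed here. The sketch below uses a hypothetical phone-number regex in plain Python to show the kind of matching a RegexMatcher rule performs; the pattern and function name are assumptions, so substitute your own rule as needed.

```python
import re

# Hypothetical rule (not the one shipped with match_pattern): an optional
# parenthesized 3-digit area code, then 3 + 4 digits separated by an optional
# dash, dot, or space.
PHONE_RULE = re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b")


def find_phone_numbers(text):
    """Return every substring of text that matches PHONE_RULE, in order."""
    return PHONE_RULE.findall(text)


print(find_phone_numbers("Call (555) 123-4567 or 555.987.6543 today."))
# ['(555) 123-4567', '555.987.6543']
```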
match_chunks
The pipeline uses the chunk grammar <DT>?<JJ>*<NN>+ (an optional determiner, any number of adjectives, then one or more nouns).
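In Spark NLP, a POS-tag grammar like this is typically passed to the Chunker annotator through setRegexParsers. To show what such a grammar selects, here is a plain-Python sketch (not Spark NLP's implementation) that writes each POS tag as <TAG>, turns the grammar into a regular expression, and returns the matching token spans. The function names are hypothetical.

```python
import re

# Chunk grammar: optional determiner, any number of adjectives, one or more nouns.
GRAMMAR = "<DT>?<JJ>*<NN>+"


def grammar_to_regex(grammar):
    # Wrap every <TAG> in a non-capturing group so ?, * and + quantify the whole
    # tag; re.escape protects tags such as <PRP$> that contain regex metacharacters.
    return re.compile(
        re.sub(r"<[A-Z$]+>", lambda m: "(?:" + re.escape(m.group(0)) + ")", grammar)
    )


def match_chunks(tokens, tags, grammar=GRAMMAR):
    """Return the phrases whose POS-tag sequence matches the chunk grammar."""
    tag_string = "".join(f"<{tag}>" for tag in tags)
    # Map the character offset where each <TAG> begins to its token index.
    token_at = {}
    offset = 0
    for index, tag in enumerate(tags):
        token_at[offset] = index
        offset += len(tag) + 2  # +2 for the angle brackets
    chunks = []
    for match in grammar_to_regex(grammar).finditer(tag_string):
        first = token_at[match.start()]
        last = first + match.group(0).count("<")  # one "<" per matched tag
        chunks.append(" ".join(tokens[first:last]))
    return chunks


tokens = ["The", "big", "brown", "dog", "chased", "a", "cat"]
tags = ["DT", "JJ", "JJ", "NN", "VBD", "DT", "NN"]
print(match_chunks(tokens, tags))  # ['The big brown dog', 'a cat']
```

Because each tag is closed by ">", <NN> in the grammar does not match inside a longer tag such as <NNS>.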
French
Pipelines | Name |
---|---|
Explain Document Large | explain_document_lg |
Explain Document Medium | explain_document_md |
Entity Recognizer Large | entity_recognizer_lg |
Entity Recognizer Medium | entity_recognizer_md |
Feature | Description |
---|---|
NER | Trained by the NerDLApproach annotator with Char CNNs - BiLSTM - CRF and GloVe Embeddings on the WikiNER corpus; supports the identification of PER, LOC, ORG, and MISC entities |
Lemma | Trained by the Lemmatizer annotator on the lemmatization-lists by Michal Měchura |
POS | Trained by the PerceptronApproach annotator on the Universal Dependencies |
Size | Model size indicator: md or lg. The large pipelines use glove_840B_300 and the medium pipelines use glove_6B_300 WordEmbeddings |
French explain_document_lg
French explain_document_md
French entity_recognizer_lg
French entity_recognizer_md
Italian
Pipelines | Name |
---|---|
Explain Document Large | explain_document_lg |
Explain Document Medium | explain_document_md |
Entity Recognizer Large | entity_recognizer_lg |
Entity Recognizer Medium | entity_recognizer_md |
Feature | Description |
---|---|
NER | Trained by the NerDLApproach annotator with Char CNNs - BiLSTM - CRF and GloVe Embeddings on the WikiNER corpus; supports the identification of PER, LOC, ORG, and MISC entities |
Lemma | Trained by the Lemmatizer annotator on the DXC Technology dataset |
POS | Trained by the PerceptronApproach annotator on the Universal Dependencies |
Size | Model size indicator: md or lg. The large pipelines use glove_840B_300 and the medium pipelines use glove_6B_300 WordEmbeddings |
Italian explain_document_lg
Italian explain_document_md
Italian entity_recognizer_lg
Italian entity_recognizer_md
Spanish
Pipeline | Name | Build | lang | Description | Offline |
---|---|---|---|---|---|
Explain Document Small | explain_document_sm | 2.4.0 | es | | Download |
Explain Document Medium | explain_document_md | 2.4.0 | es | | Download |
Explain Document Large | explain_document_lg | 2.4.0 | es | | Download |
Entity Recognizer Small | entity_recognizer_sm | 2.4.0 | es | | Download |
Entity Recognizer Medium | entity_recognizer_md | 2.4.0 | es | | Download |
Entity Recognizer Large | entity_recognizer_lg | 2.4.0 | es | | Download |
Feature | Description |
---|---|
Lemma | Trained by the Lemmatizer annotator on the lemmatization-lists by Michal Měchura |
POS | Trained by the PerceptronApproach annotator on the Universal Dependencies |
NER | Trained by the NerDLApproach annotator with Char CNNs - BiLSTM - CRF and GloVe Embeddings on the WikiNER corpus; supports the identification of PER, LOC, ORG, and MISC entities |
Size | Model size indicator: sm, md, or lg. The small pipelines use glove_100d, the medium pipelines use glove_6B_300, and the large pipelines use glove_840B_300 WordEmbeddings |
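The Spanish pipeline names follow a task_size pattern. The small hypothetical helper below builds such a name and looks up which GloVe embeddings that size uses, based only on the Size row of the Spanish feature table; it does not query Spark NLP.

```python
# Embeddings per size suffix, taken from the Spanish Size row.
EMBEDDINGS_BY_SIZE = {"sm": "glove_100d", "md": "glove_6B_300", "lg": "glove_840B_300"}
TASKS = ("explain_document", "entity_recognizer")


def pipeline_name(task, size):
    """Build a pipeline name such as entity_recognizer_md (hypothetical helper)."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    if size not in EMBEDDINGS_BY_SIZE:
        raise ValueError(f"unknown size: {size}")
    return f"{task}_{size}"


name = pipeline_name("entity_recognizer", "md")
print(name, EMBEDDINGS_BY_SIZE["md"])  # entity_recognizer_md glove_6B_300
```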
Russian
Pipeline | Name | Build | lang | Description | Offline |
---|---|---|---|---|---|
Explain Document Small | explain_document_sm | 2.4.4 | ru | | Download |
Explain Document Medium | explain_document_md | 2.4.4 | ru | | Download |
Explain Document Large | explain_document_lg | 2.4.4 | ru | | Download |
Entity Recognizer Small | entity_recognizer_sm | 2.4.4 | ru | | Download |
Entity Recognizer Medium | entity_recognizer_md | 2.4.4 | ru | | Download |
Entity Recognizer Large | entity_recognizer_lg | 2.4.4 | ru | | Download |
Feature | Description |
---|---|
Lemma | Trained by the Lemmatizer annotator on the Universal Dependencies |
POS | Trained by the PerceptronApproach annotator on the Universal Dependencies |
NER | Trained by the NerDLApproach annotator with Char CNNs - BiLSTM - CRF and GloVe Embeddings on the WikiNER corpus; supports the identification of PER, LOC, ORG, and MISC entities |
Dutch
Pipeline | Name | Build | lang | Description | Offline |
---|---|---|---|---|---|
Explain Document Small | explain_document_sm | 2.5.0 | nl | | Download |
Explain Document Medium | explain_document_md | 2.5.0 | nl | | Download |
Explain Document Large | explain_document_lg | 2.5.0 | nl | | Download |
Entity Recognizer Small | entity_recognizer_sm | 2.5.0 | nl | | Download |
Entity Recognizer Medium | entity_recognizer_md | 2.5.0 | nl | | Download |
Entity Recognizer Large | entity_recognizer_lg | 2.5.0 | nl | | Download |
Norwegian
Pipeline | Name | Build | lang | Description | Offline |
---|---|---|---|---|---|
Explain Document Small | explain_document_sm | 2.5.0 | no | | Download |
Explain Document Medium | explain_document_md | 2.5.0 | no | | Download |
Explain Document Large | explain_document_lg | 2.5.0 | no | | Download |
Entity Recognizer Small | entity_recognizer_sm | 2.5.0 | no | | Download |
Entity Recognizer Medium | entity_recognizer_md | 2.5.0 | no | | Download |
Entity Recognizer Large | entity_recognizer_lg | 2.5.0 | no | | Download |
Polish
Pipeline | Name | Build | lang | Description | Offline |
---|---|---|---|---|---|
Explain Document Small | explain_document_sm | 2.5.0 | pl | | Download |
Explain Document Medium | explain_document_md | 2.5.0 | pl | | Download |
Explain Document Large | explain_document_lg | 2.5.0 | pl | | Download |
Entity Recognizer Small | entity_recognizer_sm | 2.5.0 | pl | | Download |
Entity Recognizer Medium | entity_recognizer_md | 2.5.0 | pl | | Download |
Entity Recognizer Large | entity_recognizer_lg | 2.5.0 | pl | | Download |
Portuguese
Pipeline | Name | Build | lang | Description | Offline |
---|---|---|---|---|---|
Explain Document Small | explain_document_sm | 2.5.0 | pt | | Download |
Explain Document Medium | explain_document_md | 2.5.0 | pt | | Download |
Explain Document Large | explain_document_lg | 2.5.0 | pt | | Download |
Entity Recognizer Small | entity_recognizer_sm | 2.5.0 | pt | | Download |
Entity Recognizer Medium | entity_recognizer_md | 2.5.0 | pt | | Download |
Entity Recognizer Large | entity_recognizer_lg | 2.5.0 | pt | | Download |
Multi-language
Pipeline | Name | Build | lang | Description | Offline |
---|---|---|---|---|---|
LanguageDetectorDL | detect_language_7 | 2.5.2 | xx | | Download |
LanguageDetectorDL | detect_language_20 | 2.5.2 | xx | | Download |
- detect_language_7 covers 7 languages: Czech, German, English, Spanish, French, Italian, and Slovak
- detect_language_20 covers 20 languages: Bulgarian, Czech, German, Greek, English, Spanish, Finnish, French, Croatian, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Swedish, Turkish, and Ukrainian
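To choose between the two detectors, a hypothetical helper can check which language list covers every language you expect in your data, preferring detect_language_7 when it is enough. The sets below are copied from the two lists of supported languages; the function name is an assumption for illustration.

```python
# Language lists for the two LanguageDetectorDL pipelines.
DETECT_7 = {"Czech", "German", "English", "Spanish", "French", "Italian", "Slovak"}
DETECT_20 = DETECT_7 | {
    "Bulgarian", "Greek", "Finnish", "Croatian", "Hungarian", "Norwegian", "Polish",
    "Portuguese", "Romanian", "Russian", "Swedish", "Turkish", "Ukrainian",
}


def pick_detector(languages):
    """Return the smaller listed detector whose languages cover all given ones."""
    needed = set(languages)
    if needed <= DETECT_7:
        return "detect_language_7"
    if needed <= DETECT_20:
        return "detect_language_20"
    raise ValueError(f"not covered by either detector: {sorted(needed - DETECT_20)}")


print(pick_detector(["English", "Polish"]))  # detect_language_20
```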
How to use
Online
To use Spark NLP pretrained pipelines, call PretrainedPipeline with the pipeline's name and its language (the default is en):
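A minimal Python sketch of that call, assuming the spark-nlp Python package and a compatible PySpark and Java setup are installed. PretrainedPipeline downloads and caches the pipeline on first use, and annotate returns a dictionary from output column name to results. The wrapper function and its name are hypothetical; the imports sit inside it only so the snippet can be imported without Spark present.

```python
def annotate_online(text, name="explain_document_dl", lang="en"):
    """Download (or reuse a cached copy of) a pretrained pipeline and annotate one string."""
    import sparknlp
    from sparknlp.pretrained import PretrainedPipeline

    sparknlp.start()  # starts a SparkSession configured for Spark NLP
    pipeline = PretrainedPipeline(name, lang=lang)  # name and language from the tables
    return pipeline.annotate(text)  # dict: output column name -> list of results
```

For example, annotate_online("Paris est la capitale de la France.", name="entity_recognizer_md", lang="fr") would run the French medium entity recognizer; omitting lang falls back to English.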
Same in Scala
Offline
If you have trouble using online pipelines or models in your environment (for example, because it is air-gapped), you can download them directly for offline use.
After downloading and extracting the offline models/pipelines, you can use them inside your code as follows (the path can point to shared storage such as HDFS in a cluster):
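A minimal sketch of loading an extracted pipeline, assuming a Spark environment where the Spark NLP jar is already on the classpath (for example supplied through spark.jars, because resolving it from Maven is not possible when air-gapped) and the spark-nlp Python package is installed. PipelineModel.load reads the extracted folder and LightPipeline wraps it for annotating plain strings. The function name is hypothetical.

```python
def load_offline(path):
    """Load an extracted pretrained pipeline from a local or shared (e.g. HDFS) path."""
    from pyspark.ml import PipelineModel
    from pyspark.sql import SparkSession
    from sparknlp.base import LightPipeline  # Spark NLP's Python package must be installed

    # Assumes the Spark NLP jar is already available to this session; nothing is downloaded.
    SparkSession.builder.getOrCreate()
    model = PipelineModel.load(path)  # the folder extracted from the downloaded archive
    return LightPipeline(model)
```

For example, load_offline("hdfs:///models/explain_document_dl_en").annotate("Some text.") would annotate a string, with that made-up path replaced by the location of your own extracted folder.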