FastText Word Embeddings in German

Description

Word embeddings lookup annotator that maps each token to a 300-dimensional vector taken from the FastText Common Crawl embeddings for German.
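To illustrate what a lookup annotator does, here is a minimal pure-Python sketch. It is not the Spark NLP implementation. It assumes a tiny in-memory table with made-up values, lowercases keys to mimic the case-insensitive lookup listed under Model Information, and returns a zero vector for out-of-vocabulary tokens. The zero-vector fallback and the helper names `build_table` and `lookup` are hypothetical choices made for this illustration.

```python
# Minimal sketch of a word-embeddings lookup; NOT the Spark NLP implementation.
# Assumptions (illustration only): a tiny in-memory table, lowercased keys to
# mimic case-insensitive lookup, and a zero vector for out-of-vocabulary tokens.
from typing import Dict, List

DIM = 300  # vector size listed for w2v_cc_300d


def build_table(raw: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Hypothetical helper: lowercase keys and check every vector has DIM entries."""
    table: Dict[str, List[float]] = {}
    for word, vec in raw.items():
        if len(vec) != DIM:
            raise ValueError(f"{word!r} has {len(vec)} dims, expected {DIM}")
        table[word.lower()] = vec
    return table


def lookup(tokens: List[str], table: Dict[str, List[float]]) -> List[List[float]]:
    """Hypothetical helper: map each token to its vector; unknown tokens get zeros."""
    zero = [0.0] * DIM
    return [table.get(tok.lower(), zero) for tok in tokens]


if __name__ == "__main__":
    # Toy vectors (made-up values) standing in for real FastText embeddings.
    table = build_table({
        "Haus": [0.1] * DIM,
        "baum": [0.2] * DIM,
    })
    vectors = lookup(["haus", "Baum", "Katze"], table)
    print(len(vectors), len(vectors[0]))  # 3 300
    print(vectors[0][0], vectors[1][0], vectors[2][0])  # 0.1 0.2 0.0
```

Because keys are lowercased on both sides, "Haus" and "haus" resolve to the same vector, and every token, known or not, yields exactly one vector of length 300.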

How to use

from sparknlp.annotator import WordEmbeddingsModel

model = WordEmbeddingsModel.pretrained("w2v_cc_300d", "de") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("word_embeddings")

import com.johnsnowlabs.nlp.embeddings.WordEmbeddingsModel

val model = WordEmbeddingsModel.pretrained("w2v_cc_300d", "de")
  .setInputCols(Array("document", "token"))
  .setOutputCol("word_embeddings")

import nlu
nlu.load("de.embed.w2v").predict("""Put your text here.""")

Results

Word2Vec feature vectors based on `w2v_cc_300d`.

Model Information

Model Name:	w2v_cc_300d
Type:	embeddings
Compatibility:	Spark NLP 2.5.5+
License:	Open Source
Edition:	Official
Input Labels:	[document, token]
Output Labels:	[embeddings]
Language:	de
Size:	1.2 GB
Case sensitive:	false
Dimension:	300

References

FastText Common Crawl word embeddings for German: https://fasttext.cc/docs/en/crawl-vectors.html
