GloVe Embeddings 840B 300 (Multilingual)


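GloVe (Global Vectors) is a model for distributed word representations. It maps words into a vector space where the distance between two word vectors reflects their semantic similarity. In its original evaluation, GloVe outperformed many common Word2vec models on the word analogy task. One benefit of GloVe is that it models relationships between words directly, by fitting the vectors to global word-word co-occurrence statistics from the corpus, instead of obtaining them as a side effect of training a language model. The "840B 300" in the model name refers to vectors trained on about 840 billion tokens of Common Crawl text, with 300 dimensions per word.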
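To make "directly modeling relationships" concrete: in the original GloVe paper, word vectors w_i, context vectors w̃_j and biases b_i, b̃_j are fit so that w_i·w̃_j + b_i + b̃_j approximates log X_ij, where X_ij counts how often word j occurs in the context of word i. Each nonzero pair contributes a squared error weighted by f(X_ij), which equals (X_ij / x_max)^α below x_max and 1 above it. The plain-Python sketch below computes that objective. The defaults x_max = 100 and α = 0.75 are the values reported in the paper; the function names and toy inputs are hypothetical illustrations, not Spark NLP APIs.

```python
import math

def glove_weight(x, x_max=100.0, alpha=0.75):
    """f(X_ij): grows as (x / x_max) ** alpha for rare pairs, capped at 1.0 for frequent ones."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_loss(cooc, W, W_ctx, b, b_ctx):
    """Weighted least-squares GloVe objective (hypothetical helper).

    cooc     -- dict {(i, j): X_ij} holding only the nonzero co-occurrence counts
    W        -- word vectors, W[i] is a list of floats
    W_ctx    -- context vectors, W_ctx[j] is a list of floats
    b, b_ctx -- word and context biases
    """
    total = 0.0
    for (i, j), x_ij in cooc.items():
        dot = sum(u * v for u, v in zip(W[i], W_ctx[j]))
        # How far the model's score is from the log co-occurrence count.
        residual = dot + b[i] + b_ctx[j] - math.log(x_ij)
        total += glove_weight(x_ij) * residual ** 2
    return total

# Hypothetical toy example: a single pair whose log count is fit exactly,
# so the loss is (numerically) zero.
exact = glove_loss({(0, 0): math.e}, W=[[1.0, 0.0]], W_ctx=[[1.0, 0.0]], b=[0.0], b_ctx=[0.0])
```

The weighting keeps very frequent pairs (such as those involving stop words) from dominating the fit, while still letting rare pairs contribute a little.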


How to use

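Python

from pyspark.ml import Pipeline
from sparknlp.annotator import WordEmbeddingsModel

# Assumes `spark`, document_assembler, sentence_detector and tokenizer are already defined.
embeddings = WordEmbeddingsModel.pretrained("glove_840B_300", "xx") \
      .setInputCols("sentence", "token") \
      .setOutputCol("embeddings")
nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings])
pipeline_model = nlp_pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))
result = pipeline_model.transform(spark.createDataFrame([["I love Spark NLP"]]).toDF("text"))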
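
Scala

import com.johnsnowlabs.nlp.embeddings.WordEmbeddingsModel
import org.apache.spark.ml.Pipeline

// Assumes document_assembler, sentence_detector, tokenizer and spark.implicits._ are in scope.
val embeddings = WordEmbeddingsModel.pretrained("glove_840B_300", "xx")
      .setInputCols(Array("sentence", "token")).setOutputCol("embeddings")
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings))
val data = Seq("I love Spark NLP").toDF("text")
val result = pipeline.fit(data).transform(data)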

NLU

import nlu

text = ["I love Spark NLP"]
# predict() returns a pandas DataFrame of tokens and their embeddings
glove_df = nlu.load('xx.embed.glove.840B_300').predict(text)


Example output (vectors truncated):

| token | glove_embeddings                                     |
| I     | [0.1941000074148178, 0.22603000700473785, -0.4...]   |
| love  | [0.13948999345302582, 0.534529983997345, -0.25...]   |
| Spark | [0.20353999733924866, 0.6292600035667419, 0.27...]   |
| NLP   | [0.059436000883579254, 0.18411000072956085, -0...]   |
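Per-token vectors like those in the table above are usually compared with cosine similarity, and the word analogy task mentioned in the description is answered with vector arithmetic (for example, king - man + woman should land near queen). A minimal plain-Python sketch of both follows, assuming the token vectors have already been collected into a dict of float lists. `cosine_similarity` and `analogy` are hypothetical helper names, and the 3-dimensional vectors in the usage are made-up toy values, not real GloVe embeddings.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' by finding the word closest to b - a + c.

    The three query words are excluded from the candidates, as is usual
    when evaluating word analogies.
    """
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(vectors[w], target))

# Hypothetical 3-dimensional toy vectors, for illustration only;
# real glove_840B_300 vectors have 300 dimensions.
toy = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.1],
}
answer = analogy("man", "king", "woman", toy)
```

Cosine similarity ignores vector length and compares only direction; on real data, `toy` would be replaced by vectors taken from the pipeline or NLU output.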

Model Information

Model Name: glove_840B_300
Type: embeddings
Compatibility: Spark NLP 2.4.0+
License: Open Source
Edition: Official
Input Labels: [sentence, token]
Output Labels: [word_embeddings]
Language: [xx]
Dimension: 300
Case sensitive: true
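The Dimension and Case sensitive fields matter when working with raw GloVe vectors outside Spark NLP. The original GloVe release distributes vectors as plain text, one token per line followed by its space-separated components. The sketch below is a minimal loader under that assumption; `load_glove_text` is a hypothetical helper, not part of Spark NLP or NLU. It keeps tokens exactly as written, since the embeddings are case sensitive, and reads the last `dim` fields as the vector so that a token containing spaces is not split apart.

```python
import io

def load_glove_text(lines, dim=300):
    """Parse GloVe-style text lines ("token v1 v2 ... v_dim") into {token: [floats]}.

    Tokens are stored exactly as written (no lowercasing) because the
    embeddings are case sensitive: "Spark" and "spark" are separate keys.
    """
    vectors = {}
    for line_no, line in enumerate(lines, start=1):
        parts = line.rstrip().split(" ")
        if parts == [""]:
            continue  # skip blank lines
        if len(parts) < dim + 1:
            raise ValueError(f"line {line_no}: expected a token followed by {dim} values")
        # Read the last `dim` fields as the vector and join the rest back
        # into the token, so a token that contains spaces is kept whole.
        token = " ".join(parts[:-dim])
        vectors[token] = [float(x) for x in parts[-dim:]]
    return vectors

# Hypothetical 3-dimensional sample; the real vectors use dim=300.
sample = io.StringIO("Spark 0.1 0.2 0.3\nspark 0.4 0.5 0.6\n")
toy = load_glove_text(sample, dim=3)
```

Because no lowercasing is applied, a lookup for "nlp" will not find a vector stored under "NLP"; any case normalization is a separate, deliberate choice.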

Data Source

The model is imported from