# Universal Sentence Encoder Large

## Description

The Universal Sentence Encoder encodes text into high-dimensional vectors that can be used for text classification, semantic similarity, clustering and other natural language tasks.
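
As a rough illustration of the classification use case, the sketch below assigns a label by comparing a query embedding against the mean (centroid) embedding of each labeled class with cosine similarity. The helper names (`cosine`, `centroid`, `nearest_centroid`) and the toy 3-dimensional vectors are hypothetical stand-ins for the model's 512-dimensional output. Nearest-centroid is only one simple downstream classifier, chosen for brevity; the model does not prescribe it.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def nearest_centroid(query, labeled):
    """Return the label whose centroid is most cosine-similar to `query`.

    `labeled` maps each label to a list of example embeddings.
    """
    centroids = {label: centroid(vecs) for label, vecs in labeled.items()}
    return max(centroids, key=lambda label: cosine(query, centroids[label]))

# Hypothetical toy 3-dimensional vectors standing in for 512-dimensional
# sentence embeddings produced by the encoder.
training = {
    "sports": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "finance": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
print(nearest_centroid([0.85, 0.15, 0.05], training))  # sports
```

With real data, each list would hold the embeddings of a few labeled example sentences, and the query would be the embedding of the text to classify.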

The model is trained and optimized for greater-than-word-length text, such as sentences, phrases, or short paragraphs. It is trained on a variety of data sources and tasks, with the aim of accommodating a wide range of natural language understanding tasks with a single encoder. The input is variable-length English text, and the output is a 512-dimensional vector. The model has been applied to the STS benchmark for semantic similarity; the results are shown in the accompanying example notebook. This Large variant is trained with a Transformer encoder, whereas the standard universal-sentence-encoder model uses a deep averaging network (DAN) encoder.
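
For STS-style scoring, the "Universal Sentence Encoder" paper describes, as we understand it, turning the cosine similarity of two sentence embeddings into an angular similarity via arccos rather than using the raw cosine. The sketch below implements that conversion, `1 - arccos(cos_sim) / pi`, which maps cosine values in [-1, 1] onto [0, 1]. The function names and the 2-dimensional toy vectors are hypothetical illustrations; real inputs would be pairs of 512-dimensional embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def angular_similarity(u, v):
    """Angular similarity: 1 - arccos(cosine(u, v)) / pi, in [0, 1]."""
    cos_sim = cosine(u, v)
    # Clamp to guard against floating-point drift just outside [-1, 1],
    # which would make math.acos raise a ValueError.
    cos_sim = max(-1.0, min(1.0, cos_sim))
    return 1.0 - math.acos(cos_sim) / math.pi

# Hypothetical 2-dimensional vectors standing in for sentence embeddings.
print(angular_similarity([1.0, 0.0], [1.0, 0.0]))   # 1.0 (same direction)
print(angular_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.5 (orthogonal)
print(angular_similarity([1.0, 0.0], [-1.0, 0.0]))  # 0.0 (opposite)
```

Compared with raw cosine, the arccos transform spreads out scores for highly similar pairs, where cosine values tend to bunch up near 1.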

The details are described in the paper “Universal Sentence Encoder”.

## How to use

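```python
from sparknlp.annotator import UniversalSentenceEncoder

# Load the pretrained Large USE model; expects an upstream "document" column
embeddings = UniversalSentenceEncoder.pretrained("tfhub_use_lg", "en") \
    .setInputCols("document") \
    .setOutputCol("sentence_embeddings")
```

```scala
import com.johnsnowlabs.nlp.embeddings.UniversalSentenceEncoder

val embeddings = UniversalSentenceEncoder.pretrained("tfhub_use_lg", "en")
  .setInputCols("document")
  .setOutputCol("sentence_embeddings")
```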

## Model Information

| Field | Value |
|---|---|
| Model Name | tfhub_use_lg |
| Type | embeddings |
| Compatibility | Spark NLP 2.4.0 |
| License | Open Source |
| Edition | Official |
| Input Labels | [sentence] |
| Output Labels | [sentence_embeddings] |
| Language | [en] |
| Dimension | 512 |
| Case sensitive | true |

## Data Source

The model is imported from https://tfhub.dev/google/universal-sentence-encoder-large/3