Finance E5 Embedding Base

Description

This model is a financial version of the E5 base model fine-tuned on in-house curated financial datasets. Reference: Wang, Liang, et al. “Text embeddings by weakly-supervised contrastive pre-training.” arXiv preprint arXiv:2212.03533 (2022).
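The cited paper trains E5 with a contrastive objective over in-batch negatives (InfoNCE). The plain-Python sketch below illustrates the general shape of that objective. It is not the fine-tuning code or data used for finembedding_e5_base, which this card does not describe. The helper names are hypothetical, and the default temperature of 0.01 is an assumption based on our reading of the E5 paper.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def info_nce_loss(queries, passages, temperature=0.01):
    """Mean InfoNCE loss with in-batch negatives (hypothetical helper).

    passages[i] is the positive for queries[i]; every other passage in the
    batch acts as a negative for that query. The temperature default is an assumption.
    """
    qs = [normalize(q) for q in queries]
    ps = [normalize(p) for p in passages]
    total = 0.0
    for i, q in enumerate(qs):
        # Temperature-scaled cosine similarities against every passage in the batch.
        logits = [dot(q, p) / temperature for p in ps]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-softmax probability assigned to the positive passage.
        total += log_sum_exp - logits[i]
    return total / len(qs)


# Toy 2-dimensional "embeddings" for illustration only.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]     # each passage matches its query
mismatched = [[0.1, 0.9], [0.9, 0.1]]  # positives swapped
print(info_nce_loss(queries, aligned))
print(info_nce_loss(queries, mismatched))
```

Matched pairs give a loss near zero while swapped pairs give a large one; minimizing this loss is what pulls related texts together and pushes unrelated ones apart in the embedding space.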

How to use

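from johnsnowlabs import nlp

# Start a Spark session; loading this licensed model requires a valid
# John Snow Labs Finance NLP license in the environment.
spark = nlp.start()

# Wrap raw text into Spark NLP "document" annotations.
document_assembler = (
    nlp.DocumentAssembler().setInputCol("text").setOutputCol("document")
)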

# Load the pretrained finance E5 model from the finance/models repository.
E5_embedding = (
    nlp.E5Embeddings.pretrained(
        "finembedding_e5_base", "en", "finance/models"
    )
    .setInputCols(["document"])
    .setOutputCol("E5")
)
pipeline = nlp.Pipeline(stages=[document_assembler, E5_embedding])

data = spark.createDataFrame(
    [["What is the best way to invest in the stock market?"]]
).toDF("text")

result = pipeline.fit(data).transform(data)
# The vectors are stored in the "embeddings" field of each E5 annotation.
result.select("E5.embeddings").show(truncate=100)

Results

+----------------------------------------------------------------------------------------------------+
|                                                                                          embeddings|
+----------------------------------------------------------------------------------------------------+
|[0.45521045, -0.16874692, -0.06179046, -0.37956607, 1.152633, 0.6849592, -0.9676384, 0.4624033, ...|
+----------------------------------------------------------------------------------------------------+
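Embeddings like the one above are typically compared with cosine similarity, for example to rank passages against a question. The sketch below shows that ranking step in plain Python. The vectors are hypothetical 4-dimensional stand-ins for the much longer vectors the pipeline actually produces, and the helper names are illustrative, not part of the Finance NLP API.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_passages(query_vec, passage_vecs):
    """Return passage indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, p) for p in passage_vecs]
    return sorted(range(len(passage_vecs)), key=lambda i: scores[i], reverse=True)


# Hypothetical vectors; in practice these would be collected from the
# embeddings field of the E5 output annotations.
query = [0.46, -0.17, -0.06, -0.38]
passages = [
    [-0.50, 0.20, 0.10, 0.40],    # points roughly away from the query
    [0.45, -0.15, -0.05, -0.40],  # points roughly the same way as the query
]
print(rank_passages(query, passages))  # [1, 0]
```

The original E5 models were trained with "query: " and "passage: " prefixes on their inputs. Whether finembedding_e5_base expects the same prefixes is not stated on this card, so treat that as an assumption to verify before building retrieval on top of it.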

Model Information

Model Name: finembedding_e5_base
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [document]
Output Labels: [E5]
Language: en
Size: 398.5 MB

References

In-house curated financial datasets.