Description
This model is a finance-domain version of the E5 base text embedding model, fine-tuned on in-house curated financial datasets. Reference: Wang, Liang, et al. “Text embeddings by weakly-supervised contrastive pre-training.” arXiv preprint arXiv:2212.03533 (2022).
How to use
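from johnsnowlabs import nlp

# Start a Spark session; the licensed Finance NLP libraries must already be installed
spark = nlp.start()

# Convert raw text into document annotations
document_assembler = (
    nlp.DocumentAssembler().setInputCol("text").setOutputCol("document")
)

# Load the pretrained financial E5 embedding model
E5_embedding = (
    nlp.E5Embeddings.pretrained(
        "finembedding_e5_base", "en", "finance/models"
    )
    .setInputCols(["document"])
    .setOutputCol("E5")
)

pipeline = nlp.Pipeline(stages=[document_assembler, E5_embedding])

data = spark.createDataFrame(
    [["What is the best way to invest in the stock market?"]]
).toDF("text")

result = pipeline.fit(data).transform(data)

# Show the embedding vector produced for each document
result.select("E5.embeddings").show()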
Results
+----------------------------------------------------------------------------------------------------+
| embeddings|
+----------------------------------------------------------------------------------------------------+
|[0.45521045, -0.16874692, -0.06179046, -0.37956607, 1.152633, 0.6849592, -0.9676384, 0.4624033, ...|
+----------------------------------------------------------------------------------------------------+
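Embeddings like the one above are typically compared with cosine similarity, for example to rank financial passages against a query. The following is a minimal sketch in plain Python, not part of the model's API: `cosine_similarity` is a helper defined here, `query_vec` reuses only the eight values visible in the truncated output above, and `other_vec` is a hypothetical vector invented for illustration. Real embeddings from the pipeline have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# First eight values of the embedding shown in the results (truncated, illustrative only)
query_vec = [0.45521045, -0.16874692, -0.06179046, -0.37956607,
             1.152633, 0.6849592, -0.9676384, 0.4624033]

# Hypothetical second embedding, invented for this example
other_vec = [0.40, -0.10, -0.05, -0.30, 1.00, 0.70, -0.90, 0.50]

score = cosine_similarity(query_vec, other_vec)
print(f"cosine similarity: {score:.4f}")
```

Scores close to 1 indicate vectors pointing in nearly the same direction. With real pipeline output, the vectors would come from the `E5.embeddings` column instead of hand-written lists.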
Model Information
| Model Name:    | finembedding_e5_base |
| Compatibility: | Finance NLP 1.0.0+   |
| License:       | Licensed             |
| Edition:       | Official             |
| Input Labels:  | [document]           |
| Output Labels: | [E5]                 |
| Language:      | en                   |
| Size:          | 398.5 MB             |
References
In-house curated financial datasets.