Description
This model is a finance-domain version of the E5 large text embedding model, fine-tuned on in-house curated financial datasets. Reference: Wang, Liang, et al. “Text embeddings by weakly-supervised contrastive pre-training.” arXiv preprint arXiv:2212.03533 (2022).
How to use
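from johnsnowlabs import nlp

# Start a Spark session; assumes a valid Finance NLP license is configured.
spark = nlp.start()

# Convert raw text into Spark NLP document annotations.
document_assembler = (
    nlp.DocumentAssembler().setInputCol("text").setOutputCol("document")
)

# Load the pretrained financial E5 embeddings model.
E5_embedding = (
    nlp.E5Embeddings.pretrained(
        "finembedding_e5_large", "en", "finance/models"
    )
    .setInputCols(["document"])
    .setOutputCol("E5")
)

pipeline = nlp.Pipeline(stages=[document_assembler, E5_embedding])

data = spark.createDataFrame(
    [["What is the best way to invest in the stock market?"]]
).toDF("text")

result = pipeline.fit(data).transform(data)

# Select the embedding vectors; the "result" field holds only the input text.
result.select("E5.embeddings").show(truncate=100)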
Results
+----------------------------------------------------------------------------------------------------+
| embeddings|
+----------------------------------------------------------------------------------------------------+
|[0.8358813, -1.30341, -0.576791, 0.25893408, 0.26888973, 0.028243342, 0.47971666, 0.47653574, 0.4...|
+----------------------------------------------------------------------------------------------------+
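Embedding vectors like the one above are usually compared with cosine similarity, for example to rank passages against a query. The following is a minimal sketch, assuming the vectors have already been collected from the `E5` output column into plain Python lists. The function name `cosine_similarity` and the toy vectors are hypothetical illustrations, not model output.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for real embeddings.
query_vec = [1.0, 0.0, 1.0]
doc_close = [2.0, 0.0, 2.0]  # same direction as the query
doc_far = [0.0, 1.0, 0.0]    # orthogonal to the query

print(cosine_similarity(query_vec, doc_close))  # approximately 1
print(cosine_similarity(query_vec, doc_far))    # 0.0
```

Scores near 1 indicate vectors pointing in nearly the same direction, while scores near 0 indicate unrelated directions. For large collections, a vectorized library or a vector index would likely be preferable to this pure-Python loop.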
Model Information
| Model Name: | finembedding_e5_large |
| Compatibility: | Finance NLP 1.0.0+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [document] |
| Output Labels: | [E5] |
| Language: | en |
| Size: | 1.2 GB |
References
In-house annotated financial datasets.