Description
This NER model identifies entities that can be associated with a financial sentiment. The model is trained using custom finance embeddings and is designed to be used with the associated Assertion Status model that classifies the entities into a sentiment category.
Predicted Entities
ASSET
, CASHFLOW
, EXPENSE
, FREE_CASH_FLOW
, GAINS
, KPI
, LIABILITY
, LOSSES
, PROFIT
, REVENUE
How to use
documentAssembler = nlp.DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl","xx")\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = nlp.Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")
embeddings = nlp.WordEmbeddingsModel.pretrained("finance_word_embeddings", "en", "finance/models")\
.setInputCols(["sentence","token"])\
.setOutputCol("embeddings")
ner_model =finance.NerModel.pretrained("finner_aspect_based_sentiment_fe", "en", "finance/models")\
.setInputCols(["sentence", "token", "embeddings"])\
.setOutputCol("ner")
ner_converter = nlp.NerConverter()\
.setInputCols(["sentence","token","ner"])\
.setOutputCol("ner_chunk")
nlpPipeline = nlp.Pipeline(stages=[
documentAssembler,
sentenceDetector,
tokenizer,
embeddings,
ner_model,
ner_converter])
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)
text = ["""Equity and earnings of affiliates in Latin America increased to $4.8 million in the quarter from $2.2 million in the prior year as the commodity markets in Latin America remain strong through the end of the quarter."""]
res = model.transform(spark.createDataFrame([text]).toDF("text"))
Results
+--------+------+
|chunk |label |
+--------+------+
|Equity |GAINS |
|earnings|PROFIT|
+--------+------+
Model Information
Model Name: | finner_aspect_based_sentiment_fe |
Compatibility: | Finance NLP 1.0.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [sentence, token, embeddings] |
Output Labels: | [ner] |
Language: | en |
Size: | 14.6 MB |
Benchmarking
label precision recall f1-score support
ASSET 0.72 0.63 0.67 132
CASHFLOW 0.81 0.73 0.77 64
EXPENSE 0.76 0.85 0.81 315
FREE_CASH_FLOW 0.93 0.93 0.93 43
GAINS 0.78 0.81 0.80 161
KPI 0.73 0.68 0.70 253
LIABILITY 0.73 0.67 0.70 93
LOSSES 0.79 0.80 0.80 56
PROFIT 0.80 0.91 0.85 223
REVENUE 0.81 0.80 0.80 492
micro-avg 0.78 0.79 0.78 1832
macro-avg 0.79 0.78 0.78 1832
weighted-avg 0.78 0.79 0.78 1832
PREVIOUSFinance Word Embeddings