Description
This financial model is an xlg (Xlarge) version, which has been trained with more general labels than other versions such (md
, lg
, …) that are available in the Models Hub. The training corpus used for this model is a combination of Broker Reports, Earning Calls, and 10K filings,was trained using custom finance word embeddings.
Predicted Entities
AMOUNT
, ASSET
, CF
, CF_DECREASE
, CF_INCREASE
, COUNT
, CURRENCY
, DATE
, EXPENSE
, EXPENSE_DECREASE
, EXPENSE_INCREASE
, FCF
, FISCAL_YEAR
, KPI
, KPI_DECREASE
, KPI_INCREASE
, LIABILITY
, LIABILITY_DECREASE
, LIABILITY_INCREASE
, ORG
, PERCENTAGE
, PROFIT
, PROFIT_DECLINE
, PROFIT_INCREASE
, TICKER
How to use
documentAssembler = nlp.DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl","xx")\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = nlp.Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")
embeddings = nlp.WordEmbeddingsModel.pretrained("finance_word_embeddings", "en", "finance/models")\
.setInputCols(["sentence","token"])\
.setOutputCol("embeddings")
ner_model =finance.NerModel.pretrained("finner_financial_xlarge_fe", "en", "finance/models")\
.setInputCols(["sentence", "token", "embeddings"])\
.setOutputCol("ner")
ner_converter = nlp.NerConverter()\
.setInputCols(["sentence","token","ner"])\
.setOutputCol("ner_chunk")
nlpPipeline = nlp.Pipeline(stages=[
documentAssembler,
sentenceDetector,
tokenizer,
embeddings,
ner_model,
ner_converter])
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)
text = ['''We expect Revenue / PAT CAGR of ~ 19 %/~ 22 % over FY2022-FY2024E EPS . Hence , we retain our Buy recommendation on VGIL with an unchanged price target ( PT ) of . This includes $ 1 billion in cash and cash equivalents , $ 2 billion in property and equipment , and $ 2 billion in intangible assets .''']
res = model.transform(spark.createDataFrame([text]).toDF("text"))
Results
+-------------------------+---------------+
|chunk |label |
+-------------------------+---------------+
|PAT CAGR |EXPENSE |
|19 |PERCENTAGE |
|22 |PERCENTAGE |
|EPS |PROFIT_INCREASE|
|$ |CURRENCY |
|1 billion |AMOUNT |
|cash and cash equivalents|CF |
|$ |CURRENCY |
|2 billion |AMOUNT |
|$ |CURRENCY |
|2 billion |AMOUNT |
+-------------------------+---------------+
Model Information
Model Name: | finner_financial_xlarge_fe |
Compatibility: | Finance NLP 1.0.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [sentence, token, embeddings] |
Output Labels: | [ner] |
Language: | en |
Size: | 14.8 MB |
References
In-house dataset
Benchmarking
precision recall f1-score support
AMOUNT 0.87 0.93 0.90 3206
ASSET 0.00 0.00 0.00 24
CF 0.67 0.56 0.61 476
CF_DECREASE 0.64 0.30 0.41 23
CF_INCREASE 0.61 0.83 0.71 59
COUNT 0.33 0.36 0.35 11
CURRENCY 0.89 0.98 0.93 2130
DATE 0.90 0.93 0.91 1196
EXPENSE 0.59 0.59 0.59 367
EXPENSE_DECREASE 0.59 0.63 0.61 73
EXPENSE_INCREASE 0.83 0.80 0.82 135
FCF 0.68 0.94 0.79 16
FISCAL_YEAR 0.88 0.90 0.89 435
KPI 0.33 0.08 0.12 13
KPI_DECREASE 0.33 0.25 0.29 4
KPI_INCREASE 0.00 0.00 0.00 8
LIABILITY 0.50 0.42 0.46 227
LIABILITY_DECREASE 1.00 0.20 0.33 5
LIABILITY_INCREASE 1.00 1.00 1.00 1
ORG 0.94 0.89 0.91 18
PERCENTAGE 0.99 0.96 0.97 774
PROFIT 0.70 0.62 0.66 377
PROFIT_DECLINE 0.54 0.41 0.47 63
PROFIT_INCREASE 0.70 0.57 0.62 201
TICKER 1.00 0.94 0.97 17
micro-avg 0.85 0.87 0.86 9859
macro-avg 0.66 0.60 0.61 9859
weighted-avg 0.84 0.87 0.85 9859