Financial NER (xlg, XLarge)

Description

This financial NER model is the xlg (XLarge) version. It was trained with more general labels than the other versions (md, lg, …) available in the Models Hub. The training corpus combines Broker Reports, Earnings Calls, and 10-K filings, and the model was trained using custom finance word embeddings.

Predicted Entities

AMOUNT, ASSET, CF, CF_DECREASE, CF_INCREASE, COUNT, CURRENCY, DATE, EXPENSE, EXPENSE_DECREASE, EXPENSE_INCREASE, FCF, FISCAL_YEAR, KPI, KPI_DECREASE, KPI_INCREASE, LIABILITY, LIABILITY_DECREASE, LIABILITY_INCREASE, ORG, PERCENTAGE, PROFIT, PROFIT_DECLINE, PROFIT_INCREASE, TICKER
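
Several of the labels above appear to come in families: a base concept (for example CF or EXPENSE) plus a directional variant. The following is a minimal sketch, not part of the model or of Finance NLP, of how a label could be split into its base concept and direction in post-processing. The function name `split_label` and the reading of `_DECLINE` as equivalent to `_DECREASE` are assumptions based only on the label names.

```python
from typing import Optional, Tuple

# Directional suffixes seen in the Predicted Entities list.
# Assumption: PROFIT_DECLINE plays the same role as the *_DECREASE labels.
DIRECTION_SUFFIXES = {
    "_INCREASE": "increase",
    "_DECREASE": "decrease",
    "_DECLINE": "decrease",
}


def split_label(label: str) -> Tuple[str, Optional[str]]:
    """Return (base concept, direction), with direction None for plain labels."""
    for suffix, direction in DIRECTION_SUFFIXES.items():
        if label.endswith(suffix):
            return label[: -len(suffix)], direction
    return label, None


for label in ["CF_INCREASE", "PROFIT_DECLINE", "FISCAL_YEAR"]:
    print(label, "->", split_label(label))
```

Labels without a directional suffix, such as FISCAL_YEAR, pass through unchanged, so the helper can be applied to every predicted label.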

How to use

from johnsnowlabs import nlp, finance

# Start a Spark session (assumes the johnsnowlabs library is installed
# and valid Finance NLP license credentials are configured)
spark = nlp.start()

# Convert raw text into document annotations
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into sentences with the multilingual ("xx") model
sentenceDetector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

# Split sentences into tokens
tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Custom finance word embeddings used to train this model
embeddings = nlp.WordEmbeddingsModel.pretrained("finance_word_embeddings", "en", "finance/models")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

# The NER model described on this page
ner_model = finance.NerModel.pretrained("finner_financial_xlarge_fe", "en", "finance/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge token-level NER tags into entity chunks
ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = nlp.Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter])

# All stages are pretrained, so fitting on an empty DataFrame only assembles the pipeline
empty_data = spark.createDataFrame([[""]]).toDF("text")

model = nlpPipeline.fit(empty_data)

text = ['''We expect Revenue / PAT CAGR of ~ 19 %/~ 22 % over FY2022-FY2024E EPS . Hence , we retain our Buy recommendation on VGIL with an unchanged price target ( PT ) of . This includes $ 1 billion in cash and cash equivalents , $ 2 billion in property and equipment , and $ 2 billion in intangible assets .''']

# One row, one "text" column
res = model.transform(spark.createDataFrame([text]).toDF("text"))

Results

+-------------------------+---------------+
|chunk                    |label          |
+-------------------------+---------------+
|PAT CAGR                 |EXPENSE        |
|19                       |PERCENTAGE     |
|22                       |PERCENTAGE     |
|EPS                      |PROFIT_INCREASE|
|$                        |CURRENCY       |
|1 billion                |AMOUNT         |
|cash and cash equivalents|CF             |
|$                        |CURRENCY       |
|2 billion                |AMOUNT         |
|$                        |CURRENCY       |
|2 billion                |AMOUNT         |
+-------------------------+---------------+
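
In the output above, each CURRENCY chunk is followed by an AMOUNT chunk. As a hedged illustration of downstream use, the sketch below joins such neighbouring pairs into a single monetary value. The helper `pair_currency_amounts` is hypothetical and not part of Finance NLP. It assumes the chunks have already been collected in document order as (chunk, label) tuples, and it treats neighbours in that list as neighbours in the text, which may not hold for every input.

```python
from typing import List, Tuple

Chunk = Tuple[str, str]  # (chunk text, label)


def pair_currency_amounts(chunks: List[Chunk]) -> List[str]:
    """Join each CURRENCY chunk with the AMOUNT chunk that directly follows it."""
    paired = []
    for (text, label), (next_text, next_label) in zip(chunks, chunks[1:]):
        if label == "CURRENCY" and next_label == "AMOUNT":
            paired.append(f"{text}{next_text}")
    return paired


# Rows copied from the Results table above, in the same order.
chunks = [
    ("PAT CAGR", "EXPENSE"),
    ("19", "PERCENTAGE"),
    ("22", "PERCENTAGE"),
    ("EPS", "PROFIT_INCREASE"),
    ("$", "CURRENCY"),
    ("1 billion", "AMOUNT"),
    ("cash and cash equivalents", "CF"),
    ("$", "CURRENCY"),
    ("2 billion", "AMOUNT"),
    ("$", "CURRENCY"),
    ("2 billion", "AMOUNT"),
]

print(pair_currency_amounts(chunks))
# ['$1 billion', '$2 billion', '$2 billion']
```

A more robust version would compare the begin and end character offsets of the chunks instead of relying on list order.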

Model Information

Model Name: finner_financial_xlarge_fe
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 14.8 MB

References

In-house dataset

Benchmarking

label               precision    recall  f1-score   support
AMOUNT                   0.87      0.93      0.90      3206
ASSET                    0.00      0.00      0.00        24
CF                       0.67      0.56      0.61       476
CF_DECREASE              0.64      0.30      0.41        23
CF_INCREASE              0.61      0.83      0.71        59
COUNT                    0.33      0.36      0.35        11
CURRENCY                 0.89      0.98      0.93      2130
DATE                     0.90      0.93      0.91      1196
EXPENSE                  0.59      0.59      0.59       367
EXPENSE_DECREASE         0.59      0.63      0.61        73
EXPENSE_INCREASE         0.83      0.80      0.82       135
FCF                      0.68      0.94      0.79        16
FISCAL_YEAR              0.88      0.90      0.89       435
KPI                      0.33      0.08      0.12        13
KPI_DECREASE             0.33      0.25      0.29         4
KPI_INCREASE             0.00      0.00      0.00         8
LIABILITY                0.50      0.42      0.46       227
LIABILITY_DECREASE       1.00      0.20      0.33         5
LIABILITY_INCREASE       1.00      1.00      1.00         1
ORG                      0.94      0.89      0.91        18
PERCENTAGE               0.99      0.96      0.97       774
PROFIT                   0.70      0.62      0.66       377
PROFIT_DECLINE           0.54      0.41      0.47        63
PROFIT_INCREASE          0.70      0.57      0.62       201
TICKER                   1.00      0.94      0.97        17
micro-avg                0.85      0.87      0.86      9859
macro-avg                0.66      0.60      0.61      9859
weighted-avg             0.84      0.87      0.85      9859
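
The summary rows can be checked against the per-label rows. The sketch below assumes the usual definitions: F1 is the harmonic mean of precision and recall, the macro average is the unweighted mean over labels, and the weighted average weights each label by its support. The F1 and support values are copied from the table above. Because they are already rounded to two decimals, the recomputed averages are only expected to agree with the table after rounding.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (label, f1-score, support) copied from the benchmarking table.
ROWS = [
    ("AMOUNT", 0.90, 3206), ("ASSET", 0.00, 24), ("CF", 0.61, 476),
    ("CF_DECREASE", 0.41, 23), ("CF_INCREASE", 0.71, 59), ("COUNT", 0.35, 11),
    ("CURRENCY", 0.93, 2130), ("DATE", 0.91, 1196), ("EXPENSE", 0.59, 367),
    ("EXPENSE_DECREASE", 0.61, 73), ("EXPENSE_INCREASE", 0.82, 135),
    ("FCF", 0.79, 16), ("FISCAL_YEAR", 0.89, 435), ("KPI", 0.12, 13),
    ("KPI_DECREASE", 0.29, 4), ("KPI_INCREASE", 0.00, 8),
    ("LIABILITY", 0.46, 227), ("LIABILITY_DECREASE", 0.33, 5),
    ("LIABILITY_INCREASE", 1.00, 1), ("ORG", 0.91, 18),
    ("PERCENTAGE", 0.97, 774), ("PROFIT", 0.66, 377),
    ("PROFIT_DECLINE", 0.47, 63), ("PROFIT_INCREASE", 0.62, 201),
    ("TICKER", 0.97, 17),
]

total_support = sum(support for _, _, support in ROWS)

# Macro average: every label counts equally, regardless of support.
macro_f1 = sum(f1 for _, f1, _ in ROWS) / len(ROWS)

# Weighted average: each label's F1 is weighted by its support.
weighted_f1 = sum(f1 * support for _, f1, support in ROWS) / total_support

print(round(f1_score(0.87, 0.93), 2))  # AMOUNT row
print(round(macro_f1, 2), round(weighted_f1, 2), total_support)
```

The gap between the macro and weighted F1 reflects the class imbalance: low-support labels such as ASSET, KPI, and KPI_INCREASE score poorly and pull the macro average down, while AMOUNT, CURRENCY, and DATE dominate the weighted one.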