Earning Calls Financial NER (Specific, sm)

Description

This is the sm (small) version of a financial NER model trained on earnings call transcripts to detect financial entities. It is called Specific because it predicts more labels than the Generic version.

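Note that this model needs a specific tokenizer configuration to extract the currency: the currency symbols $ and € are added to the tokenizer's context characters so that they are split off as separate tokens (see the Python snippet below).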
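
To illustrate why the context characters matter, here is a rough pure-Python sketch of the splitting behaviour. It is a simplified stand-in and not Spark NLP's actual tokenizer: split_context_chars is a hypothetical helper, and the only rule it applies (peel context characters off both ends of each whitespace-separated piece) is an assumption made for illustration.

```python
# Simplified illustration only: this is NOT Spark NLP's tokenizer.
# Same characters as passed to setContextChars in the pipeline.
CONTEXT_CHARS = set(".,;:!?*-()”’$€")


def split_context_chars(text):
    """Split on whitespace, then peel context characters off both ends of each piece."""
    tokens = []
    for piece in text.split():
        prefix, suffix = [], []
        # Leading context characters become their own tokens.
        while piece and piece[0] in CONTEXT_CHARS:
            prefix.append(piece[0])
            piece = piece[1:]
        # Trailing context characters become their own tokens, in original order.
        while piece and piece[-1] in CONTEXT_CHARS:
            suffix.insert(0, piece[-1])
            piece = piece[:-1]
        tokens.extend(prefix)
        if piece:
            tokens.append(piece)
        tokens.extend(suffix)
    return tokens


print(split_context_chars("EPS was $1.21, up from €1.05."))
```

With the currency symbols in the list, "$1.21," comes out as the three tokens "$", "1.21" and ",", so the symbol can receive its own CURRENCY tag while the number is tagged AMOUNT.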

The currently available entities are:

  • AMOUNT: Numeric amounts, not percentages
  • ASSET: Current or Fixed Asset
  • ASSET_DECREASE: Decrease in the asset possession/exposure
  • ASSET_INCREASE: Increase in the asset possession/exposure
  • CF: Total cash flow
  • CFO: Cash flow from operating activity
  • CFO_INCREASE: Cash flow from operating activity increased
  • COUNT: Number of items (not monetary, not percentages)
  • CURRENCY: The currency of the amount
  • DATE: Generic dates that either are not a fiscal year or cannot be identified as one from the context
  • EXPENSE: An expense or loss
  • EXPENSE_DECREASE: A piece of information saying there was an expense decrease in that fiscal year
  • EXPENSE_INCREASE: A piece of information saying there was an expense increase in that fiscal year
  • FCF: Free Cash Flow
  • FISCAL_YEAR: A date that expresses the month in which the fiscal year closed for a specific year
  • INCOME: Any income that is reported
  • INCOME_INCREASE: Relative increase in income
  • KPI: Key Performance Indicator, a quantifiable measure of performance over time for a specific objective
  • KPI_DECREASE: Relative decrease in a KPI
  • KPI_INCREASE: Relative increase in a KPI
  • LIABILITY: Current or Long-Term Liability (not from stockholders)
  • LIABILITY_DECREASE: Relative decrease in liability
  • LIABILITY_INCREASE: Relative increase in liability
  • LOSS: Type of loss (e.g. gross, net)
  • ORG: Mention of a company/organization name
  • PERCENTAGE: Numeric amounts which are percentages
  • PROFIT: Profit or also Revenue
  • PROFIT_DECLINE: A piece of information saying there was a profit / revenue decrease in that fiscal year
  • PROFIT_INCREASE: A piece of information saying there was a profit / revenue increase in that fiscal year
  • REVENUE: Revenue reported by company
  • REVENUE_DECLINE: Relative decrease in revenue when compared to other years
  • REVENUE_INCREASE: Relative increase in revenue when compared to other years
  • STOCKHOLDERS_EQUITY: Equity possessed by stockholders, not liability
  • TICKER: Trading symbol of the company

Predicted Entities

AMOUNT, ASSET, ASSET_DECREASE, ASSET_INCREASE, CF, CFO, CFO_INCREASE, COUNT, CURRENCY, DATE, EXPENSE, EXPENSE_DECREASE, EXPENSE_INCREASE, FCF, FISCAL_YEAR, INCOME, INCOME_INCREASE, KPI, KPI_DECREASE, KPI_INCREASE, LIABILITY, LIABILITY_DECREASE, LIABILITY_INCREASE, LOSS, ORG, PERCENTAGE, PROFIT, PROFIT_DECLINE, PROFIT_INCREASE, REVENUE, REVENUE_DECLINE, REVENUE_INCREASE, STOCKHOLDERS_EQUITY, TICKER

How to use

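from johnsnowlabs import nlp, finance
import pyspark.sql.functions as F

# Start a Spark session with the licensed Finance NLP libraries loaded
spark = nlp.start()

document_assembler = nlp.DocumentAssembler()\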
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl","xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")\
    .setContextChars(['.', ',', ';', ':', '!', '?', '*', '-', '(', ')', '”', '’', '$','€'])

embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")\
    .setMaxSentenceLength(512)

ner_model = finance.NerModel.pretrained("finner_earning_calls_specific_sm", "en", "finance/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter
])

data = spark.createDataFrame([["""Adjusted EPS was ahead of our expectations at $ 1.21 , and free cash flow is also ahead of our expectations despite a $ 1.5 billion additional tax payment we made related to the R&D amortization."""]]).toDF("text")

model = pipeline.fit(data)

result = model.transform(data)

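# Entity chunks and their labels
result.select(F.explode(F.arrays_zip('ner_chunk.result', 'ner_chunk.metadata')).alias("cols")) \
      .select(F.expr("cols['0']").alias("text"),
              F.expr("cols['1']['entity']").alias("label")).show(200, truncate=False)

# Token-level tags and confidence scores, as in the table under Results
# (assumes the model writes a 'confidence' entry to the NER metadata)
result.select(F.explode(F.arrays_zip('token.result', 'ner.result', 'ner.metadata')).alias("cols")) \
      .select(F.expr("cols['0']").alias("token"),
              F.expr("cols['1']").alias("ner_label"),
              F.expr("cols['2']['confidence']").alias("confidence")).show(200, truncate=False)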

Results

+------------+----------+----------+
|       token| ner_label|confidence|
+------------+----------+----------+
|    Adjusted|  B-PROFIT|    0.6957|
|         EPS|  I-PROFIT|    0.8325|
|         was|         O|    0.9994|
|       ahead|         O|    0.9996|
|          of|         O|    0.9929|
|         our|         O|    0.9852|
|expectations|         O|    0.9845|
|          at|         O|       1.0|
|           $|B-CURRENCY|    0.9995|
|        1.21|  B-AMOUNT|       1.0|
|           ,|         O|    0.9993|
|         and|         O|    0.9997|
|        free|     B-FCF|    0.9883|
|        cash|     I-FCF|     0.815|
|        flow|     I-FCF|    0.8644|
|          is|         O|    0.9997|
|        also|         O|    0.9966|
|       ahead|         O|    0.9998|
|          of|         O|    0.9953|
|         our|         O|    0.9877|
|expectations|         O|     0.994|
|     despite|         O|    0.9997|
|           a|         O|    0.9979|
|           $|B-CURRENCY|    0.9992|
|         1.5|  B-AMOUNT|       1.0|
|     billion|  I-AMOUNT|    0.9997|
|  additional| B-EXPENSE|     0.641|
|         tax| I-EXPENSE|    0.3146|
|     payment| I-EXPENSE|    0.6099|
|          we|         O|    0.9613|
|        made|         O|     0.982|
|     related|         O|    0.9732|
|          to|         O|    0.8816|
|         the|         O|    0.7283|
|         R&D|         O|    0.8978|
|amortization|         O|    0.5825|
|           .|         O|       1.0|
+------------+----------+----------+
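
The B- and I- prefixes in the table mark the beginning and the continuation of an entity. Conceptually, merging them into chunks is what the NerConverter stage of the pipeline is for. The following is a minimal pure-Python sketch of that merging, not Spark NLP's implementation; group_bio is a hypothetical helper, and the lenient handling of an I- tag with no open chunk (it starts a new chunk) is an assumption.

```python
def group_bio(tagged_tokens):
    """Merge (token, tag) pairs tagged B-X / I-X / O into (text, label) chunks."""
    chunks, words, label = [], [], None
    for token, tag in tagged_tokens:
        prefix, _, entity = tag.partition("-")
        if prefix == "I" and entity == label:
            # Continuation of the chunk that is currently open.
            words.append(token)
            continue
        # Anything else closes the open chunk, if there is one.
        if words:
            chunks.append((" ".join(words), label))
        words, label = [], None
        # B-X (or a stray I-X) opens a new chunk; O opens nothing.
        if prefix in ("B", "I"):
            words, label = [token], entity
    if words:
        chunks.append((" ".join(words), label))
    return chunks


# A few rows taken from the table above.
rows = [
    ("Adjusted", "B-PROFIT"), ("EPS", "I-PROFIT"), ("was", "O"), ("at", "O"),
    ("$", "B-CURRENCY"), ("1.21", "B-AMOUNT"), (",", "O"),
    ("free", "B-FCF"), ("cash", "I-FCF"), ("flow", "I-FCF"),
]
print(group_bio(rows))
```

For these rows the sketch yields the chunks "Adjusted EPS" (PROFIT), "$" (CURRENCY), "1.21" (AMOUNT) and "free cash flow" (FCF).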

Model Information

Model Name: finner_earning_calls_specific_sm
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 16.3 MB

References

In-house annotations on earnings call transcripts.

Benchmarking


label                  tp    fp   fn   prec        rec          f1         
I-REVENUE_INCREASE     53    16   52   0.76811594  0.50476193   0.6091954  
I-AMOUNT               382   2    4    0.9947917   0.9896373    0.9922078  
B-COUNT                14    9    1    0.6086956   0.93333334   0.73684216 
B-AMOUNT               454   0    5    1.0         0.9891068    0.9945236  
I-KPI                  2     11   23   0.15384616  0.08         0.10526316 
I-ORG                  16    0    0    1.0         1.0          1.0        
B-DATE                 122   13   0    0.9037037   1.0          0.94941634 
B-LIABILITY_DECREASE   1     1    0    0.5         1.0          0.6666667  
I-DATE                 3     2    0    0.6         1.0          0.75       
B-LOSS                 4     0    4    1.0         0.5          0.6666667  
I-ASSET                6     2    14   0.75        0.3          0.42857146 
I-EXPENSE              46    13   59   0.77966     0.43809524   0.5609756  
I-KPI_INCREASE         1     7    13   0.125       0.071428575  0.09090909 
B-REVENUE_INCREASE     60    21   34   0.7407407   0.63829786   0.6857143  
I-COUNT                13    6    0    0.68421054  1.0          0.8125     
I-CFO                  23    1    0    0.9583333   1.0          0.9787234  
B-FCF                  13    4    0    0.7647059   1.0          0.8666667  
B-PROFIT_INCREASE      11    11   5    0.5         0.6875       0.57894737 
B-EXPENSE              26    16   45   0.61904764  0.36619717   0.460177   
B-REVENUE_DECLINE      6     4    13   0.6         0.31578946   0.41379312 
B-STOCKHOLDERS_EQUITY  3     0    3    1.0         0.5          0.6666667  
I-PROFIT_DECLINE       4     1    7    0.8         0.36363637   0.5        
I-LIABILITY_DECREASE   1     1    0    0.5         1.0          0.6666667  
I-LOSS                 12    0    10   1.0         0.54545456   0.7058824  
I-PROFIT               148   40   10   0.78723407  0.93670887   0.8554913  
B-CFO                  9     1    1    0.9         0.9          0.9        
B-CURRENCY             440   0    1    1.0         0.9977324    0.9988649  
I-PROFIT_INCREASE      11    10   6    0.52380955  0.64705884   0.5789474  
I-CURRENCY             6     0    0    1.0         1.0          1.0        
B-PROFIT               93    27   16   0.775       0.853211     0.812227   
B-PERCENTAGE           418   7    3    0.9835294   0.9928741    0.9881796  
B-TICKER               13    0    0    1.0         1.0          1.0        
I-FISCAL_YEAR          2     3    1    0.4         0.6666667    0.5        
B-ORG                  14    0    0    1.0         1.0          1.0        
I-STOCKHOLDERS_EQUITY  6     0    2    1.0         0.75         0.85714287 
I-REVENUE_DECLINE      8     9    8    0.47058824  0.5          0.4848485  
B-EXPENSE_INCREASE     6     0    4    1.0         0.6          0.75       
B-REVENUE              51    17   15   0.75        0.77272725   0.761194   
B-FISCAL_YEAR          1     1    0    0.5         1.0          0.6666667  
I-EXPENSE_DECREASE     3     3    2    0.5         0.6          0.54545456 
I-FCF                  26    9    0    0.74285716  1.0          0.852459   
I-REVENUE              45    12   18   0.7894737   0.71428573   0.75000006 
I-EXPENSE_INCREASE     8     0    3    1.0         0.72727275   0.84210527 
Macro-average          2611  311  491  0.6658762   0.6029909    0.63287526 
Micro-average          2611  311  491  0.8935661   0.84171504   0.8668659
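
The prec, rec and f1 columns follow from the tp, fp and fn counts in the usual way. A minimal sketch, assuming the standard definitions (precision = tp / (tp + fp), recall = tp / (tp + fn), F1 = their harmonic mean) and a micro-average computed from summed counts; the function names are hypothetical and the three sample rows are copied from the table.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) for one label; 0.0 where a ratio is undefined."""
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1


def micro_average(counts):
    """Pool tp, fp and fn over all labels, then score the pooled counts."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return precision_recall_f1(tp, fp, fn)


# (tp, fp, fn) for three rows of the benchmark table.
sample = {
    "B-COUNT": (14, 9, 1),
    "B-CFO": (9, 1, 1),
    "I-ASSET": (6, 2, 14),
}

for label, counts in sample.items():
    prec, rec, f1 = precision_recall_f1(*counts)
    print(f"{label:10s} prec={prec:.4f} rec={rec:.4f} f1={f1:.4f}")

print("micro over the sample:", micro_average(list(sample.values())))
```

The per-label values reproduce the corresponding table rows up to rounding. The macro-average F1 in the table appears to be the harmonic mean of the macro precision and macro recall, rather than the mean of the per-label F1 scores.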