Description
This is a xl (extra-large) version of a financial model, trained in a combination of two data sets: Earning Calls and 10K Fillings.
Please note this model requires some tokenization configuration to extract the currency (see python snippet below).
The aim of this model is to detect the main pieces of financial information in annual reports of companies, more specifically this model is being trained with 10K filings.
The currently available entities are:
- AMOUNT: Numeric amounts, not percentages
- ASSET: Current or Fixed Asset
- ASSET_DECREASE: Decrease in the asset possession/exposure
- ASSET_INCREASE: Increase in the asset possession/exposure
- CF: Total cash flow
- CF_DECREASE: Relative decrease in cash flow
- CF_INCREASE: Relative increase in cash flow
- COUNT: Number of items (not monetary, not percentages).
- CURRENCY: The currency of the amount
- DATE: Generic dates in context where either it’s not a fiscal year or it can’t be asserted as such given the context
- EXPENSE: An expense or loss
- EXPENSE_DECREASE: A piece of information saying there was an expense decrease in that fiscal year
- EXPENSE_INCREASE: A piece of information saying there was an expense increase in that fiscal year
- FCF: Free Cash Flow
- FISCAL_YEAR: A date which expresses which month the fiscal exercise was closed for a specific year
- KPI: Key Performance Indicator, a quantifiable measure of performance over time for a specific objective
- KPI_DECREASE: Relative decrease in a KPI
- KPI_INCREASE: Relative increase in a KPI
- LIABILITY: Current or Long-Term Liability (not from stockholders)
- LIABILITY_DECREASE: Relative decrease in liability
- LIABILITY_INCREASE: Relative increase in liability
- ORG: Mention to a company/organization name
- PERCENTAGE: Numeric amounts which are percentages
- PROFIT: Profit or also Revenue
- PROFIT_DECLINE: A piece of information saying there was a profit / revenue decrease in that fiscal year
- PROFIT_INCREASE: A piece of information saying there was a profit / revenue increase in that fiscal year
- TICKER: Trading symbol of the company
You can also check for the Relation Extraction model which connects these entities together
Predicted Entities
AMOUNT, ASSET, ASSET_DECREASE, ASSET_INCREASE, CF, CF_DECREASE, CF_INCREASE, COUNT, CURRENCY, DATE, EXPENSE, EXPENSE_DECREASE, EXPENSE_INCREASE, FCF, FISCAL_YEAR, KPI, KPI_DECREASE, KPI_INCREASE, LIABILITY, LIABILITY_DECREASE, LIABILITY_INCREASE, ORG, PERCENTAGE, PROFIT, PROFIT_DECLINE, PROFIT_INCREASE, TICKER
How to use
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")
sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl","xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")
tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")\
    .setContextChars(['.', ',', ';', ':', '!', '?', '*', '-', '(', ')', '”', '’', '$','€'])
embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en") \
  .setInputCols("sentence", "token") \
  .setOutputCol("embeddings")\
  .setMaxSentenceLength(512)
ner_model = finance.NerModel.pretrained("finner_financial_xlarge", "en", "finance/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")
ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")
pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter   
    ])
data = spark.createDataFrame([["""License fees revenue decreased 40 %, or 0.5 million to 0.7 million for the year ended December 31, 2020 compared to 1.2 million for the year ended December 31, 2019"""]]).toDF("text")
model = pipeline.fit(data)
result = model.transform(data)
result.select(F.explode(F.arrays_zip('ner_chunk.result', 'ner_chunk.metadata')).alias("cols")) \
               .select(F.expr("cols['0']").alias("text"),
                       F.expr("cols['1']['entity']").alias("label")).show(200, truncate = False)
Results
+---------+----------------+----------+
|    token|       ner_label|confidence|
+---------+----------------+----------+
|  License|B-PROFIT_DECLINE|    0.9658|
|     fees|I-PROFIT_DECLINE|    0.7826|
|  revenue|I-PROFIT_DECLINE|    0.8992|
|decreased|               O|       1.0|
|       40|    B-PERCENTAGE|    0.9997|
|        %|               O|       1.0|
|        ,|               O|    0.9997|
|       or|               O|    0.9999|
|      0.5|        B-AMOUNT|    0.9925|
|  million|        I-AMOUNT|    0.9989|
|       to|               O|    0.9996|
|      0.7|        B-AMOUNT|    0.9368|
|  million|        I-AMOUNT|    0.9949|
|      for|               O|    0.9999|
|      the|               O|    0.9944|
|     year|               O|    0.9976|
|    ended|               O|    0.9987|
| December|   B-FISCAL_YEAR|    0.9941|
|       31|   I-FISCAL_YEAR|    0.8955|
|        ,|   I-FISCAL_YEAR|    0.8869|
|     2020|   I-FISCAL_YEAR|    0.9941|
| compared|               O|    0.9999|
|       to|               O|    0.9995|
|      1.2|        B-AMOUNT|    0.9853|
|  million|        I-AMOUNT|    0.9831|
|      for|               O|    0.9999|
|      the|               O|    0.9914|
|     year|               O|    0.9948|
|    ended|               O|    0.9985|
| December|   B-FISCAL_YEAR|    0.9812|
|       31|   I-FISCAL_YEAR|    0.8185|
|        ,|   I-FISCAL_YEAR|    0.8351|
|     2019|   I-FISCAL_YEAR|    0.9541|
+---------+----------------+----------+
Model Information
| Model Name: | finner_financial_xlarge | 
| Compatibility: | Finance NLP 1.0.0+ | 
| License: | Licensed | 
| Edition: | Official | 
| Input Labels: | [sentence, token, embeddings] | 
| Output Labels: | [ner] | 
| Language: | en | 
| Size: | 16.2 MB | 
References
In-house annotations on Earning Calls and 10-K Filings combined.
Benchmarking
label                 tp    fp   fn   prec        rec         f1         
B-LIABILITY_INCREASE  1     0    0    1.0         1.0         1.0        
I-AMOUNT              915   97   8    0.9041502   0.9913326   0.9457364  
B-COUNT               7     1    2    0.875       0.7777778   0.8235294  
I-LIABILITY_INCREASE  1     0    0    1.0         1.0         1.0        
B-AMOUNT              1304  124  19   0.9131653   0.9856387   0.9480189  
I-KPI                 2     0    7    1.0         0.22222222  0.36363637 
B-DATE                525   32   44   0.94254935  0.9226714   0.9325044  
I-LIABILITY           156   49   97   0.7609756   0.6166008   0.68122274 
I-DATE                343   12   36   0.9661972   0.9050132   0.93460494 
B-CF_DECREASE         6     1    3    0.85714287  0.6666667   0.75       
I-EXPENSE             270   86   74   0.75842696  0.78488374  0.77142864 
I-KPI_INCREASE        0     0    1    0.0         0.0         0.0        
B-LIABILITY           82    30   46   0.73214287  0.640625    0.6833333  
I-CF                  420   97   84   0.8123791   0.8333333   0.82272285 
I-CF_DECREASE         17    3    12   0.85        0.5862069   0.6938776  
I-COUNT               7     0    0    1.0         1.0         1.0        
B-FCF                 5     0    0    1.0         1.0         1.0        
B-PROFIT_INCREASE     54    23   22   0.7012987   0.7105263   0.7058824  
B-KPI_INCREASE        1     0    2    1.0         0.33333334  0.5        
B-EXPENSE             118   42   36   0.7375      0.76623374  0.75159234 
I-CF_INCREASE         43    0    17   1.0         0.71666664  0.8349514  
I-PERCENTAGE          4     6    0    0.4         1.0         0.5714286  
I-PROFIT_DECLINE      39    11   4    0.78        0.90697676  0.8387097  
I-KPI_DECREASE        1     1    0    0.5         1.0         0.6666667  
B-CF_INCREASE         23    0    2    1.0         0.92        0.9583333  
I-PROFIT              228   118  19   0.6589595   0.9230769   0.7689713  
B-CURRENCY            943   42   12   0.9573604   0.98743457  0.972165   
I-PROFIT_INCREASE     80    34   16   0.7017544   0.8333333   0.7619047  
B-CF                  118   32   29   0.7866667   0.8027211   0.7946128  
B-PROFIT              134   55   23   0.7089947   0.85350317  0.7745664  
B-PERCENTAGE          281   17   7    0.942953    0.9756944   0.95904434 
B-TICKER              2     0    0    1.0         1.0         1.0        
I-FISCAL_YEAR         585   17   27   0.9717608   0.9558824   0.9637562  
B-ORG                 2     0    0    1.0         1.0         1.0        
B-PROFIT_DECLINE      22    5    8    0.8148148   0.73333335  0.7719298  
B-EXPENSE_INCREASE    35    7    4    0.8333333   0.8974359   0.86419755 
B-EXPENSE_DECREASE    23    3    4    0.88461536  0.8518519   0.8679245  
B-FISCAL_YEAR         195   6    12   0.9701493   0.942029    0.9558824  
I-EXPENSE_DECREASE    46    9    16   0.8363636   0.7419355   0.78632486 
I-FCF                 10    0    0    1.0         1.0         1.0        
I-EXPENSE_INCREASE    83    13   9    0.8645833   0.90217394  0.88297874 
Macro-average         7134  977  728  0.77496254  0.72599226  0.74967855 
Micro-average         7134  977  728  0.8795463   0.9074027   0.8932574