Finance Contra Liability NER (10-Q, 10-K, XBRL, lg)

Description

This Named Entity Recognition (NER) model targets numeric financial items. It identifies 11 numeric financial entities in 10-Q and 10-K reports, annotated with eXtensible Business Reporting Language (XBRL) tags. Annotation focuses on numeric tokens, so the surrounding context is what determines which entity type applies. The entity types come from the 139 most common financial entities available in the dataset.

Predicted Entities

TreasuryStockAcquiredAverageCostPerShare, StockRepurchasedDuringPeriodShares, StockRepurchaseProgramAuthorizedAmount1, TreasuryStockSharesAcquired, StockRepurchasedAndRetiredDuringPeriodShares, RepaymentsOfDebt, CommonStockDividendsPerShareDeclared, StockRepurchaseProgramRemainingAuthorizedRepurchaseAmount1, DebtInstrumentRedemptionPricePercentage, PreferredStockDividendRatePercentage, TreasuryStockValueAcquiredCostMethod


How to use

 
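from johnsnowlabs import nlp, finance
import pyspark.sql.functions as F

# Start a Spark session with the licensed Finance NLP library loaded
spark = nlp.start()

# Wrap the raw text column into a document annotation
documentAssembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split each document into sentences
sentence = nlp.SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

# Tokenize, treating punctuation and currency symbols as separate tokens
tokenizer = nlp.Tokenizer() \
    .setInputCols(["sentence"]) \
    .setOutputCol("token") \
    .setContextChars(['.', ',', ';', ':', '!', '?', '*', '-', '(', ')', '”', '’', '$', '€'])

# SEC-BERT token embeddings
embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings") \
    .setMaxSentenceLength(512)

# Pretrained NER model for the contra liability XBRL tags
nerTagger = finance.NerModel.pretrained('finner_10q_xbrl_lg_contra_liability', 'en', 'finance/models') \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setOutputCol("ner")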
              
pipeline = nlp.Pipeline(stages=[
    documentAssembler,
    sentence,
    tokenizer,
    embeddings,
    nerTagger
])

text = "Any optional redemption of the Notes will be at a redemption price equal to 100 % of the principal amount of the Notes to be redeemed , plus accrued and unpaid interest to , but excluding , the redemption date .  "

df = spark.createDataFrame([[text]]).toDF("text")
fit = pipeline.fit(df)

result = fit.transform(df)

result_df = result.select(
    F.explode(
        F.arrays_zip(result.token.result, result.ner.result, result.ner.metadata)
    ).alias("cols")
).select(
    F.expr("cols['0']").alias("token"),
    F.expr("cols['1']").alias("ner_label"),
    F.expr("cols['2']['confidence']").alias("confidence")
)

result_df.show(50, truncate=100)

Results


+----------+-----------------------------------------+----------+
|token     |ner_label                                |confidence|
+----------+-----------------------------------------+----------+
|Any       |O                                        |1.0       |
|optional  |O                                        |1.0       |
|redemption|O                                        |1.0       |
|of        |O                                        |1.0       |
|the       |O                                        |1.0       |
|Notes     |O                                        |1.0       |
|will      |O                                        |1.0       |
|be        |O                                        |1.0       |
|at        |O                                        |1.0       |
|a         |O                                        |1.0       |
|redemption|O                                        |1.0       |
|price     |O                                        |1.0       |
|equal     |O                                        |1.0       |
|to        |O                                        |1.0       |
|100       |B-DebtInstrumentRedemptionPricePercentage|0.9999    |
|%         |O                                        |1.0       |
|of        |O                                        |1.0       |
|the       |O                                        |1.0       |
|principal |O                                        |1.0       |
|amount    |O                                        |1.0       |
|of        |O                                        |1.0       |
|the       |O                                        |1.0       |
|Notes     |O                                        |1.0       |
|to        |O                                        |1.0       |
|be        |O                                        |1.0       |
|redeemed  |O                                        |1.0       |
|,         |O                                        |1.0       |
|plus      |O                                        |1.0       |
|accrued   |O                                        |1.0       |
|and       |O                                        |1.0       |
|unpaid    |O                                        |1.0       |
|interest  |O                                        |1.0       |
|to        |O                                        |1.0       |
|,         |O                                        |1.0       |
|but       |O                                        |1.0       |
|excluding |O                                        |1.0       |
|,         |O                                        |1.0       |
|the       |O                                        |1.0       |
|redemption|O                                        |1.0       |
|date      |O                                        |1.0       |
|.         |O                                        |1.0       |
+----------+-----------------------------------------+----------+
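
The table above lists one row per token, so most rows carry the `O` label. As a minimal sketch of how the tagged tokens could be collected into entities, the helper below groups consecutive `B-`/`I-` rows and drops everything else. The function name `extract_entities`, the `min_confidence` parameter, and the `sample_rows` list are hypothetical and not part of the library; the rows are assumed to be `(token, ner_label, confidence)` tuples, for example obtained from `result_df.collect()`.

```python
from typing import Iterable, List, Tuple, Union

# One row of the results table: (token, ner_label, confidence).
# Confidence may arrive as a string from the annotation metadata, so it is cast to float.
Row = Tuple[str, str, Union[str, float]]


def extract_entities(rows: Iterable[Row], min_confidence: float = 0.0) -> List[Tuple[str, str, float]]:
    """Group BIO-tagged tokens into (entity_type, text, lowest_confidence) tuples."""
    entities = []
    current = None  # [entity_type, [tokens], [confidences]] for the span being built

    def flush():
        if current is not None:
            entities.append((current[0], " ".join(current[1]), min(current[2])))

    for token, label, confidence in rows:
        confidence = float(confidence)
        if label == "O" or confidence < min_confidence:
            # Close any open span; untagged or low-confidence tokens are skipped
            flush()
            current = None
            continue
        prefix, _, entity_type = label.partition("-")
        if prefix == "I" and current is not None and current[0] == entity_type:
            # Continuation of the open span
            current[1].append(token)
            current[2].append(confidence)
        else:
            # A B- tag (or an I- tag with no matching open span) starts a new span
            flush()
            current = [entity_type, [token], [confidence]]
    flush()
    return entities


# A few rows taken from the results table
sample_rows = [
    ("equal", "O", 1.0),
    ("to", "O", 1.0),
    ("100", "B-DebtInstrumentRedemptionPricePercentage", 0.9999),
    ("%", "O", 1.0),
]

print(extract_entities(sample_rows))
```

Raising `min_confidence` is one way to trade recall for precision on the weaker labels; the right threshold is an assumption to validate on your own filings.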

Model Information

Model Name: finner_10q_xbrl_lg_contra_liability
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 16.4 MB

References

An in-house modified version of https://huggingface.co/datasets/nlpaueb/finer-139, re-split and filtered to focus on sentences with a higher density of tags.

Benchmarking



label                                                          precision    recall  f1-score   support
                      B-CommonStockDividendsPerShareDeclared     0.9455    0.9975    0.9708       400
                   B-DebtInstrumentRedemptionPricePercentage     0.9944    0.9806    0.9874       360
                      B-PreferredStockDividendRatePercentage     0.9600    1.0000    0.9796       144
                                          B-RepaymentsOfDebt     0.9310    0.9586    0.9446       169
                   B-StockRepurchaseProgramAuthorizedAmount1     0.9653    0.9430    0.9540       561
B-StockRepurchaseProgramRemainingAuthorizedRepurchaseAmount1     0.9099    0.9670    0.9376       303
              B-StockRepurchasedAndRetiredDuringPeriodShares     0.7500    0.4717    0.5792       159
                        B-StockRepurchasedDuringPeriodShares     0.5323    0.1579    0.2435       209
                  B-TreasuryStockAcquiredAverageCostPerShare     0.7884    0.9744    0.8716       195
                               B-TreasuryStockSharesAcquired     0.5664    0.9107    0.6984       403
                      B-TreasuryStockValueAcquiredCostMethod     0.6218    0.3304    0.4315       224
                               I-TreasuryStockSharesAcquired     0.0000    0.0000    0.0000         1
                                                           O     0.9981    0.9979    0.9980     92921
                                                    accuracy       -          -      0.9927     96049
                                                   macro-avg     0.7664    0.7453    0.7382     96049
                                                weighted-avg     0.9926    0.9927    0.9921     96049
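
The gap between the macro-avg and weighted-avg rows follows from how the two averages are defined. The sketch below recomputes both from the per-class rows of the table, assuming the standard definitions (unweighted mean per class for macro, support-weighted mean for weighted). The names `ROWS`, `f1`, `macro_avg`, and `weighted_avg` are illustrative, and because the inputs are already rounded to four decimals, the recomputed values can differ from the table in the last digit.

```python
# Per-class rows copied from the benchmark table: (label, precision, recall, f1, support)
ROWS = [
    ("B-CommonStockDividendsPerShareDeclared", 0.9455, 0.9975, 0.9708, 400),
    ("B-DebtInstrumentRedemptionPricePercentage", 0.9944, 0.9806, 0.9874, 360),
    ("B-PreferredStockDividendRatePercentage", 0.9600, 1.0000, 0.9796, 144),
    ("B-RepaymentsOfDebt", 0.9310, 0.9586, 0.9446, 169),
    ("B-StockRepurchaseProgramAuthorizedAmount1", 0.9653, 0.9430, 0.9540, 561),
    ("B-StockRepurchaseProgramRemainingAuthorizedRepurchaseAmount1", 0.9099, 0.9670, 0.9376, 303),
    ("B-StockRepurchasedAndRetiredDuringPeriodShares", 0.7500, 0.4717, 0.5792, 159),
    ("B-StockRepurchasedDuringPeriodShares", 0.5323, 0.1579, 0.2435, 209),
    ("B-TreasuryStockAcquiredAverageCostPerShare", 0.7884, 0.9744, 0.8716, 195),
    ("B-TreasuryStockSharesAcquired", 0.5664, 0.9107, 0.6984, 403),
    ("B-TreasuryStockValueAcquiredCostMethod", 0.6218, 0.3304, 0.4315, 224),
    ("I-TreasuryStockSharesAcquired", 0.0000, 0.0000, 0.0000, 1),
    ("O", 0.9981, 0.9979, 0.9980, 92921),
]


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_avg(rows):
    """Unweighted mean of (precision, recall, f1) over all classes."""
    n = len(rows)
    return tuple(sum(row[i] for row in rows) / n for i in (1, 2, 3))


def weighted_avg(rows):
    """Mean of (precision, recall, f1) weighted by each class's support."""
    total = sum(row[4] for row in rows)
    return tuple(sum(row[i] * row[4] for row in rows) / total for i in (1, 2, 3))


print("total support:", sum(row[4] for row in ROWS))
print("macro avg:    ", [round(v, 4) for v in macro_avg(ROWS)])
print("weighted avg: ", [round(v, 4) for v in weighted_avg(ROWS)])
```

In the macro average, the single `I-TreasuryStockSharesAcquired` example counts as much as the 92,921 `O` tokens, while the weighted average is dominated by `O`. For judging entity extraction quality, the per-label rows are therefore more informative than either summary row.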