Finance Liability NER (10-K, 10-Q, lg, XBRL)

Description

This Named Entity Recognition (NER) model targets numeric financial items. It identifies 21 numeric Liability entities in 10-Q and 10-K reports. The entities are annotated with eXtensible Business Reporting Language (XBRL) tags. Annotation primarily targets numerical tokens, so the surrounding context is crucial for assigning the correct entity type from the 139 most common financial entities in the dataset.

This is a large (lg) model, trained on 200K sentences.

Predicted Entities

DebtInstrumentCarryingAmount, LineOfCreditFacilityRemainingBorrowingCapacity, DeferredFinanceCostsGross, DebtInstrumentBasisSpreadOnVariableRate1, LongTermDebtFairValue, DeferredFinanceCostsNet, ClassOfWarrantOrRightExercisePriceOfWarrantsOrRights1, ConcentrationRiskPercentage1, LossContingencyAccrualAtCarryingValue, MinorityInterestOwnershipPercentageByNoncontrollingOwners, DebtInstrumentFaceAmount, OperatingLeaseWeightedAverageRemainingLeaseTerm1, DebtInstrumentMaturityDate, LineOfCreditFacilityCurrentBorrowingCapacity, RevenueRemainingPerformanceObligation, PreferredStockSharesAuthorized, LineOfCreditFacilityUnusedCapacityCommitmentFeePercentage, MinorityInterestOwnershipPercentageByParent, UnrecognizedTaxBenefitsThatWouldImpactEffectiveTaxRate, DebtInstrumentTerm, DebtInstrumentConvertibleConversionPrice1

How to use

 
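from johnsnowlabs import nlp, finance
import pyspark.sql.functions as F

# Start a Spark session (assumes johnsnowlabs is installed with a valid Finance NLP license)
spark = nlp.start()

# Wrap the raw text column in a document annotation
documentAssembler = nlp.DocumentAssembler() \
   .setInputCol("text") \
   .setOutputCol("document")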

# Split each document into sentences
sentence = nlp.SentenceDetector() \
   .setInputCols(["document"]) \
   .setOutputCol("sentence")

# Tokenize each sentence; context characters such as '$' become separate tokens
tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")\
    .setContextChars(['.', ',', ';', ':', '!', '?', '*', '-', '(', ')', '”', '’', '$','€'])

# SEC-BERT embeddings; read from "sentence" so they align with the tokenizer and NER inputs
embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base","en") \
  .setInputCols(["sentence", "token"]) \
  .setOutputCol("embeddings")\
  .setMaxSentenceLength(512)

nerTagger = finance.NerModel.pretrained('finner_10q_xbrl_lg_liability', 'en', 'finance/models')\
   .setInputCols(["sentence", "token", "embeddings"])\
   .setOutputCol("ner")

pipeline = nlp.Pipeline(stages=[
    documentAssembler,
    sentence,
    tokenizer,
    embeddings,
    nerTagger
])

text = "As such , the if - converted value of the Notes was less than the principal amount of $ 345.0 million ."

df = spark.createDataFrame([[text]]).toDF("text")
fit = pipeline.fit(df)

result = fit.transform(df)

# One row per token: zip tokens, labels and metadata, then explode
result_df = result.select(
    F.explode(F.arrays_zip(result.token.result, result.ner.result, result.ner.metadata)).alias("cols")
).select(
    F.expr("cols['0']").alias("token"),
    F.expr("cols['1']").alias("ner_label"),
    F.expr("cols['2']['confidence']").alias("confidence")
)

result_df.show(50, truncate=100)

Results


+---------+--------------------------+----------+
|token    |ner_label                 |confidence|
+---------+--------------------------+----------+
|As       |O                         |1.0       |
|such     |O                         |1.0       |
|,        |O                         |1.0       |
|the      |O                         |1.0       |
|if       |O                         |1.0       |
|-        |O                         |1.0       |
|converted|O                         |1.0       |
|value    |O                         |1.0       |
|of       |O                         |1.0       |
|the      |O                         |1.0       |
|Notes    |O                         |0.9998    |
|was      |O                         |1.0       |
|less     |O                         |1.0       |
|than     |O                         |1.0       |
|the      |O                         |1.0       |
|principal|O                         |1.0       |
|amount   |O                         |1.0       |
|of       |O                         |0.9999    |
|$        |O                         |0.9999    |
|345.0    |B-DebtInstrumentFaceAmount|0.9064    |
|million  |O                         |0.9999    |
|.        |O                         |1.0       |
+---------+--------------------------+----------+
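
The table above holds one tag per token, where a B- prefix marks the first token of an entity and an I- prefix marks a continuation token. A minimal sketch of grouping such rows into entity spans in plain Python follows. The helper name merge_bio, the output dictionary layout, and the choice to report the lowest token confidence per span are assumptions made for illustration; they are not part of the Finance NLP API.

```python
from typing import Dict, List, Tuple


def merge_bio(rows: List[Tuple[str, str, float]]) -> List[Dict]:
    """Group (token, label, confidence) rows into entity spans.

    Hypothetical helper: a B- tag opens a span, a matching I- tag extends it,
    and anything else (O, or an I- tag with no open span of that type) closes it.
    """
    spans = []
    current = None
    for token, label, confidence in rows:
        prefix, _, entity = label.partition("-")
        if prefix == "B":
            if current is not None:
                spans.append(current)
            current = {"entity": entity, "tokens": [token], "confidences": [confidence]}
        elif prefix == "I" and current is not None and current["entity"] == entity:
            current["tokens"].append(token)
            current["confidences"].append(confidence)
        else:
            if current is not None:
                spans.append(current)
            current = None
    if current is not None:
        spans.append(current)
    # Assumption: a span is only as reliable as its least confident token
    return [
        {
            "entity": s["entity"],
            "text": " ".join(s["tokens"]),
            "confidence": min(s["confidences"]),
        }
        for s in spans
    ]


# Last four rows of the results table above
sample = [
    ("of", "O", 0.9999),
    ("$", "O", 0.9999),
    ("345.0", "B-DebtInstrumentFaceAmount", 0.9064),
    ("million", "O", 0.9999),
]
print(merge_bio(sample))
# [{'entity': 'DebtInstrumentFaceAmount', 'text': '345.0', 'confidence': 0.9064}]
```

To apply it to the pipeline output, the rows of result_df could be collected first, for example [(r.token, r.ner_label, float(r.confidence)) for r in result_df.collect()]; the float conversion is there because annotation metadata values are stored as strings.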

Model Information

Model Name: finner_10q_xbrl_lg_liability
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 16.4 MB

References

An in-house modified version of https://huggingface.co/datasets/nlpaueb/finer-139, re-split and filtered to focus on sentences with a higher density of tags.

Benchmarking


label                                                         precision    recall  f1-score   support
    B-ClassOfWarrantOrRightExercisePriceOfWarrantsOrRights1     0.9613    0.9613    0.9613       155
                             B-ConcentrationRiskPercentage1     0.9887    0.9990    0.9938      1049
                 B-DebtInstrumentBasisSpreadOnVariableRate1     0.9696    0.9761    0.9728      1926
                             B-DebtInstrumentCarryingAmount     0.6658    0.6159    0.6399       427
                B-DebtInstrumentConvertibleConversionPrice1     0.9572    0.9835    0.9702       182
                                 B-DebtInstrumentFaceAmount     0.7537    0.9201    0.8286      1114
                               B-DebtInstrumentMaturityDate     0.8211    0.7573    0.7879       103
                                       B-DebtInstrumentTerm     0.9205    0.8323    0.8742       167
                                B-DeferredFinanceCostsGross     0.6977    0.6250    0.6593       144
                                  B-DeferredFinanceCostsNet     0.8264    0.8264    0.8264       265
             B-LineOfCreditFacilityCurrentBorrowingCapacity     0.9061    0.5714    0.7009       287
           B-LineOfCreditFacilityRemainingBorrowingCapacity     0.7935    0.9220    0.8529       346
B-LineOfCreditFacilityUnusedCapacityCommitmentFeePercentage     0.9597    0.9597    0.9597       273
                                    B-LongTermDebtFairValue     0.9307    0.9239    0.9273       276
                    B-LossContingencyAccrualAtCarryingValue     0.9476    0.9922    0.9693       255
B-MinorityInterestOwnershipPercentageByNoncontrollingOwners     0.9248    0.8531    0.8875       245
              B-MinorityInterestOwnershipPercentageByParent     0.8133    0.9414    0.8727       273
         B-OperatingLeaseWeightedAverageRemainingLeaseTerm1     1.0000    0.8762    0.9340       105
                           B-PreferredStockSharesAuthorized     0.9904    0.9626    0.9763       107
                    B-RevenueRemainingPerformanceObligation     0.9292    0.9906    0.9589       424
   B-UnrecognizedTaxBenefitsThatWouldImpactEffectiveTaxRate     0.9942    0.8912    0.9399       193
                                 I-DebtInstrumentFaceAmount     0.0000    0.0000    0.0000         1
                               I-DebtInstrumentMaturityDate     0.8211    0.7573    0.7879       309
                                       I-DebtInstrumentTerm     0.9643    0.7826    0.8640        69
         I-OperatingLeaseWeightedAverageRemainingLeaseTerm1     1.0000    0.6667    0.8000        15
                           I-PreferredStockSharesAuthorized     1.0000    0.8571    0.9231         7
                                                          O     0.9986    0.9979    0.9982    210593
                                                   accuracy       -         -       0.9942    219310
                                                  macro-avg     0.8717    0.8312    0.8469    219310
                                               weighted-avg     0.9944    0.9942    0.9942    219310
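
The f1-score column is the harmonic mean of precision and recall, and the two average rows differ in how they weight labels: macro-avg treats every label equally, while weighted-avg weights each label by its support. The sketch below recomputes a couple of values from the table under those standard definitions. The function names are illustrative, and because the table is rounded to four decimals, recomputed values can differ in the last digit.

```python
from typing import Sequence


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(values: Sequence[float]) -> float:
    """Unweighted mean: every label counts the same, regardless of support."""
    return sum(values) / len(values)


def weighted_average(values: Sequence[float], supports: Sequence[int]) -> float:
    """Support-weighted mean: frequent labels dominate the result."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


# (precision, recall, support) copied from three rows of the table above
rows = {
    "B-DebtInstrumentFaceAmount": (0.7537, 0.9201, 1114),
    "I-DebtInstrumentFaceAmount": (0.0000, 0.0000, 1),
    "O": (0.9986, 0.9979, 210593),
}

print(round(f1_score(0.7537, 0.9201), 4))  # 0.8286, matching the table

# The same three labels averaged both ways show how the O tag drives weighted-avg
f1s = [f1_score(p, r) for p, r, _ in rows.values()]
supports = [s for _, _, s in rows.values()]
print(macro_average(f1s), weighted_average(f1s, supports))
```

This is why the macro-avg f1-score (0.8469) sits well below the weighted-avg (0.9942): the O tag accounts for 210593 of the 219310 tokens, while rare labels such as I-DebtInstrumentFaceAmount (support 1, f1-score 0.0000) pull only the macro average down.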