Description
This is a Named Entity Recognition (NER) model for financial numeric items. It detects the 139 most frequent financial entities found in a diverse set of 10-Q and 10-K reports, using eXtensible Business Reporting Language (XBRL) tags as labels. Annotation mainly targets numeric tokens, so the surrounding context is what determines which of the 139 entity types a given number receives.
This is a large (lg) model, trained on 200K sentences.
Predicted Entities
DeferredFinanceCostsNet
DisposalGroupIncludingDiscontinuedOperationConsideration
DebtInstrumentCarryingAmount
CommonStockSharesAuthorized
RestructuringCharges
DeferredFinanceCostsGross
OperatingLeasesRentExpenseNet
EquityMethodInvestmentOwnershipPercentage
ClassOfWarrantOrRightExercisePriceOfWarrantsOrRights1
DebtInstrumentTerm
DebtInstrumentRedemptionPricePercentage
CommonStockCapitalSharesReservedForFutureIssuance
LossContingencyAccrualAtCarryingValue
SaleOfStockPricePerShare
MinorityInterestOwnershipPercentageByParent
PropertyPlantAndEquipmentUsefulLife
TreasuryStockAcquiredAverageCostPerShare
Goodwill
SupplementalInformationForPropertyCasualtyInsuranceUnderwritersPriorYearClaimsAndClaimsAdjustmentExpense
CommonStockParOrStatedValuePerShare
OperatingLeaseWeightedAverageDiscountRatePercent
DebtInstrumentConvertibleConversionPrice1
AmortizationOfIntangibleAssets
PreferredStockSharesAuthorized
OperatingLeasePayments
DebtInstrumentMaturityDate
ShareBasedCompensationArrangementByShareBasedPaymentAwardOptionsGrantsInPeriodWeightedAverageGrantDateFairValue
EffectiveIncomeTaxRateReconciliationAtFederalStatutoryIncomeTaxRate
AllocatedShareBasedCompensationExpense
PreferredStockDividendRatePercentage
StockRepurchaseProgramRemainingAuthorizedRepurchaseAmount1
TreasuryStockValueAcquiredCostMethod
ShareBasedCompensationArrangementByShareBasedPaymentAwardEquityInstrumentsOtherThanOptionsVestedInPeriodTotalFairValue
IncomeTaxExpenseBenefit
DerivativeFixedInterestRate
RelatedPartyTransactionExpensesFromTransactionsWithRelatedParty
PublicUtilitiesRequestedRateIncreaseDecreaseAmount
RestructuringAndRelatedCostExpectedCost1
StockRepurchaseProgramAuthorizedAmount1
ShareBasedCompensation
ShareBasedCompensationArrangementByShareBasedPaymentAwardEquityInstrumentsOtherThanOptionsGrantsInPeriodWeightedAverageGrantDateFairValue
LongTermDebtFairValue
LineOfCreditFacilityUnusedCapacityCommitmentFeePercentage
LineOfCreditFacilityCurrentBorrowingCapacity
ShareBasedCompensationArrangementByShareBasedPaymentAwardAwardVestingPeriod1
SharebasedCompensationArrangementBySharebasedPaymentAwardAwardVestingRightsPercentage
PaymentsToAcquireBusinessesGross
MinorityInterestOwnershipPercentageByNoncontrollingOwners
AntidilutiveSecuritiesExcludedFromComputationOfEarningsPerShareAmount
NumberOfReportableSegments
BusinessCombinationRecognizedIdentifiableAssetsAcquiredAndLiabilitiesAssumedIntangibleAssetsOtherThanGoodwill
OperatingLeaseCost
BusinessCombinationConsiderationTransferred1
UnrecognizedTaxBenefitsThatWouldImpactEffectiveTaxRate
CommonStockDividendsPerShareDeclared
AreaOfRealEstateProperty
LesseeOperatingLeaseTermOfContract
RevenueRemainingPerformanceObligation
RelatedPartyTransactionAmountsOfTransaction
InterestExpense
OperatingLeaseExpense
StockIssuedDuringPeriodSharesNewIssues
DebtInstrumentFaceAmount
CapitalizedContractCostAmortization
DebtInstrumentBasisSpreadOnVariableRate1
ShareBasedCompensationArrangementByShareBasedPaymentAwardEquityInstrumentsOtherThanOptionsNonvestedNumber
GainsLossesOnExtinguishmentOfDebt
LineOfCreditFacilityRemainingBorrowingCapacity
OperatingLeaseRightOfUseAsset
OperatingLeaseWeightedAverageRemainingLeaseTerm1
OperatingLossCarryforwards
ConcentrationRiskPercentage1
GuaranteeObligationsMaximumExposure
StockRepurchasedAndRetiredDuringPeriodShares
LesseeOperatingLeaseRenewalTerm
ContractWithCustomerLiabilityRevenueRecognized
DefinedBenefitPlanContributionsByEmployer
ShareBasedCompensationArrangementByShareBasedPaymentAwardOptionsGrantsInPeriodGross
RepaymentsOfDebt
EmployeeServiceShareBasedCompensationNonvestedAwardsTotalCompensationCostNotYetRecognized
BusinessAcquisitionPercentageOfVotingInterestsAcquired
DebtInstrumentInterestRateEffectivePercentage
AcquiredFiniteLivedIntangibleAssetsWeightedAverageUsefulLife
DebtInstrumentUnamortizedDiscount
ShareBasedCompensationArrangementByShareBasedPaymentAwardNumberOfSharesAuthorized
BusinessCombinationContingentConsiderationLiability
DebtInstrumentInterestRateStatedPercentage
LeaseAndRentalExpense
RevenueFromContractWithCustomerExcludingAssessedTax
SharePrice
CommonStockSharesOutstanding
ContractWithCustomerLiability
DerivativeNotionalAmount
RevenueFromRelatedParties
ShareBasedCompensationArrangementByShareBasedPaymentAwardOptionsExercisesInPeriodTotalIntrinsicValue
Revenues
EmployeeServiceShareBasedCompensationNonvestedAwardsTotalCompensationCostNotYetRecognizedShareBasedAwardsOtherThanOptions
AccrualForEnvironmentalLossContingencies
ProceedsFromIssuanceOfCommonStock
EmployeeServiceShareBasedCompensationTaxBenefitFromCompensationExpense
IncomeLossFromEquityMethodInvestments
NumberOfOperatingSegments
UnrecognizedTaxBenefits
RevenueFromContractWithCustomerIncludingAssessedTax
LossContingencyDamagesSoughtValue
SharebasedCompensationArrangementBySharebasedPaymentAwardExpirationPeriod
TreasuryStockSharesAcquired
FiniteLivedIntangibleAssetUsefulLife
BusinessCombinationRecognizedIdentifiableAssetsAcquiredAndLiabilitiesAssumedIntangibles
EffectiveIncomeTaxRateContinuingOperations
LossContingencyEstimateOfPossibleLoss
ShareBasedCompensationArrangementByShareBasedPaymentAwardNumberOfSharesAvailableForGrant
BusinessCombinationAcquisitionRelatedCosts
StockRepurchasedDuringPeriodShares
CashAndCashEquivalentsFairValueDisclosure
LineOfCreditFacilityInterestRateAtPeriodEnd
ShareBasedCompensationArrangementByShareBasedPaymentAwardEquityInstrumentsOtherThanOptionsGrantsInPeriod
CumulativeEffectOfNewAccountingPrincipleInPeriodOfAdoption
LettersOfCreditOutstandingAmount
EmployeeServiceShareBasedCompensationNonvestedAwardsTotalCompensationCostNotYetRecognizedPeriodForRecognition1
NumberOfRealEstateProperties
DebtWeightedAverageInterestRate
SaleOfStockNumberOfSharesIssuedInTransaction
AssetImpairmentCharges
Depreciation
DebtInstrumentFairValue
DefinedContributionPlanCostRecognized
InterestExpenseDebt
LossContingencyPendingClaimsNumber
PaymentsToAcquireBusinessesNetOfCashAcquired
BusinessAcquisitionEquityInterestsIssuedOrIssuableNumberOfSharesIssued
GoodwillImpairmentLoss
LineOfCredit
AmortizationOfFinancingCosts
EquityMethodInvestments
LineOfCreditFacilityCommitmentFeePercentage
LongTermDebt
LineOfCreditFacilityMaximumBorrowingCapacity
OperatingLeaseLiability
How to use
from johnsnowlabs import nlp, finance
import pyspark.sql.functions as F

# Start a Spark session with the John Snow Labs libraries loaded
spark = nlp.start()

# Wrap the raw text column as Spark NLP documents
documentAssembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split documents into sentences
sentence = nlp.SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence")

# Tokenize, treating punctuation and currency symbols as separate tokens
tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")\
    .setContextChars(['.', ',', ';', ':', '!', '?', '*', '-', '(', ')', '”', '’', '$','€'])

# SEC-BERT token embeddings consumed by the NER model
embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base","en") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("embeddings")\
    .setMaxSentenceLength(512)

# Pretrained XBRL NER model
ner_model = finance.NerModel.pretrained('finner_10q_xbrl', 'en', 'finance/models')\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

pipeline = nlp.Pipeline(stages=[documentAssembler,
                                sentence,
                                tokenizer,
                                embeddings,
                                ner_model
                                ])

text = """Common Stock The authorized capital of the Company is 200,000,000 common shares , par value $ 0.001 , of which 12,481,724 are issued or outstanding ."""

df = spark.createDataFrame([[text]]).toDF("text")
result = pipeline.fit(df).transform(df)

# One row per token with its predicted label and confidence
result_df = result.select(F.explode(F.arrays_zip(result.token.result,result.ner.result, result.ner.metadata)).alias("cols"))\
    .select(F.expr("cols['0']").alias("token"),
            F.expr("cols['1']").alias("ner_label"),
            F.expr("cols['2']['confidence']").alias("confidence"))

result_df.show(50, truncate=100)
Results
+-----------+-------------------------------------+----------+
|      token|                            ner_label|confidence|
+-----------+-------------------------------------+----------+
|     Common|                                    O|       1.0|
|      Stock|                                    O|       1.0|
|        The|                                    O|       1.0|
| authorized|                                    O|       1.0|
|    capital|                                    O|       1.0|
|         of|                                    O|       1.0|
|        the|                                    O|       1.0|
|    Company|                                    O|       1.0|
|         is|                                    O|       1.0|
|200,000,000|        B-CommonStockSharesAuthorized|    0.9932|
|     common|                                    O|       1.0|
|     shares|                                    O|       1.0|
|          ,|                                    O|       1.0|
|        par|                                    O|       1.0|
|      value|                                    O|       1.0|
|          $|                                    O|       1.0|
|      0.001|B-CommonStockParOrStatedValuePerShare|    0.9988|
|          ,|                                    O|       1.0|
|         of|                                    O|       1.0|
|      which|                                    O|       1.0|
| 12,481,724|       B-CommonStockSharesOutstanding|    0.9649|
|        are|                                    O|       1.0|
|     issued|                                    O|       1.0|
|         or|                                    O|       1.0|
|outstanding|                                    O|       1.0|
|          .|                                    O|       1.0|
+-----------+-------------------------------------+----------+
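Most tokens in the output above are labelled O, so a downstream step will usually keep only the tagged numbers. The following is a minimal sketch in plain Python, assuming the rows of result_df have already been collected into (token, ner_label, confidence) tuples and the confidence has been cast to float (it may arrive as a string, since it is read from annotation metadata). The helper name extract_entities and the 0.5 threshold are hypothetical and not part of the library.

```python
from typing import List, Tuple

Row = Tuple[str, str, float]

# A few rows copied from the results table above
rows: List[Row] = [
    ("is", "O", 1.0),
    ("200,000,000", "B-CommonStockSharesAuthorized", 0.9932),
    ("common", "O", 1.0),
    ("0.001", "B-CommonStockParOrStatedValuePerShare", 0.9988),
    ("12,481,724", "B-CommonStockSharesOutstanding", 0.9649),
]


def extract_entities(rows: List[Row], min_confidence: float = 0.5) -> List[Row]:
    """Keep tagged tokens above a confidence threshold and strip the B-/I- prefix.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    entities = []
    for token, label, confidence in rows:
        if label == "O" or confidence < min_confidence:
            continue
        # Labels follow the IOB scheme, e.g. "B-CommonStockSharesAuthorized"
        tag = label.split("-", 1)[1] if label[:2] in ("B-", "I-") else label
        entities.append((token, tag, confidence))
    return entities


for token, tag, confidence in extract_entities(rows):
    print(f"{tag}: {token} ({confidence})")
```

Raising min_confidence trades recall for precision; the right value depends on the application.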
Model Information
| Model Name:    | finner_10q_xbrl              |
| Compatibility: | Finance NLP 1.0.0+           |
| License:       | Licensed                     |
| Edition:       | Official                     |
| Input Labels:  | [sentence, token, embeddings] |
| Output Labels: | [ner]                        |
| Language:      | en                           |
| Size:          | 17.0 MB                      |
References
An in-house modified version of https://huggingface.co/datasets/nlpaueb/finer-139, re-split and filtered to focus on sentences with a higher density of tags.
Benchmarking
label          tp     fp     fn     prec       rec        f1
Macro-average  53613  10309  10243  0.8324958  0.8049274  0.8184795
Micro-average  53613  10309  10243  0.8387253  0.8395922  0.8391586
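The micro-averaged figures can be reproduced from the pooled tp, fp and fn counts using the standard definitions of precision, recall and F1. The sketch below shows that arithmetic; the function name micro_scores is hypothetical. The macro-averaged row cannot be rebuilt this way, because it averages per-label scores that are not listed here.

```python
from typing import Tuple


def micro_scores(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 computed from pooled counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts taken from the Micro-average row above
precision, recall, f1 = micro_scores(tp=53613, fp=10309, fn=10243)
print(f"{precision:.4f} {recall:.4f} {f1:.4f}")  # prints "0.8387 0.8396 0.8392"
```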