Description
This model is a Named Entity Recognition (NER) model focused on financial numeric items. It identifies 12 numeric financial entities from diverse 10-Q and 10-K reports. These entities are annotated using eXtensible Business Reporting Language (XBRL) tags. The annotation process primarily targets numerical tokens, and the context plays a crucial role in accurately assigning the appropriate entity type from the 139 most common financial entities available in the dataset.
Predicted Entities
SaleOfStockPricePerShare
, StockIssuedDuringPeriodSharesNewIssues
, SharePrice
, ProceedsFromIssuanceOfCommonStock
, AntidilutiveSecuritiesExcludedFromComputationOfEarningsPerShareAmount
, SaleOfStockNumberOfSharesIssuedInTransaction
, CommonStockParOrStatedValuePerShare
, CommonStockCapitalSharesReservedForFutureIssuance
, BusinessAcquisitionEquityInterestsIssuedOrIssuableNumberOfSharesIssued
, CommonStockSharesAuthorized
, CommonStockSharesOutstanding
How to use
documentAssembler = nlp.DocumentAssembler() \
.setInputCol("text") \
.setOutputCol("document")
sentence = nlp.SentenceDetector() \
.setInputCols(["document"]) \
.setOutputCol("sentence")
tokenizer = nlp.Tokenizer()\
.setInputCols(["sentence"])\
.setOutputCol("token")\
.setContextChars(['.', ',', ';', ':', '!', '?', '*', '-', '(', ')', '”', '’', '$','€'])
embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base","en") \
.setInputCols(["document", "token"]) \
.setOutputCol("embeddings")\
.setMaxSentenceLength(512)
nerTagger = finance.NerModel.pretrained('finner_10q_xbrl_lg_contra_stock_equity', 'en', 'finance/models')\
.setInputCols(["sentence", "token", "embeddings"])\
.setOutputCol("ner")
pipeline = nlp.Pipeline(stages=[documentAssembler,
sentence,
tokenizer,
embeddings,
nerTagger
])
text = "Common Stock During the three months ended June 30 , 2016 and 2015 , the Company issued shares of its common stock in connection with its financing activities and for services received , including exercised warrants totaling 498,707 and 2,952,084 , respectively ."
df = spark.createDataFrame([[text]]).toDF("text")
fit = pipeline.fit(df)
result = fit.transform(df)
result_df = result.select(F.explode(F.arrays_zip(result.token.result,result.ner.result, result.ner.metadata)).alias("cols"))\
.select(F.expr("cols['0']").alias("token"),\
F.expr("cols['1']").alias("ner_label"),\
F.expr("cols['2']['confidence']").alias("confidence"))
result_df.show(50, truncate=100)
Results
+------------+----------------------------------------+----------+
|token |ner_label |confidence|
+------------+----------------------------------------+----------+
|Common |O |1.0 |
|Stock |O |1.0 |
|During |O |1.0 |
|the |O |1.0 |
|three |O |1.0 |
|months |O |1.0 |
|ended |O |1.0 |
|June |O |1.0 |
|30 |O |1.0 |
|, |O |1.0 |
|2016 |O |0.9999 |
|and |O |1.0 |
|2015 |O |1.0 |
|, |O |1.0 |
|the |O |1.0 |
|Company |O |1.0 |
|issued |O |1.0 |
|shares |O |0.9998 |
|of |O |1.0 |
|its |O |1.0 |
|common |O |1.0 |
|stock |O |1.0 |
|in |O |1.0 |
|connection |O |1.0 |
|with |O |1.0 |
|its |O |1.0 |
|financing |O |1.0 |
|activities |O |1.0 |
|and |O |1.0 |
|for |O |1.0 |
|services |O |1.0 |
|received |O |1.0 |
|, |O |0.9999 |
|including |O |1.0 |
|exercised |O |1.0 |
|warrants |O |1.0 |
|totaling |O |1.0 |
|498,707 |B-StockIssuedDuringPeriodSharesNewIssues|0.6729 |
|and |O |1.0 |
|2,952,084 |B-StockIssuedDuringPeriodSharesNewIssues|0.7104 |
|, |O |1.0 |
|respectively|O |1.0 |
|. |O |1.0 |
+------------+----------------------------------------+----------+
Model Information
Model Name: | finner_10q_xbrl_lg_contra_stock_equity |
Compatibility: | Finance NLP 1.0.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [sentence, token, embeddings] |
Output Labels: | [ner] |
Language: | en |
Size: | 16.4 MB |
References
An in-house modified version of https://huggingface.co/datasets/nlpaueb/finer-139, re-splited and filtered to focus on sentences with bigger density of tags.
Benchmarking
label precision recall f1-score support
B-AntidilutiveSecuritiesExcludedFromComputationOfEarningsPerShareAmount 0.9913 0.9933 0.9923 1487
B-BusinessAcquisitionEquityInterestsIssuedOrIssuableNumberOfSharesIssued 0.8814 0.8062 0.8421 129
B-CommonStockCapitalSharesReservedForFutureIssuance 0.9515 0.9290 0.9401 169
B-CommonStockParOrStatedValuePerShare 0.9249 0.9467 0.9357 169
B-CommonStockSharesAuthorized 0.9500 0.9301 0.9399 143
B-CommonStockSharesOutstanding 0.8443 0.9463 0.8924 149
B-ProceedsFromIssuanceOfCommonStock 0.7550 0.8444 0.7972 135
B-SaleOfStockNumberOfSharesIssuedInTransaction 0.4486 0.8836 0.5951 232
B-SaleOfStockPricePerShare 0.5774 0.9262 0.7113 149
B-SharePrice 0.9338 0.7056 0.8038 180
B-StockIssuedDuringPeriodSharesNewIssues 0.7725 0.4417 0.5621 369
I-AntidilutiveSecuritiesExcludedFromComputationOfEarningsPerShareAmount 1.0000 1.0000 1.0000 1
I-CommonStockSharesAuthorized 1.0000 1.0000 1.0000 1
I-SaleOfStockNumberOfSharesIssuedInTransaction 0.0000 0.0000 0.0000 2
I-StockIssuedDuringPeriodSharesNewIssues 0.0000 0.0000 0.0000 7
O 0.9991 0.9978 0.9984 97395
accuracy - - 0.9938 100717
macro-avg 0.7519 0.7719 0.7506 100717
weighted-avg 0.9950 0.9938 0.9940 100717