Financial 10K Filings NER

Description

IMPORTANT: Don’t run this model on the whole financial report. Instead:

  • Split by paragraphs;
  • Use the finclf_form_10k_summary_item Text Classifier to select only these paragraphs;

This Financial NER Model is aimed to process the first summary page of 10K filings and extract the information about the Company submitting the filing, trading data, address / phones, CFN, IRS, etc.

Predicted Entities

ADDRESS, CFN, FISCAL_YEAR, IRS, PHONE, ORG, STOCK_EXCHANGE, STATE, TICKER, TITLE_CLASS, TITLE_CLASS_VALUE

Copy S3 URI

How to use

document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = nlp.SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence") \
    .setCustomBounds(["\n\n"])

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

embeddings = nlp.WordEmbeddingsModel.pretrained("finance_word_embeddings","en","finance/models")\
	.setInputCols(["sentence","token"])\
	.setOutputCol("embeddings")

ner_model = finance.NerModel.pretrained("finner_10k_summary_fe","en","finance/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")\

ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter   
    ])

model = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

data = spark.createDataFrame([["""ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES AND EXCHANGE ACT OF 1934
For the annual period ended January 31, 2021
or
TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934
For the transition period from________to_______
Commission File Number: 001-38856
PAGERDUTY, INC.
(Exact name of registrant as specified in its charter)
Delaware
27-2793871
(State or other jurisdiction of
incorporation or organization)
(I.R.S. Employer
Identification Number)
600 Townsend St., Suite 200, San Francisco, CA 94103
(844) 800-3889
(Address, including zip code, and telephone number, including area code, of registrant’s principal executive offices)
Securities registered pursuant to Section 12(b) of the Act:
Title of each class
Trading symbol(s)
Name of each exchange on which registered
Common Stock, $0.000005 par value,
PD
New York Stock Exchange"""]]).toDF("text")

result = model.transform(data)

Results

+----------------------------------------------+-----------------+
|ticker                                        |label            |
+----------------------------------------------+-----------------+
|January 31, 2021                              |FISCAL_YEAR      |
|001-38856                                     |CFN              |
|PAGERDUTY, INC                                |ORG              |
|Delaware                                      |STATE            |
|27-2793871                                    |IRS              |
|600 Townsend St., Suite 200, San Francisco, CA|ADDRESS          |
|(844) 800-3889                                |PHONE            |
|Common Stock                                  |TITLE_CLASS      |
|$0.000005                                     |TITLE_CLASS_VALUE|
|PD                                            |TICKER           |
|New York Stock Exchange                       |STOCK_EXCHANGE   |
+----------------------------------------------+-----------------+

Model Information

Model Name: finner_sec_10k_summary_fe
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 14.8 MB

References

Manual annotations on 10-K Filings

Benchmarking

label	 tp	 fp	 fn	 prec	 rec	 f1
B-TITLE_CLASS	 16	 0	 1	 1.0	 0.9411765	 0.969697
I-ORG	 62	 16	 17	 0.7948718	 0.7848101	 0.789809
B-STOCK_EXCHANGE	 13	 0	 1	 1.0	 0.9285714	 0.9629629
B-PHONE	 15	 0	 1	 1.0	 0.9375	 0.9677419
B-STATE	 10	 0	 1	 1.0	 0.90909094	 0.95238096
B-IRS	 11	 1	 0	 0.9166667	 1.0	 0.95652175
I-PHONE	 46	 1	 0	 0.9787234	 1.0	 0.9892473
I-TITLE_CLASS	 22	 0	 1	 1.0	 0.95652175	 0.9777778
B-CFN	 15	 0	 1	 1.0	 0.9375	 0.9677419
B-ADDRESS	 12	 0	 2	 1.0	 0.85714287	 0.9230769
I-ADDRESS	 118	 5	 1	 0.9593496	 0.99159664	 0.9752066
I-STOCK_EXCHANGE	 45	 0	 3	 1.0	 0.9375	 0.9677419
B-TICKER	 13	 0	 1	 1.0	 0.9285714	 0.9629629
I-FISCAL_YEAR	 131	 3	 45	 0.97761196	 0.7443182	 0.84516126
B-TITLE_CLASS_VALUE	 16	 0	 0	 1.0	 1.0	 1.0
B-ORG	 55	 20	 9	 0.73333335	 0.859375	 0.79136693
B-FISCAL_YEAR	 51	 1	 17	 0.9807692	 0.75	 0.85

Macro-average	 prec: 0.9612545, rec: 0.90962785, f1: 0.9347289
Micro-average	 prec: 0.93266475, rec: 0.8656915, f1: 0.897931