Classify Edgar Financial Filings and Schedules

Description

This is a multiclass model, which analyzes the first 512 tokens of your document and retrieves if it is one of the supported classes (see Predicted entities).

The class schedule includes TO-C, 13D, TO-T, 14F1, 14D9, 14N, 13G, TO-I, 13E3. 3 means SEC’s FORM-3. 4 means SEC’s FORM-4.

Predicted Entities

schedule, other, 10-K, 10-Q, 3, 4, 8-K, S-8

Download Copy S3 URI

How to use

document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")
  
embeddings = nlp.UniversalSentenceEncoder.pretrained()\
  .setInputCols("document") \
  .setOutputCol("sentence_embeddings")
    
doc_classifier = finance.ClassifierDLModel.pretrained("finclf_sec_schedules_filings", "en", "finance/models")\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("category")
    
nlpPipeline = nlp.Pipeline(stages=[
    document_assembler, 
    embeddings,
    doc_classifier
])
 
text = """SECURITIES AND EXCHANGE COMMISSION
WASHINGTON, DC 20549
SCHEDULE 13D
(Rule 13d-101)
INFORMATION TO BE INCLUDED IN STATEMENTS FILED PURSUANT TO RULE 13d-1(a) 
AND AMENDMENTS THERETO FILED PURSUANT TO RULE 13d-2(a)
Under the Securities Exchange Act of 1934
(Amendment No. 2)*
TILE SHOP HOLDINGS, INC.
(Name of Issuer)
...."""

df = spark.createDataFrame([[text]]).toDF("text")

model = nlpPipeline.fit(df)

result = model.transform(df)

Results

['schedule']

Model Information

Model Name:	finclf_sec_schedules_filings
Compatibility:	Finance NLP 1.0.0+
License:	Licensed
Edition:	Official
Input Labels:	[sentence_embeddings]
Output Labels:	[category]
Language:	en
Size:	22.7 MB

References

SEC’s Edgar

Benchmarking

label precision    recall  f1-score   support
        10-K       0.93      0.90      0.92        42
        10-Q       0.95      0.95      0.95        38
           3       0.62      0.61      0.62        33
           4       0.82      0.78      0.80        54
         8-K       0.86      0.91      0.88        33
         S-8       0.93      0.96      0.95        28
       other       1.00      1.00      1.00       238
    schedule       0.94      0.96      0.95        50
    accuracy          -            -          0.93       516
   macro-avg       0.88      0.88      0.88       516
weighted-avg       0.93      0.93      0.93       516

PREVIOUSApplicable Law Clause NER Model

NEXTMap Companies to their Acquisitions (wikipedia, en)