Finance Pipeline (Headers / Subheaders)

Description

This is a pretrained finance pipeline that splits long financial documents into smaller sections. It does this by detecting the headers and subheaders that open each section. You can then use the begin and end offsets returned for each detected heading to retrieve the text between consecutive headings.

  • PART I, PART II, etc. are HEADERS
  • Item 1, Item 2, etc. are also HEADERS
  • Item 1A, 2B, etc. are SUBHEADERS
  • 1., 2., 2.1, etc. are SUBHEADERS
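To make the labelling convention concrete, here is a minimal regex sketch that classifies a single heading line according to the four rules above. It is an illustration only and assumes one heading per line; the patterns and the `classify_heading` helper are hypothetical and are not the rules packaged inside finpipe_header_subheader, which relies on ContextualParserModel stages.

```python
import re
from typing import Optional

# Hypothetical patterns approximating the HEADER / SUBHEADER convention.
# These are NOT the rules shipped with finpipe_header_subheader.
HEADER_PATTERNS = [
    re.compile(r"^PART\s+[IVXLC]+\b"),         # PART I, PART II, ...
    re.compile(r"^Item\s+\d+(?![0-9A-Z])"),     # Item 1, Item 2, ... (no letter suffix)
]
SUBHEADER_PATTERNS = [
    re.compile(r"^Item\s+\d+[A-Z]\b"),          # Item 1A, Item 2B, ...
    re.compile(r"^\d+\.(?:\d+\.?)*(?=\s)"),     # 1., 2., 2.1, 2.1., ...
]


def classify_heading(line: str) -> Optional[str]:
    """Return "HEADER", "SUBHEADER", or None for a single line of text."""
    text = line.strip()
    for pattern in HEADER_PATTERNS:
        if pattern.match(text):
            return "HEADER"
    for pattern in SUBHEADER_PATTERNS:
        if pattern.match(text):
            return "SUBHEADER"
    return None


for sample in ["PART I", "Item 2. Definitions.", "Item 2A. Appointment.", "2.1 Scope", "The Company hereby agrees."]:
    print(f"{sample!r}: {classify_heading(sample)}")
```

A purely pattern-based rule like this has obvious blind spots: a sentence starting with a number such as "1.5 million" would be labelled SUBHEADER by this sketch.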


How to use

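from johnsnowlabs import nlp

# Start a Spark session with Finance NLP enabled (requires a valid license)
spark = nlp.start()

# Download (if not cached) and load the pretrained pipeline
finance_pipeline = nlp.PretrainedPipeline("finpipe_header_subheader", "en", "finance/models")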

text = ["""
Item 2. Definitions. 
For purposes of this Agreement, the following terms have the meanings ascribed thereto in this Section 1. 2. Appointment as Reseller.

Item 2A. Appointment. 
The Company hereby [***]. Allscripts may also disclose Company's pricing information relating to its Merchant Processing Services and facilitate procurement of Merchant Processing Services on behalf of Sublicensed Customers, including, without limitation by references to such pricing information and Merchant Processing Services in Customer Agreements. 6

Item 2B. Customer Agreements."""]

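# annotate() returns only the chunk strings; fullAnnotate() also returns
# the begin/end offsets and metadata listed in the results table
result = finance_pipeline.annotate(text)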

Results

|                        chunks | begin | end |  entities |
|------------------------------:|------:|----:|----------:|
|          Item 2. Definitions. |     1 |  21 |    HEADER |
|         Item 2A. Appointment. |   158 | 179 | SUBHEADER |
| Item 2B. Customer Agreements. |   538 | 566 | SUBHEADER |
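The begin and end columns are character offsets into the input string, and end is inclusive: "Item 2B. Customer Agreements." is 29 characters long, and 566 - 538 + 1 = 29. The sketch below shows one way to use those offsets to cut a document into sections. It assumes the detected headings have already been collected into plain dicts with the hypothetical keys chunk, begin, end and entity (mirroring the table columns); the `split_sections` helper and the sample text are illustrative, not part of the pipeline.

```python
from typing import Dict, List


def split_sections(text: str, headings: List[Dict]) -> List[Dict]:
    """Split `text` into sections using heading chunks with inclusive begin/end offsets."""
    ordered = sorted(headings, key=lambda h: h["begin"])
    sections = []
    for i, h in enumerate(ordered):
        # The body starts right after the heading ("end" is inclusive) ...
        body_start = h["end"] + 1
        # ... and runs up to the next heading, or to the end of the document.
        body_end = ordered[i + 1]["begin"] if i + 1 < len(ordered) else len(text)
        sections.append({
            "heading": h["chunk"],
            "entity": h["entity"],
            "body": text[body_start:body_end].strip(),
        })
    return sections


# Hypothetical input: rows shaped like the results table, built from a small sample text.
doc = "\nItem 2. Definitions.\nTerms are defined here.\nItem 2A. Appointment.\nThe Company appoints the reseller.\n"
rows = []
for chunk, entity in [("Item 2. Definitions.", "HEADER"), ("Item 2A. Appointment.", "SUBHEADER")]:
    begin = doc.find(chunk)
    rows.append({"chunk": chunk, "begin": begin, "end": begin + len(chunk) - 1, "entity": entity})

for section in split_sections(doc, rows):
    print(section["entity"], "|", section["heading"], "|", section["body"])
```

In this flat split, a HEADER section stops at the next SUBHEADER; grouping subheader sections under their parent header would need an extra pass over the entity labels.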

Model Information

Model Name: finpipe_header_subheader
Type: pipeline
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Language: en
Size: 23.6 KB

Included Models

  • DocumentAssembler
  • TokenizerModel
  • ContextualParserModel
  • ContextualParserModel
  • ChunkMergeModel
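The stage list suggests that the two ContextualParserModel stages emit candidate chunks that ChunkMergeModel combines into a single output. As a hedged illustration of why a merge step is useful, the plain-Python sketch below merges two hypothetical chunk lists (for example, header matches and subheader matches; which parser produces which is an assumption) and resolves overlaps by keeping the longer span. It is a simplified stand-in, not ChunkMergeModel's actual selection logic, and the `merge_chunks` helper and dict keys are hypothetical.

```python
from typing import Dict, List


def merge_chunks(first: List[Dict], second: List[Dict]) -> List[Dict]:
    """Merge two chunk lists into one, ordered by begin offset.

    Simplifying assumption: when two chunks overlap, keep the longer span;
    on a tie, the chunk that was kept first stays.
    """
    # Sort by start offset; for equal starts, longer chunks come first.
    candidates = sorted(first + second, key=lambda c: (c["begin"], c["begin"] - c["end"]))
    merged: List[Dict] = []
    for chunk in candidates:
        if merged and chunk["begin"] <= merged[-1]["end"]:
            # Overlaps the last kept chunk: replace it only if this one is longer.
            last = merged[-1]
            if chunk["end"] - chunk["begin"] > last["end"] - last["begin"]:
                merged[-1] = chunk
        else:
            merged.append(chunk)
    return merged


# Hypothetical candidates: a header rule matching "Item 2" inside "Item 2A".
header_chunks = [{"chunk": "Item 2", "begin": 1, "end": 6, "entity": "HEADER"}]
subheader_chunks = [
    {"chunk": "Item 2A", "begin": 1, "end": 7, "entity": "SUBHEADER"},
    {"chunk": "2.1", "begin": 50, "end": 52, "entity": "SUBHEADER"},
]
merged = merge_chunks(header_chunks, subheader_chunks)
print([(c["chunk"], c["entity"]) for c in merged])
# → [('Item 2A', 'SUBHEADER'), ('2.1', 'SUBHEADER')]
```

Keeping the longer span prevents the shorter "Item 2" match from being reported as a HEADER inside the "Item 2A" subheader.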