Description
This pretrained finance pipeline helps you split long financial documents into smaller sections by detecting the headers and subheaders that introduce each section. You can then use the begin and end positions in the metadata of each detected chunk to retrieve the text between those headers.
- PART I, PART II, etc. are HEADERS
- Item 1, Item 2, etc. are also HEADERS
- Item 1A, 2B, etc. are SUBHEADERS
- 1., 2., 2.1, etc. are SUBHEADERS
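To make the labeling convention above concrete, here is a rough Python sketch that classifies a single line as HEADER or SUBHEADER. The regular expressions and the `classify_heading` helper are hypothetical and written only for illustration; they approximate the convention described above and are not the rules the pipeline actually uses.

```python
import re
from typing import Optional

# Hypothetical patterns that approximate the convention described above.
# They are NOT the pipeline's real rules.
HEADER_PATTERNS = [
    re.compile(r"^PART\s+[IVXLC]+\b"),   # PART I, PART II, ...
    re.compile(r"^Item\s+\d+\."),        # Item 1., Item 2., ...
]
SUBHEADER_PATTERNS = [
    re.compile(r"^Item\s+\d+[A-Z]\."),               # Item 1A., Item 2B., ...
    re.compile(r"^\d{1,2}\.(?:\d{1,2}\.?)*\s+\S"),   # 1. , 2. , 2.1 , ...
]


def classify_heading(line: str) -> Optional[str]:
    """Return "HEADER", "SUBHEADER", or None for a single line of text."""
    stripped = line.strip()
    if any(p.match(stripped) for p in HEADER_PATTERNS):
        return "HEADER"
    if any(p.match(stripped) for p in SUBHEADER_PATTERNS):
        return "SUBHEADER"
    return None


samples = [
    "PART II",
    "Item 2. Definitions.",
    "Item 2A. Appointment.",
    "2.1 Scope",
    "The Company hereby agrees.",
]
for line in samples:
    print(f"{line!r}: {classify_heading(line)}")
```

A line-based check like this will misfire on body text that happens to start with a number and a period, which is one reason to prefer the pretrained pipeline over hand-written patterns.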
How to use
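from johnsnowlabs import nlp

# Assumes a licensed Finance NLP session is already running (for example, started with nlp.start()).
finance_pipeline = nlp.PretrainedPipeline("finpipe_header_subheader", "en", "finance/models")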
text = ["""
Item 2. Definitions.
For purposes of this Agreement, the following terms have the meanings ascribed thereto in this Section 1. 2. Appointment as Reseller.
Item 2A. Appointment.
The Company hereby [***]. Allscripts may also disclose Company's pricing information relating to its Merchant Processing Services and facilitate procurement of Merchant Processing Services on behalf of Sublicensed Customers, including, without limitation by references to such pricing information and Merchant Processing Services in Customer Agreements. 6
Item 2B. Customer Agreements."""]
result = finance_pipeline.annotate(text)
Results
| chunks | begin | end | entities |
|------------------------------:|------:|----:|----------:|
| Item 2. Definitions. | 1 | 21 | HEADER |
| Item 2A. Appointment. | 158 | 179 | SUBHEADER |
| Item 2B. Customer Agreements. | 538 | 566 | SUBHEADER |
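Once you have the begin and end offsets, retrieving the text between headers is a matter of slicing. The sketch below is plain Python and does not call the pipeline: the `Chunk` type, the `split_sections` helper, and the sample document are hypothetical, and the offsets are computed with `str.find` rather than taken from real pipeline output. It assumes `end` points at the last character of the chunk (inclusive); adjust `body_start` if your offsets follow a different convention.

```python
from typing import List, NamedTuple


class Chunk(NamedTuple):
    text: str
    begin: int   # offset of the first character of the header
    end: int     # offset of the last character (assumed inclusive)
    label: str   # "HEADER" or "SUBHEADER"


def split_sections(document: str, chunks: List[Chunk]) -> List[dict]:
    """Pair each detected header with the text that follows it, up to the next header."""
    ordered = sorted(chunks, key=lambda c: c.begin)
    sections = []
    for i, chunk in enumerate(ordered):
        body_start = chunk.end + 1
        # The body runs until the next header starts, or to the end of the document.
        body_end = ordered[i + 1].begin if i + 1 < len(ordered) else len(document)
        sections.append({
            "title": chunk.text,
            "label": chunk.label,
            "body": document[body_start:body_end].strip(),
        })
    return sections


# Hypothetical document; offsets are derived here, not produced by the pipeline.
document = (
    "Item 1. Business.\nWe sell software.\n"
    "Item 1A. Risk Factors.\nDemand may fall.\n"
)
titles = [("Item 1. Business.", "HEADER"), ("Item 1A. Risk Factors.", "SUBHEADER")]
chunks = []
for title, label in titles:
    begin = document.find(title)
    chunks.append(Chunk(title, begin, begin + len(title) - 1, label))

sections = split_sections(document, chunks)
for section in sections:
    print(section["label"], "|", section["title"], "|", section["body"])
```

Sorting by `begin` keeps the helper correct even if the chunks arrive out of order, for instance when headers and subheaders come from separate detection steps.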
Model Information
| Property      | Value                    |
|---------------|--------------------------|
| Model Name    | finpipe_header_subheader |
| Type          | pipeline                 |
| Compatibility | Finance NLP 1.0.0+       |
| License       | Licensed                 |
| Edition       | Official                 |
| Language      | en                       |
| Size          | 23.6 KB                  |
Included Models
- DocumentAssembler
- TokenizerModel
- ContextualParserModel
- ContextualParserModel
- ChunkMergeModel