Financial Relation Extraction on 10K filings (Small)

Description

This model extracts relations between financial entities and amounts, counts, percentages and dates. The financial entities are expected to come from one of these NER models: finner_financial_small, finner_financial_medium or finner_financial_large.

We highly recommend using it with finner_financial_large.

Predicted Entities

has_amount, has_amount_date, has_percentage_date, has_percentage, other


How to use

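from johnsnowlabs import nlp, finance

# Start a Spark session with Finance NLP (requires a valid Finance NLP license)
spark = nlp.start()

documentAssembler = nlp.DocumentAssembler()\
        .setInputCol("text")\
        .setOutputCol("document")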

sentencizer = nlp.SentenceDetectorDLModel\
        .pretrained("sentence_detector_dl", "en") \
        .setInputCols(["document"])\
        .setOutputCol("sentence")
                      
tokenizer = nlp.Tokenizer()\
        .setInputCols(["sentence"])\
        .setOutputCol("token")\
        .setContextChars(['.', ',', ';', ':', '!', '?', '*', '-', '(', ')', '”', '’', '$','€'])

bert_embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en")\
        .setInputCols(["sentence", "token"])\
        .setOutputCol("bert_embeddings")

ner_model = finance.NerModel.pretrained("finner_financial_large", "en", "finance/models")\
    .setInputCols(["sentence", "token", "bert_embeddings"])\
    .setOutputCol("ner")

ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

# ===========
# This is needed only to filter relation pairs using finance.RENerChunksFilter (see below)
# ===========
pos = nlp.PerceptronModel.pretrained("pos_anc", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("pos")

dependency_parser = nlp.DependencyParserModel.pretrained("dependency_conllu", "en") \
    .setInputCols(["sentence", "pos", "token"]) \
    .setOutputCol("dependencies")


# Financial entity labels to pair with AMOUNT, COUNT and PERCENTAGE chunks
ENTITIES = ['PROFIT', 'PROFIT_INCREASE', 'PROFIT_DECLINE', 'CF', 'CF_INCREASE', 'CF_DECREASE', 'LIABILITY', 'EXPENSE', 'EXPENSE_INCREASE', 'EXPENSE_DECREASE']

ENTITY_PAIRS = [f"{x}-AMOUNT" for x in ENTITIES]
ENTITY_PAIRS.extend([f"{x}-COUNT" for x in ENTITIES])
ENTITY_PAIRS.extend([f"{x}-PERCENTAGE" for x in ENTITIES])
ENTITY_PAIRS.append("AMOUNT-FISCAL_YEAR")
ENTITY_PAIRS.append("AMOUNT-DATE")
ENTITY_PAIRS.append("AMOUNT-CURRENCY")

re_ner_chunk_filter = finance.RENerChunksFilter() \
    .setInputCols(["ner_chunk", "dependencies"])\
    .setOutputCol("re_ner_chunk")\
    .setRelationPairs(ENTITY_PAIRS)\
    .setMaxSyntacticDistance(5)

# ===========

reDL = finance.RelationExtractionDLModel.pretrained('finre_financial_small', 'en', 'finance/models')\
    .setInputCols(["re_ner_chunk", "sentence"])\
    .setOutputCol("relations")

pipeline = nlp.Pipeline(stages=[
        documentAssembler,
        sentencizer,
        tokenizer,
        bert_embeddings,
        ner_model,
        ner_converter,
        pos,
        dependency_parser,
        re_ner_chunk_filter,
        reDL])

text = "In the third quarter of fiscal 2021, we received net proceeds of $342.7 million, after deducting underwriters discounts and commissions and offering costs of $31.8 million, including  the exercise of the underwriters option to purchase additional shares. "

data = spark.createDataFrame([[text]]).toDF("text")

model = pipeline.fit(data)

results = model.transform(data)
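finance.RENerChunksFilter keeps only candidate chunk pairs whose labels appear in ENTITY_PAIRS and that are close enough in the dependency parse (setMaxSyntacticDistance(5)). The plain-Python sketch below illustrates that idea only; it is not the library's implementation. The Chunk class, the head-token indices and the hand-written dependency heads for a shortened sentence are hypothetical. Matching a pair in either order is also an assumption, suggested by the Results table pairing CURRENCY with AMOUNT although only AMOUNT-CURRENCY is listed.

```python
from dataclasses import dataclass

# Entity pairs built the same way as in the pipeline above.
ENTITIES = ['PROFIT', 'PROFIT_INCREASE', 'PROFIT_DECLINE', 'CF', 'CF_INCREASE', 'CF_DECREASE',
            'LIABILITY', 'EXPENSE', 'EXPENSE_INCREASE', 'EXPENSE_DECREASE']
ENTITY_PAIRS = [f"{x}-AMOUNT" for x in ENTITIES]
ENTITY_PAIRS.extend(f"{x}-COUNT" for x in ENTITIES)
ENTITY_PAIRS.extend(f"{x}-PERCENTAGE" for x in ENTITIES)
ENTITY_PAIRS += ["AMOUNT-FISCAL_YEAR", "AMOUNT-DATE", "AMOUNT-CURRENCY"]


@dataclass
class Chunk:
    text: str
    label: str
    head: int  # hypothetical: index of the chunk's head token in the sentence


def tree_distance(heads, a, b):
    """Number of dependency edges between tokens a and b.

    heads[i] is the index of token i's head, or -1 for the root.
    Assumes the sentence forms a single rooted tree.
    """
    # Distance from a to each of its ancestors (a itself at distance 0).
    up_from_a = {}
    node, depth = a, 0
    while node != -1:
        up_from_a[node] = depth
        node, depth = heads[node], depth + 1
    # Climb from b until reaching a common ancestor.
    node, depth = b, 0
    while node not in up_from_a:
        node, depth = heads[node], depth + 1
    return depth + up_from_a[node]


def filter_pairs(chunks, heads, allowed, max_distance):
    """Keep label-allowed chunk pairs whose heads are at most max_distance edges apart."""
    allowed = set(allowed)
    kept = []
    for i in range(len(chunks)):
        for j in range(i + 1, len(chunks)):
            c1, c2 = chunks[i], chunks[j]
            # Assumption: a pair matches regardless of order.
            if f"{c1.label}-{c2.label}" not in allowed and f"{c2.label}-{c1.label}" not in allowed:
                continue
            d = tree_distance(heads, c1.head, c2.head)
            if d <= max_distance:
                kept.append((c1.text, c2.text, d))
    return kept


# Hypothetical parse of "we received net proceeds of $ 342.7 million"
# tokens:  0 we, 1 received, 2 net, 3 proceeds, 4 of, 5 $, 6 342.7, 7 million
heads = [1, -1, 3, 1, 7, 7, 7, 3]
chunks = [Chunk("net proceeds", "CF", 3), Chunk("$", "CURRENCY", 5), Chunk("342.7 million", "AMOUNT", 7)]

print(filter_pairs(chunks, heads, ENTITY_PAIRS, max_distance=5))
# [('net proceeds', '342.7 million', 1), ('$', '342.7 million', 1)]
```

The CF/CURRENCY pair is dropped because no such label pair is allowed, not because of distance; lowering max_distance to 0 would drop the remaining two as well.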

Results

   relation   entity1 entity1_begin entity1_end                          chunk1 entity2 entity2_begin entity2_end         chunk2 confidence
 has_amount        CF            49          60                    net proceeds  AMOUNT            66          78  342.7 million  0.9999101
 has_amount  CURRENCY            65          65                               $  AMOUNT            66          78  342.7 million  0.9925425
 has_amount   EXPENSE           125         154  commissions and offering costs  AMOUNT           160         171   31.8 million  0.9997677
 has_amount  CURRENCY           159         159                               $  AMOUNT           160         171   31.8 million   0.998896
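Each amount above is linked both to a financial entity and to its currency symbol. A possible post-processing step, assuming the rows have already been pulled out of the `relations` column, is to merge the two into one value per entity. The sketch below uses the four rows from the table; the Relation tuple, the summarize_amounts helper and the 0.9 confidence threshold are hypothetical choices, not part of Finance NLP. Currency symbols are matched by the start offset of the AMOUNT chunk rather than its text, so two identical amounts in one document would not collide.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """One row of the Results table (a subset of its columns)."""
    relation: str
    entity1: str
    chunk1: str
    entity2: str
    entity2_begin: int
    chunk2: str
    confidence: float


# Rows copied from the Results table.
RESULTS = [
    Relation("has_amount", "CF", "net proceeds", "AMOUNT", 66, "342.7 million", 0.9999101),
    Relation("has_amount", "CURRENCY", "$", "AMOUNT", 66, "342.7 million", 0.9925425),
    Relation("has_amount", "EXPENSE", "commissions and offering costs", "AMOUNT", 160, "31.8 million", 0.9997677),
    Relation("has_amount", "CURRENCY", "$", "AMOUNT", 160, "31.8 million", 0.998896),
]


def summarize_amounts(rows, min_confidence=0.9):
    """Map each financial entity to its amount, prefixed with any linked currency symbol."""
    confident = [r for r in rows if r.relation == "has_amount" and r.confidence >= min_confidence]
    # Currency symbols keyed by the start offset of the AMOUNT chunk they attach to.
    currency = {r.entity2_begin: r.chunk1 for r in confident if r.entity1 == "CURRENCY"}
    return {
        r.chunk1: currency.get(r.entity2_begin, "") + r.chunk2
        for r in confident
        if r.entity1 != "CURRENCY"
    }


print(summarize_amounts(RESULTS))
# {'net proceeds': '$342.7 million', 'commissions and offering costs': '$31.8 million'}
```

Raising min_confidence to 0.995 drops the first CURRENCY link (0.9925425), so "net proceeds" would map to "342.7 million" without a symbol.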

Model Information

Model Name: finre_financial_small
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Language: en
Size: 405.7 MB

References

In-house annotations of 10K filings.

Benchmarking

Relation             Recall  Precision     F1  Support
has_amount            0.997      0.997  0.997      670
has_amount_date       0.996      0.994  0.995      470
has_percentage        1.000      1.000  1.000       87
has_percentage_date   0.985      1.000  0.993       68
other                 1.000      1.000  1.000      205
Avg.                  0.996      0.998  0.997     1583
Weighted-Avg.         0.997      0.997  0.997     1583
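The Avg. row matches the unweighted (macro) mean of the per-class scores, and Weighted-Avg. matches the mean weighted by each class's support. The short sketch below recomputes both from the per-class rows; the helper names are illustrative only.

```python
# Per-class rows from the Benchmarking table: (recall, precision, f1, support)
SCORES = {
    "has_amount": (0.997, 0.997, 0.997, 670),
    "has_amount_date": (0.996, 0.994, 0.995, 470),
    "has_percentage": (1.000, 1.000, 1.000, 87),
    "has_percentage_date": (0.985, 1.000, 0.993, 68),
    "other": (1.000, 1.000, 1.000, 205),
}

METRICS = ("recall", "precision", "f1")


def macro_average(scores):
    """Unweighted mean of each metric over the relation classes."""
    n = len(scores)
    return {m: sum(row[i] for row in scores.values()) / n for i, m in enumerate(METRICS)}


def weighted_average(scores):
    """Mean of each metric, weighting each class by its support."""
    total = sum(row[3] for row in scores.values())
    return {m: sum(row[i] * row[3] for row in scores.values()) / total for i, m in enumerate(METRICS)}


for name, avg in (("Avg.", macro_average(SCORES)), ("Weighted-Avg.", weighted_average(SCORES))):
    print(name, {m: round(v, 3) for m, v in avg.items()})
# Avg. {'recall': 0.996, 'precision': 0.998, 'f1': 0.997}
# Weighted-Avg. {'recall': 0.997, 'precision': 0.997, 'f1': 0.997}
```

The macro average gives the small has_percentage_date class (68 examples) the same weight as has_amount (670), which is why its lower recall pulls the macro recall slightly below the weighted one.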