Financial Zero-shot Relation Extraction

Description

This is a zero-shot relation extraction model: it does not require any training data, only a few example sentences for each relation type you want to extract.

Make sure you keep the proper syntax of the relations you want to extract. For example:

re_model.setRelationalCategories({
    "DECREASE": ["{PROFIT_DECLINE} decrease {AMOUNT}", "{PROFIT_DECLINE} decrease {PERCENTAGE}"],
    "INCREASE": ["{PROFIT_INCREASE} increase {AMOUNT}", "{PROFIT_INCREASE} increase {PERCENTAGE}"]
})
  • The keys of the dictionary are the names of the relations (DECREASE, INCREASE).
  • The values are lists of example sentences that express the relation.
  • The names in curly brackets are NER labels, which must be produced by an NER component earlier in the pipeline.
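
The template syntax can be checked before the pipeline runs. The sketch below is a hypothetical helper, not part of Finance NLP: `check_template` and `fill_template` are illustrative names, and the rule that each template holds exactly two single-brace placeholders is an assumption taken from the examples above. Filling a template with two NER chunks produces a sentence of the same shape as the `hypothesis` field in the relation output.

```python
import re

# A well-formed placeholder: one opening brace, an upper-case NER label, one closing brace.
PLACEHOLDER = re.compile(r"\{([A-Z_]+)\}")


def check_template(template):
    """Return the two NER labels used in a template, or raise ValueError if it is malformed."""
    labels = PLACEHOLDER.findall(template)
    # Whatever is left after removing well-formed placeholders must not contain braces.
    leftover = PLACEHOLDER.sub("", template)
    if len(labels) != 2 or "{" in leftover or "}" in leftover:
        raise ValueError(f"malformed template: {template!r}")
    return tuple(labels)


def fill_template(template, chunk1, chunk2):
    """Substitute the first placeholder with chunk1 and the second with chunk2."""
    check_template(template)
    chunks = iter((chunk1, chunk2))
    return PLACEHOLDER.sub(lambda match: next(chunks), template)


hypothesis = fill_template(
    "{PROFIT_DECLINE} decrease {PERCENTAGE}", "License fees revenue", "40"
)
print(hypothesis)
```

A template with a stray brace, such as `"{PROFIT_DECLINE}} decrease {PERCENTAGE}"`, raises `ValueError` here. The check targets the single-brace form, so it would need adjusting for the double-brace form used on older Spark NLP versions.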

Predicted Entities

How to use

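from johnsnowlabs import nlp, finance

# Start a Spark session with the licensed libraries loaded
# (assumes a valid John Snow Labs license is already configured).
spark = nlp.start()

document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")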

sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl","xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")\
    .setMaxSentenceLength(512)

ner_model = finance.NerModel.pretrained("finner_10k", "en", "finance/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

re_model = finance.ZeroShotRelationExtractionModel.pretrained("finre_zero_shot", "en", "finance/models")\
    .setInputCols(["ner_chunk", "sentence"]) \
    .setOutputCol("relations")

# On Spark NLP < 4.0, use two curly brackets instead of one around each NER label
re_model.setRelationalCategories({
    "DECREASE": ["{PROFIT_DECLINE} decrease {AMOUNT}", "{PROFIT_DECLINE} decrease {PERCENTAGE}"],
    "INCREASE": ["{PROFIT_INCREASE} increase {AMOUNT}", "{PROFIT_INCREASE} increase {PERCENTAGE}"]
})

pipeline = nlp.Pipeline() \
    .setStages([document_assembler,
                sentence_detector,
                tokenizer,
                embeddings,
                ner_model,
                ner_converter,
                re_model
               ])
               
sample_text = """License fees revenue decreased 40 %, or $ 0.5 million to $ 0.7 million for the year ended December 31, 2020 
compared to $ 1.2 million for the year ended December 31, 2019. Services revenue increased 4 %, or $ 1.1 million, to $ 25.6 million 
for the year ended December 31, 2020 from $ 24.5 million for the year ended December 31, 2019.
Costs of revenue, excluding depreciation and amortization increased by $ 0.1 million, or 2 %, to $ 8.8 million for the year ended December 31, 2020 
from $ 8.7 million for the year ended December 31, 2019. The increase was primarily related to increase in internal staff costs of $ 1.1 million as 
we increased delivery staff and work performed on internal projects, partially offset by a decrease in third party consultant costs of $ 0.6 million 
as these were converted to internal staff or terminated. Also, a decrease in travel costs of $ 0.4 million due to travel restrictions caused by the global pandemic. 
As a percentage of revenue, cost of revenue, excluding depreciation and amortization was 34 % for each of the years ended December 31, 2020 and 2019. 
Sales and marketing expenses decreased 20 %, or $ 1.5 million, to $ 6.0 million for the year ended December 31, 2020 from $ 7.5 million for the year ended December 31, 2019
"""

data = spark.createDataFrame([[sample_text]]).toDF("text")
model = pipeline.fit(data)
results = model.transform(data)

# ner output
results.selectExpr("explode(ner_chunk) as ner").show(truncate=False)

# relations output
results.selectExpr("explode(relations) as relation").show(truncate=False)

Results

+--------------------------------------------------------------------------------------------------------------------------+
|ner                                                                                                                       |
+--------------------------------------------------------------------------------------------------------------------------+
|[chunk, 0, 19, License fees revenue, [entity -> PROFIT_DECLINE, sentence -> 0, chunk -> 0, confidence -> 0.41060004], []] |
|[chunk, 31, 32, 40, [entity -> PERCENTAGE, sentence -> 0, chunk -> 1, confidence -> 0.9995], []]                          |
|[chunk, 40, 40, $, [entity -> CURRENCY, sentence -> 0, chunk -> 2, confidence -> 0.9995], []]                             |
|[chunk, 42, 52, 0.5 million, [entity -> AMOUNT, sentence -> 0, chunk -> 3, confidence -> 0.99995], []]                    |
|[chunk, 57, 57, $, [entity -> CURRENCY, sentence -> 0, chunk -> 4, confidence -> 0.9998], []]                             |
|[chunk, 59, 69, 0.7 million, [entity -> AMOUNT, sentence -> 0, chunk -> 5, confidence -> 0.99985003], []]                 |
|[chunk, 90, 106, December 31, 2020, [entity -> FISCAL_YEAR, sentence -> 0, chunk -> 6, confidence -> 0.977525], []]       |
|[chunk, 121, 121, $, [entity -> CURRENCY, sentence -> 0, chunk -> 7, confidence -> 0.9996], []]                           |
|[chunk, 123, 133, 1.2 million, [entity -> AMOUNT, sentence -> 0, chunk -> 8, confidence -> 0.99975], []]                  |
|[chunk, 154, 170, December 31, 2019, [entity -> FISCAL_YEAR, sentence -> 0, chunk -> 9, confidence -> 0.96227497], []]    |
|[chunk, 173, 188, Services revenue, [entity -> PROFIT_INCREASE, sentence -> 1, chunk -> 10, confidence -> 0.57490003], []]|
|[chunk, 200, 200, 4, [entity -> PERCENTAGE, sentence -> 1, chunk -> 11, confidence -> 0.9997], []]                        |
|[chunk, 208, 208, $, [entity -> CURRENCY, sentence -> 1, chunk -> 12, confidence -> 0.999], []]                           |
|[chunk, 210, 220, 1.1 million, [entity -> AMOUNT, sentence -> 1, chunk -> 13, confidence -> 0.99995], []]                 |
|[chunk, 226, 226, $, [entity -> CURRENCY, sentence -> 1, chunk -> 14, confidence -> 0.9982], []]                          |
|[chunk, 228, 239, 25.6 million, [entity -> AMOUNT, sentence -> 1, chunk -> 15, confidence -> 0.99975], []]                |
|[chunk, 261, 277, December 31, 2020, [entity -> FISCAL_YEAR, sentence -> 1, chunk -> 16, confidence -> 0.97915], []]      |
|[chunk, 284, 284, $, [entity -> CURRENCY, sentence -> 1, chunk -> 17, confidence -> 0.9991], []]                          |
|[chunk, 286, 297, 24.5 million, [entity -> AMOUNT, sentence -> 1, chunk -> 18, confidence -> 0.99965], []]                |
|[chunk, 318, 334, December 31, 2019, [entity -> FISCAL_YEAR, sentence -> 1, chunk -> 19, confidence -> 0.9588], []]       |
+--------------------------------------------------------------------------------------------------------------------------+

+--------+
|relation|
+--------+
|[category, 0, 217, DECREASE, [entity1_begin -> 0, relation -> DECREASE, hypothesis -> License fees revenue decrease 40, confidence -> 0.9931541, nli_prediction -> entail, entity1 -> PROFIT_DECLINE, syntactic_distance -> undefined, chunk2 -> 40, entity2_end -> 32, entity1_end -> 19, entity2_begin -> 31, entity2 -> PERCENTAGE, chunk1 -> License fees revenue, sentence -> 0], []]                  |
|[category, 672, 898, DECREASE, [entity1_begin -> 0, relation -> DECREASE, hypothesis -> License fees revenue decrease 1.2 million, confidence -> 0.7394818, nli_prediction -> entail, entity1 -> PROFIT_DECLINE, syntactic_distance -> undefined, chunk2 -> 1.2 million, entity2_end -> 133, entity1_end -> 19, entity2_begin -> 123, entity2 -> AMOUNT, chunk1 -> License fees revenue, sentence -> 0], []]|
|[category, 445, 671, DECREASE, [entity1_begin -> 0, relation -> DECREASE, hypothesis -> License fees revenue decrease 0.7 million, confidence -> 0.99002415, nli_prediction -> entail, entity1 -> PROFIT_DECLINE, syntactic_distance -> undefined, chunk2 -> 0.7 million, entity2_end -> 69, entity1_end -> 19, entity2_begin -> 59, entity2 -> AMOUNT, chunk1 -> License fees revenue, sentence -> 0], []] |
|[category, 218, 444, DECREASE, [entity1_begin -> 0, relation -> DECREASE, hypothesis -> License fees revenue decrease 0.5 million, confidence -> 0.99084955, nli_prediction -> entail, entity1 -> PROFIT_DECLINE, syntactic_distance -> undefined, chunk2 -> 0.5 million, entity2_end -> 52, entity1_end -> 19, entity2_begin -> 42, entity2 -> AMOUNT, chunk1 -> License fees revenue, sentence -> 0], []] |
+--------+
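
Each relation carries its details in a metadata map, so a small post-processing step can make the output easier to read. The sketch below assumes the metadata of each relation has already been collected into plain Python dictionaries with string values, as the printed maps suggest. `tidy_relations` is a hypothetical helper and the 0.9 confidence threshold is an arbitrary illustration, not a recommended setting. The sample rows copy the relevant fields from the four relations shown above.

```python
def tidy_relations(metadata_rows, min_confidence=0.9):
    """Keep entailed relations at or above a confidence threshold, most confident first."""
    kept = []
    for meta in metadata_rows:
        # Metadata values are strings in the printed output, so convert before comparing.
        confidence = float(meta["confidence"])
        if meta["nli_prediction"] == "entail" and confidence >= min_confidence:
            kept.append((meta["relation"], meta["chunk1"], meta["chunk2"], confidence))
    return sorted(kept, key=lambda row: row[3], reverse=True)


# Fields copied from the relation output above (other metadata keys omitted).
rows = [
    {"relation": "DECREASE", "chunk1": "License fees revenue", "chunk2": "40",
     "confidence": "0.9931541", "nli_prediction": "entail"},
    {"relation": "DECREASE", "chunk1": "License fees revenue", "chunk2": "1.2 million",
     "confidence": "0.7394818", "nli_prediction": "entail"},
    {"relation": "DECREASE", "chunk1": "License fees revenue", "chunk2": "0.7 million",
     "confidence": "0.99002415", "nli_prediction": "entail"},
    {"relation": "DECREASE", "chunk1": "License fees revenue", "chunk2": "0.5 million",
     "confidence": "0.99084955", "nli_prediction": "entail"},
]

for relation, chunk1, chunk2, confidence in tidy_relations(rows):
    print(f"{relation}: {chunk1} -> {chunk2} ({confidence:.4f})")
```

With the threshold at 0.9, the relation pairing "License fees revenue" with "1.2 million" (the prior-year figure, confidence 0.7394818) is dropped, while the other three are kept.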

Model Information

Model Name: finre_zero_shot
Type: finance
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Language: en
Size: 406.4 MB
Case sensitive: true

References

Bert Base (cased) trained on the GLUE MNLI dataset.