Description
This Assertion Status model detects whether an amount or percentage mentioned in a text is stated, in its context, to have increased or decreased, or whether no change is stated.
Predicted Entities
INCREASE, DECREASE, NOT_STATED
How to use
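import pandas as pd
from johnsnowlabs import nlp, finance

# Assumes a licensed John Snow Labs installation; nlp.start() returns the Spark session
spark = nlp.start()

document_assembler = nlp.DocumentAssembler()\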
.setInputCol("text")\
.setOutputCol("document")
sentence_detector = nlp.SentenceDetector()\
.setInputCols(["document"])\
.setOutputCol("sentence")
tokenizer = nlp.Tokenizer() \
.setInputCols(["sentence"]) \
.setOutputCol("token")\
.setContextChars(['.', ',', ';', ':', '!', '?', '*', '-', '(', ')', '"', "'", '%', '&'])
embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base","en") \
.setInputCols(["sentence", "token"]) \
.setOutputCol("embeddings")
ner_model = finance.BertForTokenClassification.pretrained("finner_responsibility_reports", "en", "finance/models")\
.setInputCols(["sentence", "token"])\
.setOutputCol("ner")\
.setCaseSensitive(True)\
.setMaxSentenceLength(512)
ner_converter = nlp.NerConverter()\
.setInputCols(["sentence", "token", "ner"])\
.setOutputCol("ner_chunk")\
.setWhiteList(["AMOUNT", "PERCENTAGE"])
fin_assertion = finance.AssertionDLModel.pretrained("finassertion_increase_decrease_amounts", "en", "finance/models")\
.setInputCols(["sentence", "ner_chunk", "embeddings"])\
.setOutputCol("assertion")
nlpPipeline = nlp.Pipeline(stages=[
document_assembler,
sentence_detector,
tokenizer,
embeddings,
ner_model,
ner_converter,
fin_assertion
])
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)
text_list = ["""This reduction in GHG emissions from the previous year can be attributed to a decrease in Scope 2 emissions from indirect energy use, which decreased from 13,907 metric tons CO2e in 2020 to 12,297 metric tons CO2e in 2021.""",
"""Cal Water's year-over-year total energy consumption increased slightly from 584,719 GJ in 2020 to 587,923 GJ in 2021.""",
"""In 2020, 89 % of our employees received a year-end performance review while in 2021, this increased to 93 %.""",
"""With over 80,000 consultants and professionals in 400 locations globally, CGI has a strong presence in the technology sector, offering end-to-end services to over 5,500 clients ."""]
df = spark.createDataFrame(pd.DataFrame({"text" : text_list}))
result = model.transform(df)
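The pipeline above stops at `result`, while the Results section shows one row per chunk. A minimal sketch of how chunks and assertion labels might be paired is given below. It assumes the usual Spark NLP annotation layout, where each annotation exposes a `result` string and a `metadata` map, the chunk metadata carries an `entity` key, and the assertion annotations come back in the same order as the chunks. The helper name `pair_chunks_with_assertions` and the mock data are hypothetical and only stand in for one transformed row.

```python
from typing import Any, List, Sequence, Tuple


def pair_chunks_with_assertions(
    chunks: Sequence[Any], assertions: Sequence[Any]
) -> List[Tuple[str, str, str]]:
    """Pair each chunk annotation with the assertion annotation at the same position.

    Assumption: one assertion is emitted per chunk, in the same order.
    """
    rows = []
    for chunk, assertion in zip(chunks, assertions):
        # chunk text, NER label from the chunk metadata, assertion label
        rows.append(
            (chunk["result"], chunk["metadata"].get("entity", ""), assertion["result"])
        )
    return rows


# Mock annotations standing in for one transformed row (values taken from the Results table)
mock_chunks = [
    {"result": "89 %", "metadata": {"entity": "PERCENTAGE"}},
    {"result": "93 %", "metadata": {"entity": "PERCENTAGE"}},
]
mock_assertions = [{"result": "INCREASE"}, {"result": "INCREASE"}]

rows = pair_chunks_with_assertions(mock_chunks, mock_assertions)
for chunk_text, ner_label, assertion_label in rows:
    print(f"{chunk_text:<8}{ner_label:<11}{assertion_label}")
```

With real output, the same helper could probably be applied to each row returned by `result.select("ner_chunk", "assertion").collect()`, since PySpark `Row` objects support lookup by field name; this has not been verified against a specific library version.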
Results
+-------+----------+----------+
|chunk  |ner_label |assertion |
+-------+----------+----------+
|13,907 |AMOUNT    |DECREASE  |
|12,297 |AMOUNT    |DECREASE  |
|584,719|AMOUNT    |INCREASE  |
|587,923|AMOUNT    |INCREASE  |
|89 %   |PERCENTAGE|INCREASE  |
|93 %   |PERCENTAGE|INCREASE  |
|80,000 |AMOUNT    |NOT_STATED|
|400    |AMOUNT    |NOT_STATED|
|5,500  |AMOUNT    |NOT_STATED|
+-------+----------+----------+
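As a quick sanity check on these labels, the before and after figures from the example sentences can be compared numerically. The sketch below is not part of the model or the library: `parse_amount` and `observed_direction` are hypothetical helpers, and treating two chunks as a "before" and "after" pair is an assumption read off the example sentences, not something the model outputs.

```python
def parse_amount(chunk: str) -> float:
    """Convert a chunk such as '13,907' or '89 %' into a float."""
    return float(chunk.replace(",", "").replace("%", "").strip())


def observed_direction(before: str, after: str) -> str:
    """Return the direction of change implied by the two figures themselves."""
    old, new = parse_amount(before), parse_amount(after)
    if new > old:
        return "INCREASE"
    if new < old:
        return "DECREASE"
    return "UNCHANGED"


# (before, after, label predicted for both chunks), taken from the Results table
pairs = [
    ("13,907", "12,297", "DECREASE"),
    ("584,719", "587,923", "INCREASE"),
    ("89 %", "93 %", "INCREASE"),
]

checks = [observed_direction(before, after) == predicted for before, after, predicted in pairs]
for (before, after, predicted), ok in zip(pairs, checks):
    print(f"{before} -> {after}: predicted {predicted}, consistent with figures: {ok}")
```

The NOT_STATED rows are left out because their sentences give no second figure to compare against.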
Model Information
Model Name: finassertion_increase_decrease_amounts
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [document, ner_chunk, embeddings]
Output Labels: [assertion]
Language: en
Size: 2.2 MB
References
In-house annotations on Responsibility and ESG Reports
Benchmarking
label         precision  recall  f1-score  support
DECREASE      0.88       0.91    0.89      97
INCREASE      0.84       0.77    0.80      56
NOT_STATED    0.89       0.90    0.89      94
accuracy      -          -       0.87      247
macro-avg     0.87       0.86    0.86      247
weighted-avg  0.87       0.87    0.87      247
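The summary rows follow from the per-label figures. The sketch below recomputes the f1-scores and their macro and weighted averages from the rounded precision, recall, and support values in the table, using the standard definitions (F1 as the harmonic mean of precision and recall, macro as the unweighted mean, weighted as the support-weighted mean). Because the inputs are already rounded, small differences in the last digit would not be surprising in general, although here the results agree with the table after rounding to two decimals.

```python
# (precision, recall, support) per label, copied from the benchmarking table
metrics = {
    "DECREASE": (0.88, 0.91, 97),
    "INCREASE": (0.84, 0.77, 56),
    "NOT_STATED": (0.89, 0.90, 94),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1_scores = {label: f1(p, r) for label, (p, r, _) in metrics.items()}
total_support = sum(support for _, _, support in metrics.values())

# Macro average: every label counts equally
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

# Weighted average: each label counts in proportion to its support
weighted_f1 = sum(f1_scores[label] * metrics[label][2] for label in metrics) / total_support

for label, score in f1_scores.items():
    print(f"{label:<12}{score:.2f}")
print(f"{'macro-avg':<12}{macro_f1:.2f}")
print(f"{'weighted-avg':<12}{weighted_f1:.2f}")
```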