Description
This Financial NER model extracts up to 20 types of quantifiable entities, including KPIs, from companies' Responsibility and ESG reports. This medium-size model has been trained on more data than the small version.
If you are looking for a smaller model, a small version is also available.
Predicted Entities
AGE, AMOUNT, COUNTABLE_ITEM, DATE_PERIOD, ECONOMIC_ACTION, ECONOMIC_KPI, ENVIRONMENTAL_ACTION, ENVIRONMENTAL_KPI, ENVIRONMENTAL_UNIT, ESG_ROLE, FACILITY_PLACE, ISO, PERCENTAGE, PROFESSIONAL_GROUP, RELATIVE_METRIC, SOCIAL_ACTION, SOCIAL_KPI, TARGET_GROUP, TARGET_GROUP_BUSINESS, WASTE
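The model assigns these labels at token level using IOB tags (a `B-` prefix marks the first token of an entity and `I-` a following token of the same entity, as in the Benchmarking table), and the `NerConverter` stage then merges tagged tokens into chunks. The sketch below illustrates that merging in plain Python. It is a simplified assumption about IOB2 decoding, not the `NerConverter` implementation; the function name `iob_to_chunks` is hypothetical, and the tokens and tags are hand-written for illustration rather than taken from real model output.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge token-level IOB2 tags into (chunk_text, entity_label) pairs.

    Hypothetical, simplified decoder: tokens are joined with single spaces,
    and an I- tag that does not continue a chunk of the same label starts
    a new chunk instead of being dropped.
    """
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Emit the chunk collected so far (if any) and reset the state.
        nonlocal current_tokens, current_label
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, label = tag.split("-", 1)  # "B-ENVIRONMENTAL_KPI" -> ("B", "ENVIRONMENTAL_KPI")
        if prefix == "B" or label != current_label:
            flush()
            current_label = label
        current_tokens.append(token)
    flush()
    return chunks


# Illustrative tokens and tags (not real model output)
tokens = ["direct", "GHG", "emissions", "from", "12,135", "million", "tonnes", "of", "CO2e"]
tags = ["B-ENVIRONMENTAL_KPI", "I-ENVIRONMENTAL_KPI", "I-ENVIRONMENTAL_KPI", "O",
        "B-AMOUNT", "I-AMOUNT",
        "B-ENVIRONMENTAL_UNIT", "I-ENVIRONMENTAL_UNIT", "I-ENVIRONMENTAL_UNIT"]
print(iob_to_chunks(tokens, tags))
```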
How to use
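from johnsnowlabs import nlp, finance
from pyspark.sql import functions as F
# Start a Spark session with the licensed Finance NLP library
spark = nlp.start()
# Turn raw text into Spark NLP documents
document_assembler = nlp.DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
# Split each document into sentences
sentence_detector = nlp.SentenceDetector()\
.setInputCols(["document"])\
.setOutputCol("sentence")
# Tokenize sentences, splitting the listed context characters off token boundaries
tokenizer = nlp.Tokenizer() \
.setInputCols(["sentence"]) \
.setOutputCol("token")\
.setContextChars(['.', ',', ';', ':', '!', '?', '*', '-', '(', ')', '"', "'", '%', '&'])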
# Case-sensitive SEC-BERT base embeddings, up to 512 tokens per sentence
embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en") \
.setInputCols("sentence", "token") \
.setOutputCol("embeddings")\
.setMaxSentenceLength(512)\
.setCaseSensitive(True)
# Medium-size NER model for Responsibility and ESG reports
ner_model = finance.NerModel.pretrained("finner_responsibility_reports_md", "en", "finance/models")\
.setInputCols(["sentence", "token", "embeddings"])\
.setOutputCol("ner")
# Merge token-level IOB tags into entity chunks
ner_converter = nlp.NerConverter()\
.setInputCols(["sentence", "token", "ner"])\
.setOutputCol("ner_chunk")
nlpPipeline = nlp.Pipeline(stages=[
document_assembler,
sentence_detector,
tokenizer,
embeddings,
ner_model,
ner_converter
])
# All stages are pretrained, so fitting on an empty DataFrame only assembles the pipeline
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)
text = """The company has reduced its direct GHG emissions from 12,135 million tonnes of CO2e in 2017 to 4 million tonnes of CO2e in 2021. The indirect GHG emissions (scope 2) are mainly from imported energy, including electricity, heat, steam, and cooling, and the company has reduced its scope 2 emissions from 3 million tonnes of CO2e in 2017-2018 to 4 million tonnes of CO2e in 2020-2021. The scope 3 emissions are mainly from the use of sold products, and the emissions have increased from 377 million tonnes of CO2e in 2017 to 408 million tonnes of CO2e in 2021."""
data = spark.createDataFrame([[text]]).toDF("text")
result = model.transform(data)
# Show one row per extracted chunk together with its entity label
result.select(F.explode(F.arrays_zip('ner_chunk.result', 'ner_chunk.metadata')).alias("cols")) \
.select(F.expr("cols['0']").alias("chunk"),
F.expr("cols['1']['entity']").alias("label")).show(50, truncate = False)
Results
+----------------------+------------------+
|chunk |label |
+----------------------+------------------+
|direct GHG emissions |ENVIRONMENTAL_KPI |
|12,135 million |AMOUNT |
|tonnes of CO2e |ENVIRONMENTAL_UNIT|
|2017 |DATE_PERIOD |
|4 million |AMOUNT |
|tonnes of CO2e |ENVIRONMENTAL_UNIT|
|2021 |DATE_PERIOD |
|indirect GHG emissions|ENVIRONMENTAL_KPI |
|scope 2 |ENVIRONMENTAL_KPI |
|imported energy |ENVIRONMENTAL_KPI |
|electricity |ENVIRONMENTAL_KPI |
|heat |ENVIRONMENTAL_KPI |
|steam |ENVIRONMENTAL_KPI |
|cooling |ENVIRONMENTAL_KPI |
|scope 2 emissions |ENVIRONMENTAL_KPI |
|3 million |AMOUNT |
|tonnes of CO2e |ENVIRONMENTAL_UNIT|
|2017-2018 |DATE_PERIOD |
|4 million |AMOUNT |
|tonnes of CO2e |ENVIRONMENTAL_UNIT|
|2020-2021 |DATE_PERIOD |
|scope 3 emissions |ENVIRONMENTAL_KPI |
|sold |ECONOMIC_ACTION |
|products |SOCIAL_KPI |
|emissions |ENVIRONMENTAL_KPI |
|377 million |AMOUNT |
|tonnes of CO2e |ENVIRONMENTAL_UNIT|
|2017 |DATE_PERIOD |
|408 million |AMOUNT |
|tonnes of CO2e |ENVIRONMENTAL_UNIT|
|2021 |DATE_PERIOD |
+----------------------+------------------+
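The chunks above can be post-processed into structured measurements. The following is a minimal sketch, assuming the `(chunk, label)` pairs have already been collected into a Python list (here they are copied by hand from the first sentence of the Results table). The helper `group_measurements` and its pairing rule (each `AMOUNT` takes the nearest preceding `ENVIRONMENTAL_KPI` plus the next `ENVIRONMENTAL_UNIT` and `DATE_PERIOD`) are hypothetical heuristics, not part of Finance NLP.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def group_measurements(pairs: List[Tuple[str, str]]) -> List[Record]:
    """Group (chunk, label) pairs into KPI / amount / unit / date records.

    Hypothetical heuristic: every AMOUNT opens a new record, the most recent
    ENVIRONMENTAL_KPI seen so far becomes its KPI, and the first following
    ENVIRONMENTAL_UNIT and DATE_PERIOD chunks are attached to that record.
    """
    records: List[Record] = []
    last_kpi: Optional[str] = None
    for chunk, label in pairs:
        if label == "ENVIRONMENTAL_KPI":
            last_kpi = chunk
        elif label == "AMOUNT":
            records.append({"kpi": last_kpi, "amount": chunk, "unit": None, "date": None})
        elif records and label == "ENVIRONMENTAL_UNIT" and records[-1]["unit"] is None:
            records[-1]["unit"] = chunk
        elif records and label == "DATE_PERIOD" and records[-1]["date"] is None:
            records[-1]["date"] = chunk
        # Any other label (e.g. ECONOMIC_ACTION) is ignored by this sketch.
    return records


# Pairs copied from the first sentence of the Results table
PAIRS = [
    ("direct GHG emissions", "ENVIRONMENTAL_KPI"),
    ("12,135 million", "AMOUNT"),
    ("tonnes of CO2e", "ENVIRONMENTAL_UNIT"),
    ("2017", "DATE_PERIOD"),
    ("4 million", "AMOUNT"),
    ("tonnes of CO2e", "ENVIRONMENTAL_UNIT"),
    ("2021", "DATE_PERIOD"),
]

for record in group_measurements(PAIRS):
    print(record)
```

This nearest-preceding rule is deliberately simple. In the third sentence of the example it would attach "377 million" to the chunk "emissions" rather than "scope 3 emissions", so real reports would likely need a more robust linking strategy.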
Model Information
Model Name: finner_responsibility_reports_md
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 16.4 MB
References
In-house annotations on Responsibility and ESG Reports
Benchmarking
label                   precision  recall  f1-score  support
B-AMOUNT                     0.97    0.97      0.97     1207
I-AMOUNT                     0.97    0.94      0.96      361
B-ENVIRONMENTAL_KPI          0.79    0.81      0.80     1051
I-ENVIRONMENTAL_KPI          0.74    0.88      0.81      716
B-DATE_PERIOD                0.94    0.95      0.94      980
I-DATE_PERIOD                0.90    0.95      0.92      498
B-PERCENTAGE                 0.99    0.99      0.99      695
I-PERCENTAGE                 0.99    1.00      1.00      692
B-SOCIAL_KPI                 0.66    0.74      0.70      481
I-SOCIAL_KPI                 0.56    0.33      0.41       43
B-ENVIRONMENTAL_UNIT         0.94    0.96      0.95      459
I-ENVIRONMENTAL_UNIT         0.91    0.86      0.88      268
B-PROFESSIONAL_GROUP         0.85    0.92      0.88      358
I-PROFESSIONAL_GROUP         0.94    0.94      0.94       32
B-TARGET_GROUP               0.89    0.85      0.87      337
I-TARGET_GROUP               0.76    0.95      0.84       59
B-ENVIRONMENTAL_ACTION       0.72    0.68      0.70      341
I-ENVIRONMENTAL_ACTION       1.00    0.56      0.71       18
B-SOCIAL_ACTION              0.59    0.72      0.65      241
B-ESG_ROLE                   0.76    0.72      0.74      109
I-ESG_ROLE                   0.81    0.84      0.83      305
B-ECONOMIC_KPI               0.77    0.67      0.71      219
I-ECONOMIC_KPI               0.47    0.70      0.56       50
B-RELATIVE_METRIC            0.92    0.98      0.95      147
I-RELATIVE_METRIC            0.89    0.99      0.94      178
B-FACILITY_PLACE             0.74    0.89      0.81      139
I-FACILITY_PLACE             0.77    0.93      0.84       89
B-COUNTABLE_ITEM             0.64    0.69      0.67      154
I-COUNTABLE_ITEM             0.25    1.00      0.40        1
B-WASTE                      0.84    0.64      0.73      126
I-WASTE                      0.91    0.51      0.65       57
B-ECONOMIC_ACTION            0.73    0.74      0.73       91
I-ECONOMIC_ACTION            0.00    0.00      0.00        1
B-TARGET_GROUP_BUSINESS      0.93    0.85      0.89       74
I-TARGET_GROUP_BUSINESS      0.00    0.00      0.00        1
B-AGE                        0.74    0.70      0.72       37
I-AGE                        0.90    0.65      0.75       40
B-ISO                        0.84    0.72      0.78       36
I-ISO                        0.91    0.80      0.85       25
micro avg                    0.86    0.88      0.87    10716
macro avg                    0.75    0.75      0.74    10716
weighted avg                 0.86    0.88      0.87    10716
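The gap between the macro average (F1 0.74) and the weighted average (F1 0.87) comes from how the two are computed. The macro average is the unweighted mean of per-tag scores, so tags with a support of 1 (such as I-ECONOMIC_ACTION and I-TARGET_GROUP_BUSINESS, both at 0.00) count as much as B-AMOUNT with 1207 tokens, while the weighted average scales each tag by its support. The sketch below shows this on four rows copied from the table; the function names are illustrative. Because the table values are rounded to two decimals, recomputing over the full table need not reproduce the reported averages exactly.

```python
from typing import Dict, Tuple

# (precision, recall, support) for a few tags, copied from the Benchmarking table
Rows = Dict[str, Tuple[float, float, int]]
ROWS: Rows = {
    "B-AMOUNT": (0.97, 0.97, 1207),
    "B-SOCIAL_KPI": (0.66, 0.74, 481),
    "I-COUNTABLE_ITEM": (0.25, 1.00, 1),
    "I-ECONOMIC_ACTION": (0.00, 0.00, 1),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(rows: Rows) -> float:
    """Unweighted mean of per-tag F1: every tag counts equally."""
    scores = [f1(p, r) for p, r, _ in rows.values()]
    return sum(scores) / len(scores)


def weighted_f1(rows: Rows) -> float:
    """Support-weighted mean of per-tag F1: frequent tags dominate."""
    total_support = sum(s for _, _, s in rows.values())
    return sum(f1(p, r) * s for p, r, s in rows.values()) / total_support


print(f"macro F1: {macro_f1(ROWS):.3f}, weighted F1: {weighted_f1(ROWS):.3f}")
# macro F1: 0.517, weighted F1: 0.892
```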