Description
This model classifies financial texts and news into three classes: Environmental, Social and Governance. It can be used to build an ESG scoreboard for companies.
If you are looking for an augmented version of this model with more fine-grained verticals (Green House Emissions, Business Ethics, etc.), see the finance_sequence_classifier_augmented_esg model in the Models Hub.
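One simple way to turn per-text predictions into a company-level scoreboard is to compute, for each company, the share of its classified texts that falls under each pillar. The sketch below assumes you already have (company, label) pairs, for example by running the pipeline from "How to use" over texts tagged with a company name. The helper `esg_scoreboard` and the company names are hypothetical illustrations, not part of Finance NLP. Note that this measures how much of the coverage touches each pillar, not whether that coverage is favourable.

```python
from collections import Counter, defaultdict

# Class names as they appear in this card's Results and Benchmarking sections
PILLARS = ("Environmental", "Social", "Governance")


def esg_scoreboard(predictions):
    """Hypothetical helper: aggregate (company, label) pairs into pillar shares.

    Returns {company: {pillar: fraction of that company's texts with this label}}.
    """
    counts = defaultdict(Counter)
    for company, label in predictions:
        if label not in PILLARS:
            raise ValueError(f"unexpected label: {label!r}")
        counts[company][label] += 1

    board = {}
    for company, counter in counts.items():
        total = sum(counter.values())
        # Counter returns 0 for pillars a company never received
        board[company] = {p: counter[p] / total for p in PILLARS}
    return board


# Hypothetical predictions, e.g. collected from the pipeline's "class.result" column
preds = [
    ("ACME Corp", "Environmental"),
    ("ACME Corp", "Social"),
    ("ACME Corp", "Environmental"),
    ("Globex", "Governance"),
]
print(esg_scoreboard(preds))
```

A real scoreboard would likely also weight texts by source or recency; plain shares are used here only to keep the aggregation step visible.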
Predicted Entities
Environmental, Social, Governance
How to use
from johnsnowlabs import nlp, finance

# Assumes an active Spark session with a valid Finance NLP license,
# e.g. one created with spark = nlp.start()

document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = nlp.Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Pretrained ESG classifier from the Finance NLP models repository
sequenceClassifier = finance.BertForSequenceClassification.pretrained("finclf_esg", "en", "finance/models") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = nlp.Pipeline(stages=[
    document_assembler,
    tokenizer,
    sequenceClassifier
])

# a simple example
example = spark.createDataFrame([["""The Canadian Environmental Assessment Agency (CEAA) concluded that in June 2016 the company had not made an effort
to protect public drinking water and was ignoring concerns raised by its own scientists about the potential levels of pollutants in the local water supply.
At the time, there were concerns that the company was not fully testing onsite wells for contaminants and did not use the proper methods for testing because
of its test kits now manufactured in China. A preliminary report by the company in June 2016 was commissioned by the Alberta government to provide recommendations
to Alberta Environment officials"""]]).toDF("text")

# fit() is still required to turn the Pipeline into a PipelineModel; the classifier itself is not retrained
result = pipeline.fit(example).transform(example)

# result is a DataFrame; "class.result" holds the predicted label
result.select("text", "class.result").show()
Results
+--------------------+---------------+
| text| result|
+--------------------+---------------+
|The Canadian Envi...|[Environmental]|
+--------------------+---------------+
Model Information
Model Name: | finclf_esg |
Type: | finance |
Compatibility: | Finance NLP 1.0.0+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [document, token] |
Output Labels: | [class] |
Language: | en |
Size: | 412.2 MB |
Case sensitive: | true |
Max sentence length: | 512 |
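Because the maximum sentence length is 512, tokens beyond that limit are likely not seen by the model, so a long annual-report section may be classified on its opening part only. A common workaround, sketched here under assumptions, is to split long texts into chunks, classify each chunk with the pipeline above, and take a majority vote over the chunk labels. The 300-word budget is an arbitrary, conservative guess meant to leave room for WordPiece splitting words into several sub-tokens plus special tokens; `chunk_words` and `majority_label` are hypothetical helpers, not library functions.

```python
from collections import Counter


def chunk_words(text, max_words=300):
    """Split `text` into consecutive chunks of at most `max_words` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def majority_label(labels):
    """Return the most frequent label; ties go to the label seen first."""
    if not labels:
        raise ValueError("no labels to vote on")
    return Counter(labels).most_common(1)[0][0]


long_text = "word " * 700
chunks = chunk_words(long_text)
print(len(chunks))  # 3 chunks: 300 + 300 + 100 words

# Per-chunk labels would come from running the pipeline on each chunk
print(majority_label(["Social", "Environmental", "Social"]))  # Social
```

Chunking by words rather than model tokens keeps the sketch free of tokenizer dependencies, at the cost of only approximating the 512-token limit.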
References
In-house annotations of scraped annual reports and tweets about ESG
Benchmarking
label          precision  recall  f1-score  support
Environmental       0.99    0.97      0.98       97
Social              0.95    0.96      0.95      162
Governance          0.91    0.90      0.91       71
accuracy               -       -      0.95      330
macro-avg           0.95    0.94      0.95      330
weighted-avg        0.95    0.95      0.95      330
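The summary rows can be checked against the per-class rows: the macro average is the unweighted mean over the three classes, the weighted average weights each class by its support, and for single-label classification the accuracy equals the support-weighted recall (both are total true positives divided by total support). Since the per-class figures are already rounded to two decimals, the recomputation below, in plain Python with values copied from the table, can only be expected to agree to two decimals.

```python
# Per-class rows copied from the Benchmarking table: (precision, recall, f1, support)
ROWS = {
    "Environmental": (0.99, 0.97, 0.98, 97),
    "Social": (0.95, 0.96, 0.95, 162),
    "Governance": (0.91, 0.90, 0.91, 71),
}


def summarize(rows):
    """Return total support, macro averages and support-weighted averages of (P, R, F1)."""
    n = sum(r[3] for r in rows.values())
    k = len(rows)
    macro = [sum(r[i] for r in rows.values()) / k for i in range(3)]
    weighted = [sum(r[i] * r[3] for r in rows.values()) / n for i in range(3)]
    return n, macro, weighted


n, macro, weighted = summarize(ROWS)
# weighted[1] (weighted recall) is also the accuracy
print(n, [round(x, 2) for x in macro], [round(x, 2) for x in weighted])
# → 330 [0.95, 0.94, 0.95] [0.95, 0.95, 0.95]
```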