ESG Text Classification (3 classes)

Description

This model classifies financial texts and news into three classes: Environmental, Social, and Governance. It can be used to build an ESG scoreboard for companies.

If you are looking for an augmented version of this model with more fine-grained verticals (greenhouse emissions, business ethics, etc.), see the finance_sequence_classifier_augmented_esg model in Models Hub.

Predicted Entities

Environmental, Social, Governance

How to use

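from johnsnowlabs import nlp, finance

# Start (or attach to) a Spark session; a valid Finance NLP license is required
spark = nlp.start()

# Wrap the raw text column into Spark NLP document annotations
document_assembler = nlp.DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')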

tokenizer = nlp.Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

sequence_classifier = finance.BertForSequenceClassification.pretrained('finclf_esg', 'en', 'finance/models') \
    .setInputCols(['document', 'token']) \
    .setOutputCol('class')

pipeline = nlp.Pipeline(stages=[
    document_assembler,
    tokenizer,
    sequence_classifier
])

# A simple example
example = spark.createDataFrame([["""The Canadian Environmental Assessment Agency (CEAA) concluded that in June 2016 the company had not made an effort
 to protect public drinking water and was ignoring concerns raised by its own scientists about the potential levels of pollutants in the local water supply.
  At the time, there were concerns that the company was not fully testing onsite wells for contaminants and did not use the proper methods for testing because 
  of its test kits now manufactured in China. A preliminary report by the company in June 2016 was commissioned by the Alberta government to provide recommendations 
  to Alberta Environment officials"""]]).toDF("text")

result = pipeline.fit(example).transform(example)

# result is a DataFrame
result.select("text", "class.result").show()

Results

+--------------------+---------------+
|                text|         result|
+--------------------+---------------+
|The Canadian Envi...|[Environmental]|
+--------------------+---------------+
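
The description mentions building an ESG scoreboard from the model's output. The following is a minimal sketch of one way that aggregation could look once predictions have been collected from the `class.result` column. The company names, the `predictions` list, and the `build_scoreboard` helper are all hypothetical and not part of the model or the library; the sketch simply counts texts per class for each company.

```python
from collections import Counter, defaultdict

# Hypothetical (company, predicted_label) pairs, e.g. gathered from the
# "class.result" column. Company names are invented for illustration.
predictions = [
    ("AcmeCorp", "Environmental"),
    ("AcmeCorp", "Environmental"),
    ("AcmeCorp", "Governance"),
    ("Globex", "Social"),
    ("Globex", "Governance"),
]

# Labels as they appear in the Results and Benchmarking sections.
LABELS = ("Environmental", "Social", "Governance")


def build_scoreboard(pairs):
    """Count how many texts per company fall into each ESG class."""
    board = defaultdict(Counter)
    for company, label in pairs:
        board[company][label] += 1
    # Report all three classes for every company, with 0 for unseen classes.
    return {
        company: {label: counts[label] for label in LABELS}
        for company, counts in board.items()
    }


scoreboard = build_scoreboard(predictions)
for company, counts in sorted(scoreboard.items()):
    print(company, counts)
```

Raw counts are only one possible scoring choice; normalizing by the number of texts per company or weighting by sentiment would be separate design decisions that this model does not cover.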

Model Information

Model Name: finclf_esg
Type: finance
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [document, token]
Output Labels: [class]
Language: en
Size: 412.2 MB
Case sensitive: true
Max sentence length: 512
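
Because the maximum sentence length is 512, long documents such as annual reports may need to be split before classification. The sketch below is one possible pre-processing step, not part of the library: `split_into_windows`, `max_words`, and `overlap` are hypothetical names, and the word budget is an assumption. It counts whitespace-separated words, which is only a rough proxy for the model's subword tokens, so the default leaves headroom below 512.

```python
def split_into_windows(text, max_words=300, overlap=50):
    """Split text into overlapping windows of at most max_words words.

    Words are whitespace-separated; this approximates, but does not equal,
    the subword token count the model sees.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    windows = []
    for start in range(0, len(words), step):
        windows.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return windows


# Synthetic 700-word text used only to demonstrate the splitting.
long_text = " ".join(f"w{i}" for i in range(700))
chunks = split_into_windows(long_text)
print(len(chunks), [len(c.split()) for c in chunks])
```

Each chunk could then be placed in its own row of the `text` column and classified independently; how to combine per-chunk labels into a document-level label is left open here.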

References

In-house annotations from scraped annual reports and tweets about ESG

Benchmarking

        label   precision    recall  f1-score   support
Environmental        0.99      0.97      0.98        97
       Social        0.95      0.96      0.95       162
   Governance        0.91      0.90      0.91        71
     accuracy           -         -      0.95       330
    macro-avg        0.95      0.94      0.95       330
 weighted-avg        0.95      0.95      0.95       330
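
As a reading aid for the table above, the sketch below recomputes the macro and weighted averages from the three per-class rows. It assumes the usual definitions (macro: unweighted mean over classes; weighted: mean weighted by support) and works from the rounded values printed in the table, so it is a consistency check rather than a reproduction of the original evaluation.

```python
# Per-class rows copied from the benchmarking table:
# label -> (precision, recall, f1-score, support)
rows = {
    "Environmental": (0.99, 0.97, 0.98, 97),
    "Social": (0.95, 0.96, 0.95, 162),
    "Governance": (0.91, 0.90, 0.91, 71),
}

total_support = sum(r[3] for r in rows.values())


def macro_avg(idx):
    """Unweighted mean of one metric column across the classes."""
    return sum(r[idx] for r in rows.values()) / len(rows)


def weighted_avg(idx):
    """Mean of one metric column, weighted by each class's support."""
    return sum(r[idx] * r[3] for r in rows.values()) / total_support


# Columns 0..2 are precision, recall, f1-score.
macro = [round(macro_avg(i), 2) for i in range(3)]
weighted = [round(weighted_avg(i), 2) for i in range(3)]

print("support:", total_support)
print("macro-avg:", macro)
print("weighted-avg:", weighted)
```

The recomputed values agree with the macro-avg and weighted-avg rows, and the support-weighted recall equals the reported accuracy, as expected for single-label classification.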