Financial NER on Aspect-Based Sentiment Analysis

Description

This NER model identifies entities that can carry a financial sentiment. It is designed to be used together with the associated Assertion Status model, which assigns each extracted entity a sentiment category.

Predicted Entities

ASSET, CASHFLOW, EXPENSE, FREE_CASH_FLOW, GAINS, KPI, LIABILITY, LOSSES, PROFIT, REVENUE

How to use

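from johnsnowlabs import nlp, finance

# Start a Spark session with the licensed Finance NLP libraries loaded
spark = nlp.start()

# Document Assembler converts raw text into Spark NLP documents
documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")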

# Sentence Detector splits each document into sentences
sentenceDetector = nlp.SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

# Tokenizer splits each sentence into tokens
tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# SEC-BERT embeddings supply the token vectors the NER model expects
bert_embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")\
    .setMaxSentenceLength(512)

# pretrained() is called on the class, so no instance is created first
ner_model = finance.NerModel.pretrained("finner_aspect_based_sentiment_md", "en", "finance/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence","token","ner"])\
    .setOutputCol("ner_chunk")

nlpPipeline = nlp.Pipeline(stages=[
        documentAssembler,
        sentenceDetector,
        tokenizer,
        bert_embeddings,
        ner_model,
        ner_converter])

# Every stage is pretrained, so fitting on an empty DataFrame only assembles the pipeline
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)

text = ["""Equity and earnings of affiliates in Latin America increased to $4.8 million in the quarter from $2.2 million in the prior year as the commodity markets in Latin America remain strong through the end of the quarter."""]
result = model.transform(spark.createDataFrame([text]).toDF("text"))

from pyspark.sql import functions as F

result.select(F.explode(F.arrays_zip(result.ner_chunk.result, result.ner_chunk.begin, result.ner_chunk.end, result.ner_chunk.metadata)).alias("cols")) \
               .select(F.expr("cols['0']").alias("chunk"),
                       F.expr("cols['1']").alias("begin"),
                       F.expr("cols['2']").alias("end"),
                       F.expr("cols['3']['entity']").alias("ner_label")
                       ).show(100, truncate=False)

Results

+--------+-----+---+---------+
|chunk   |begin|end|ner_label|
+--------+-----+---+---------+
|Equity  |1    |6  |LIABILITY|
|earnings|12   |19 |PROFIT   |
+--------+-----+---+---------+

Model Information

Model Name: finner_aspect_based_sentiment_md
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 16.5 MB

Benchmarking

 label           precision  recall  f1-score  support 
 ASSET           0.50       0.72    0.59      53      
 CASHFLOW        0.78       0.60    0.68      30      
 EXPENSE         0.71       0.68    0.70      151     
 FREE_CASH_FLOW  1.00       1.00    1.00      19      
 GAINS           0.80       0.78    0.79      55      
 KPI             0.72       0.58    0.64      106     
 LIABILITY       0.65       0.51    0.57      39      
 LOSSES          0.77       0.59    0.67      29      
 PROFIT          0.77       0.74    0.75      101     
 REVENUE         0.74       0.78    0.76      231     
 micro-avg       0.72       0.71    0.71      814     
 macro-avg       0.74       0.70    0.71      814     
 weighted-avg    0.73       0.71    0.71      814
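
The macro-avg and weighted-avg rows can be approximately reproduced from the per-label rows. The following is a minimal sketch in plain Python, assuming the usual definitions: macro is the unweighted mean over labels, weighted is the mean weighted by support, and F1 is the harmonic mean of precision and recall. The helper names (`macro_avg`, `weighted_avg`, `f1`) are illustrative and not part of Finance NLP, and because the inputs are already rounded to two decimals, results may differ from the table in the last digit.

```python
# Per-label rows copied from the benchmark table: (precision, recall, support).
rows = {
    "ASSET": (0.50, 0.72, 53),
    "CASHFLOW": (0.78, 0.60, 30),
    "EXPENSE": (0.71, 0.68, 151),
    "FREE_CASH_FLOW": (1.00, 1.00, 19),
    "GAINS": (0.80, 0.78, 55),
    "KPI": (0.72, 0.58, 106),
    "LIABILITY": (0.65, 0.51, 39),
    "LOSSES": (0.77, 0.59, 29),
    "PROFIT": (0.77, 0.74, 101),
    "REVENUE": (0.74, 0.78, 231),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def macro_avg(values):
    """Unweighted mean: every label counts equally."""
    return sum(values) / len(values)


def weighted_avg(values, weights):
    """Mean weighted by support: frequent labels count more."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


precisions = [p for p, _, _ in rows.values()]
recalls = [r for _, r, _ in rows.values()]
supports = [s for _, _, s in rows.values()]

total_support = sum(supports)
macro_precision = macro_avg(precisions)
macro_recall = macro_avg(recalls)
weighted_precision = weighted_avg(precisions, supports)
weighted_recall = weighted_avg(recalls, supports)

print(f"support total:      {total_support}")
print(f"macro precision:    {macro_precision:.2f}")
print(f"macro recall:       {macro_recall:.2f}")
print(f"weighted precision: {weighted_precision:.2f}")
print(f"weighted recall:    {weighted_recall:.2f}")
print(f"ASSET f1:           {f1(*rows['ASSET'][:2]):.2f}")
```

The macro average treats a small label such as FREE_CASH_FLOW (support 19) the same as REVENUE (support 231), which is why it can sit slightly above the weighted average when small labels score well.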