Financial NER for Aspect-based Sentiment Analysis (sm, Small)

Description

This NER model identifies entities that can carry a financial sentiment. It is designed to be used together with the associated Assertion Status model, which classifies each extracted entity into a sentiment category.

Predicted Entities

REVENUE, EXPENSE, PROFIT, KPI, GAINS, ASSET, LIABILITY, CASHFLOW, LOSSES, FREE_CASH_FLOW
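
The Benchmarking section reports scores per token-level tag (B-ASSET, I-ASSET, and so on) rather than per entity type. As a minimal sketch, assuming the standard BIO tagging scheme, the ten entity types above expand into 21 token-level tags. The names `ENTITY_TYPES` and `bio_tags` are illustrative and not part of the library.

```python
# Entity types listed in this model card.
ENTITY_TYPES = [
    "REVENUE", "EXPENSE", "PROFIT", "KPI", "GAINS",
    "ASSET", "LIABILITY", "CASHFLOW", "LOSSES", "FREE_CASH_FLOW",
]


def bio_tags(entity_types):
    """Expand entity types into B-/I- tags plus the outside tag "O".

    Assumes the standard BIO scheme: B- marks the first token of an
    entity, I- marks a continuation token, and O marks everything else.
    """
    tags = []
    for entity in sorted(entity_types):
        tags.append(f"B-{entity}")
        tags.append(f"I-{entity}")
    tags.append("O")
    return tags


TAGS = bio_tags(ENTITY_TYPES)
print(len(TAGS))  # 10 entity types * 2 prefixes + "O" = 21
```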

How to use

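# Assumes a licensed John Snow Labs installation.
from johnsnowlabs import nlp, finance
from pyspark.sql import functions as F

spark = nlp.start()

document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")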

sentence_detector = nlp.SentenceDetector() \
    .setInputCols(["document"]) \
    .setOutputCol("sentence") \
    .setCustomBounds(["\n\n"])

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base","en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")\
    .setCaseSensitive(True)\
    .setMaxSentenceLength(512)

ner_model = finance.NerModel.pretrained("finner_absa_sm", "en", "finance/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

ner_converter = finance.NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter
])

model = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))


text = "Equity and earnings of affiliates in Latin America increased to $4.8 million in the quarter from $2.2 million in the prior year as the commodity markets in Latin America remain strong through the end of the quarter."

spark_df = spark.createDataFrame([[text]]).toDF("text")

result = model.transform(spark_df)

# Pair each extracted chunk with its predicted entity label.
result.select(F.explode(F.arrays_zip('ner_chunk.result', 'ner_chunk.metadata')).alias("cols")) \
      .select(F.expr("cols['0']").alias("entity"),
              F.expr("cols['1']['entity']").alias("label")).show(50, truncate=False)

Results

+--------+---------+
|entity  |label    |
+--------+---------+
|Equity  |LIABILITY|
|earnings|PROFIT   |
+--------+---------+
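
The `ner` column holds one tag per token, while the `ner_chunk` column shown above holds whole entities. The sketch below is a simplified pure-Python illustration of how BIO tags can be grouped into chunks. It is not the library's implementation of `NerConverterInternal`, and the token-level tags in the examples are hypothetical, chosen to be consistent with the Results table.

```python
def merge_bio(tokens, tags):
    """Group BIO-tagged tokens into (chunk_text, label) pairs."""
    chunks = []
    current_tokens, current_label = [], None

    def flush():
        # Emit the chunk collected so far, if any.
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A B- tag, or an I- tag with a different label, starts a new chunk.
            flush()
            current_tokens, current_label = [token], label
        elif prefix == "I":
            # Continuation of the current chunk.
            current_tokens.append(token)
        else:
            # "O" closes any open chunk.
            flush()
            current_tokens, current_label = [], None
    flush()
    return chunks


# Hypothetical token tags for the start of the example sentence.
tokens = ["Equity", "and", "earnings", "of", "affiliates"]
tags = ["B-LIABILITY", "O", "B-PROFIT", "O", "O"]
print(merge_bio(tokens, tags))
```

A multi-token entity is merged the same way: the hypothetical tags `B-FREE_CASH_FLOW, I-FREE_CASH_FLOW, I-FREE_CASH_FLOW` over the tokens `free cash flow` yield a single `FREE_CASH_FLOW` chunk.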

Model Information

Model Name: finner_absa_sm
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
Size: 16.3 MB

References

In-house annotations of earnings call transcripts.

Benchmarking

           label  precision    recall  f1-score   support

         B-ASSET     0.6000    0.2400    0.3429        25
      B-CASHFLOW     0.7000    0.5833    0.6364        12
       B-EXPENSE     0.7222    0.6500    0.6842        60
B-FREE_CASH_FLOW     1.0000    1.0000    1.0000         8
         B-GAINS     0.7333    0.5946    0.6567        37
           B-KPI     0.7143    0.5556    0.6250        36
     B-LIABILITY     0.5000    0.2778    0.3571        18
        B-LOSSES     0.7143    0.7143    0.7143         7
        B-PROFIT     0.8462    0.8919    0.8684        37
       B-REVENUE     0.7385    0.8000    0.7680        60
         I-ASSET     0.8000    0.3636    0.5000        11
      I-CASHFLOW     0.9091    0.9091    0.9091        11
       I-EXPENSE     0.7451    0.6230    0.6786        61
I-FREE_CASH_FLOW     1.0000    1.0000    1.0000        17
         I-GAINS     0.8333    0.6667    0.7407        30
           I-KPI     0.8500    0.5000    0.6296        34
     I-LIABILITY     0.5000    0.5000    0.5000         6
        I-LOSSES     0.7143    0.6250    0.6667         8
        I-PROFIT     0.8621    0.9615    0.9091        26
       I-REVENUE     0.7600    0.7308    0.7451        26
               O     0.9839    0.9923    0.9880      8660
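
As a quick sanity check on the table, the f1-score column should be the harmonic mean of precision and recall. The sketch below recomputes it for three rows copied from the table; the helper name `f1_score` is illustrative. Because the table values are rounded to four decimals, a recomputed score can differ from the reported one in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1-score) copied from the table above.
ROWS = {
    "B-ASSET": (0.6000, 0.2400, 0.3429),
    "B-PROFIT": (0.8462, 0.8919, 0.8684),
    "I-KPI": (0.8500, 0.5000, 0.6296),
}

for tag, (precision, recall, reported) in ROWS.items():
    recomputed = f1_score(precision, recall)
    print(f"{tag}: recomputed={recomputed:.4f} reported={reported:.4f}")
```

Recall also links back to the support column: for B-ASSET, a recall of 0.2400 on 25 gold tokens corresponds to 6 correctly tagged tokens.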