Financial Assertion of Sentiment (sm, Small)

Description

This assertion model assigns a sentiment (POSITIVE, NEGATIVE, or NEUTRAL) to financial entities extracted from text. It is designed to be used together with the associated NER model, finner_absa_sm, which supplies the entity chunks.

Predicted Entities

POSITIVE, NEGATIVE, NEUTRAL

How to use

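# Assumes the johnsnowlabs library is installed and a Finance NLP license is configured
from johnsnowlabs import nlp, finance
import pyspark.sql.functions as F

# Start (or reuse) a Spark session with the licensed libraries loaded
spark = nlp.start()

# Document Assembler turns the raw "text" column into document annotations
documentAssembler = (
    nlp.DocumentAssembler().setInputCol("text").setOutputCol("document")
)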

# Sentence Detector splits each document into sentences
sentenceDetector = (
    nlp.SentenceDetector()
    .setInputCols(["document"])
    .setOutputCol("sentence")
)

# Tokenizer splits each sentence into tokens
tokenizer = (
    nlp.Tokenizer().setInputCols(["sentence"]).setOutputCol("token")
)

# Embeddings are computed per sentence so they align with the NER and assertion inputs
bert_embeddings = (
    nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en")
    .setInputCols(["sentence", "token"])
    .setOutputCol("embeddings")
    .setMaxSentenceLength(512)
)

clinical_ner = (
    finance.NerModel.pretrained("finner_absa_sm", "en", "finance/models")
    .setInputCols(["sentence", "token", "embeddings"])
    .setOutputCol("ner")
)

ner_converter = (
    finance.NerConverterInternal()
    .setInputCols(["sentence", "token", "ner"])
    .setOutputCol("ner_chunk")
)

assertion_model = (
    finance.AssertionDLModel.pretrained("finassertion_absa_sm", "en", "finance/models")
    .setInputCols(["sentence", "ner_chunk", "embeddings"])
    .setOutputCol("assertion")
)

nlpPipeline = nlp.Pipeline(
    stages=[
        documentAssembler,
        sentenceDetector,
        tokenizer,
        bert_embeddings,
        clinical_ner,
        ner_converter,
        assertion_model,
    ]
)


text = "Equity and earnings of affiliates in Latin America increased to $4.8 million in the quarter from $2.2 million in the prior year as the commodity markets in Latin America remain strong through the end of the quarter."

spark_df = spark.createDataFrame([[text]]).toDF("text")

# Fit the pipeline defined above, then run it on the same DataFrame
result = nlpPipeline.fit(spark_df).transform(spark_df)

result.select(
    F.explode(
        F.arrays_zip("ner_chunk.result", "ner_chunk.metadata")
    ).alias("cols")
).select(
    F.expr("cols['0']").alias("entity"),
    F.expr("cols['1']['entity']").alias("label"),
).show(
    50, truncate=False
)

Results

+--------+---------+
|entity  |label    |
+--------+---------+
|Equity  |LIABILITY|
|earnings|PROFIT   |
+--------+---------+
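
The query above lists only the NER chunks and their labels; the sentiment itself is written to the assertion column. The following is a minimal sketch, in plain Python, of pairing each chunk with its sentiment once the values have been collected from the DataFrame. It assumes the assertion annotations come back one per chunk and in the same order as ner_chunk. The helper name pair_chunks_with_sentiment and the placeholder sentiment strings are hypothetical; they are not model output.

```python
from typing import List, NamedTuple


class ChunkSentiment(NamedTuple):
    chunk: str       # entity text from ner_chunk.result
    ner_label: str   # entity label from ner_chunk.metadata["entity"]
    sentiment: str   # label from assertion.result


def pair_chunks_with_sentiment(
    chunks: List[str], ner_labels: List[str], sentiments: List[str]
) -> List[ChunkSentiment]:
    """Zip three parallel lists, assuming one assertion per chunk in the same order."""
    if not (len(chunks) == len(ner_labels) == len(sentiments)):
        raise ValueError("chunks, ner_labels and sentiments must have equal length")
    return [ChunkSentiment(c, n, s) for c, n, s in zip(chunks, ner_labels, sentiments)]


# Chunks and NER labels are taken from the Results table;
# the sentiment strings are placeholders, not real predictions.
rows = pair_chunks_with_sentiment(
    ["Equity", "earnings"],
    ["LIABILITY", "PROFIT"],
    ["<sentiment for Equity>", "<sentiment for earnings>"],
)
for row in rows:
    print(f"{row.chunk:<10}{row.ner_label:<12}{row.sentiment}")
```

The length check is there because a silent zip would drop trailing items if the one-assertion-per-chunk assumption ever failed.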

Model Information

Model Name:	finassertion_absa_sm
Compatibility:	Finance NLP 1.0.0+
License:	Licensed
Edition:	Official
Input Labels:	[document, chunk, embeddings]
Output Labels:	[assertion]
Language:	en
Size:	2.7 MB

References

In-house annotations of earnings call transcripts.

Benchmarking

   label  precision  recall  f1-score  support
NEGATIVE       0.57    0.42      0.48       74
 NEUTRAL       0.51    0.70      0.59      184
POSITIVE       0.75    0.64      0.69      324
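
The table reports per-class figures only. As a rough cross-check, the sketch below recomputes each F1 score from the reported precision and recall and derives macro and support-weighted averages. These averages are not part of the original benchmark; they are derived here from the rounded values in the table, so they are approximate.

```python
# Per-class (precision, recall, support) copied from the benchmark table.
reported = {
    "NEGATIVE": (0.57, 0.42, 74),
    "NEUTRAL": (0.51, 0.70, 184),
    "POSITIVE": (0.75, 0.64, 324),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


f1_scores = {label: f1(p, r) for label, (p, r, _) in reported.items()}
total_support = sum(support for _, _, support in reported.values())

# Macro average treats every class equally; weighted average scales by support.
macro_f1 = sum(f1_scores.values()) / len(f1_scores)
weighted_f1 = sum(f1_scores[label] * reported[label][2] for label in reported) / total_support

for label, score in f1_scores.items():
    print(f"{label:<9} f1 = {score:.2f}")
print(f"macro f1    = {macro_f1:.2f}")
print(f"weighted f1 = {weighted_f1:.2f}")
```

The recomputed per-class F1 values round to the figures in the table, which suggests the reported columns are internally consistent.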
