Sentiment Analysis on Financial Texts

Description

This Sentiment Analysis Text Classifier was trained on a collection of financial news articles and tweets, each labeled with one of three classes: Bullish, Bearish, or Neutral. The training dataset covers a wide range of financial topics, including stocks, bonds, currencies, and commodities.

Predicted Entities

Bearish, Bullish, Neutral

How to use

 
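from johnsnowlabs import nlp, finance

# Start a Spark session with the licensed Finance NLP libraries loaded
spark = nlp.start()

# Convert raw text into document annotations
document_assembler = nlp.DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')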

tokenizer = nlp.Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained financial sentiment classifier
sequence_classifier = finance.BertForSequenceClassification.pretrained("finclf_bert_news_tweets_sentiment_analysis", "en", "finance/models") \
    .setInputCols(['document', 'token']) \
    .setOutputCol('class')

pipeline = nlp.Pipeline(stages=[
    document_assembler,
    tokenizer,
    sequence_classifier
])

# Generating example
example = spark.createDataFrame([['''Operating profit , excluding non-recurring items , totaled EUR 0.2 mn , down from EUR 0.8 mn in the corresponding period in 2006 .''']]).toDF("text")

result = pipeline.fit(example).transform(example)

# Checking results
result.select("text", "class.result").show(truncate=False)

Results


+----------------------------------------------------------------------------------------------------------------------------------+---------+
|text                                                                                                                              |result   |
+----------------------------------------------------------------------------------------------------------------------------------+---------+
|Operating profit , excluding non-recurring items , totaled EUR 0.2 mn , down from EUR 0.8 mn in the corresponding period in 2006 .|[Bearish]|
+----------------------------------------------------------------------------------------------------------------------------------+---------+
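When many texts are scored at once, the `result` column holds one list of labels per input row. The sketch below is only an illustration of how such lists might be tallied once they have been collected to the driver (for instance with `result.select("class.result").collect()`). The `collected` sample and the `label_counts` helper are hypothetical and are not part of the model or the library.

```python
from collections import Counter

# Hypothetical sample standing in for collected `class.result` values:
# one list of predicted labels per input row (not real model output).
collected = [["Bearish"], ["Bullish"], ["Neutral"], ["Bearish"]]

def label_counts(rows):
    """Count how often each predicted entity appears across all rows."""
    # Start every known class at zero so absent classes still show up.
    counts = Counter({"Bearish": 0, "Bullish": 0, "Neutral": 0})
    for labels in rows:
        counts.update(labels)
    return counts

counts = label_counts(collected)
for label, n in sorted(counts.items()):
    print(f"{label}: {n}")
```

Starting the counter with all three predicted entities keeps the summary stable even when a batch contains no examples of one class.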

Model Information

Model Name: finclf_bert_news_tweets_sentiment_analysis
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [document, token]
Output Labels: [class]
Language: en
Size: 402.4 MB
Case sensitive: true
Max sentence length: 128

References

In-house dataset

Benchmarking

label          precision    recall  f1-score   support
Bearish             0.84      0.88      0.86       487
Bullish             0.87      0.91      0.89       872
Neutral             0.90      0.84      0.87      1001
accuracy               -         -      0.87      2360
macro-avg           0.87      0.88      0.87      2360
weighted-avg        0.87      0.87      0.87      2360
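For readers unfamiliar with the two summary rows, the following sketch shows how macro and weighted averages are conventionally derived from the per-class rows. It assumes the usual definitions (macro: unweighted mean over classes; weighted: mean weighted by support) and uses the rounded values from the table, so the recomputed figures may differ from the published ones in the last digit. The function names are illustrative only.

```python
# Per-class metrics copied from the table: (precision, recall, f1, support)
per_class = {
    "Bearish": (0.84, 0.88, 0.86, 487),
    "Bullish": (0.87, 0.91, 0.89, 872),
    "Neutral": (0.90, 0.84, 0.87, 1001),
}

total_support = sum(m[3] for m in per_class.values())

def macro_avg(i):
    """Unweighted mean of metric i across the classes."""
    return sum(m[i] for m in per_class.values()) / len(per_class)

def weighted_avg(i):
    """Mean of metric i weighted by each class's support."""
    return sum(m[i] * m[3] for m in per_class.values()) / total_support

for name, i in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(f"{name}: macro={macro_avg(i):.4f} weighted={weighted_avg(i):.4f}")
print("total support:", total_support)
```

Because Neutral has the largest support, the weighted average leans toward its scores, while the macro average treats all three classes equally.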