Financial FinBERT Sentiment Analysis

Description

This model is a pre-trained NLP model that analyzes the sentiment of financial text. It was built by further training the BERT language model on a large financial corpus and then fine-tuning it for financial sentiment classification. The Financial PhraseBank by Malo et al. (2014) and in-house JSL documents and annotations were used for fine-tuning.

Predicted Entities

positive, negative, neutral


How to use

from johnsnowlabs import nlp, finance

# Start a Spark session with the licensed Finance NLP library
spark = nlp.start()

# Converts raw text into Spark NLP's document annotation format
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Splits each document into tokens for the classifier
tokenizer = nlp.Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Loads the pre-trained financial sentiment classifier
sequenceClassifier_loaded = finance.BertForSequenceClassification.pretrained("finclf_bert_sentiment_phrasebank", "en", "finance/models") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = nlp.Pipeline(stages=[
    document_assembler,
    tokenizer,
    sequenceClassifier_loaded
])

# A simple example
example = spark.createDataFrame([["Stocks rallied and the British pound gained."]]).toDF("text")

result = pipeline.fit(example).transform(example)

# The predicted label is stored in the `class` column of the result DataFrame
result.select("text", "class.result").show()
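
For quick inference on individual strings, the fitted pipeline can also be wrapped in a LightPipeline (a minimal sketch using Spark NLP's standard LightPipeline API; the input sentence is illustrative):

# Avoids Spark DataFrame overhead for small, ad-hoc inputs
light_model = nlp.LightPipeline(pipeline.fit(example))

# annotate() returns a dict mapping each output column to its results
annotations = light_model.annotate("Operating profit fell sharply compared to last year.")
print(annotations["class"])  # a list containing the predicted label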

Results

+--------------------+----------+
|                text|    result|
+--------------------+----------+
|Stocks rallied an...|[positive]|
+--------------------+----------+
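
Each prediction also carries per-label confidence scores. A minimal sketch of pulling them out, assuming Spark NLP's standard annotation schema (result and metadata fields on the class annotations):

from pyspark.sql import functions as F

# Explode the class annotations and surface the label plus its scores
result.select(F.explode("class").alias("ann")) \
    .select(
        F.col("ann.result").alias("label"),
        F.col("ann.metadata").alias("scores")
    ).show(truncate=False)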

Model Information

Model Name: finclf_bert_sentiment_phrasebank
Type: finance
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [document, token]
Output Labels: [class]
Language: en
Size: 409.9 MB
Case sensitive: true
Max sentence length: 512
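
Case sensitivity and the maximum sentence length are standard transformer parameters; inputs longer than the maximum length are truncated. They can be verified on the loaded model (a sketch using getClasses(), which Spark NLP classifiers expose, and PySpark's generic getOrDefault for the raw params):

# Inspect the loaded classifier's configuration
print(sequenceClassifier_loaded.getClasses())                       # the labels the model can predict
print(sequenceClassifier_loaded.getOrDefault("caseSensitive"))      # True
print(sequenceClassifier_loaded.getOrDefault("maxSentenceLength"))  # 512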

References

In-house financial documents and Financial PhraseBank by Malo et al. (2014)

Benchmarking

       label  precision    recall  f1-score   support
    positive       0.76      0.89      0.82       253
    negative       0.87      0.86      0.87       133
     neutral       0.94      0.87      0.90       584
    accuracy         -         -       0.87       970
   macro-avg       0.86      0.87      0.86       970
weighted-avg       0.88      0.87      0.88       970