Financial FinBERT Sentiment Analysis

Description

This model is a pre-trained NLP model that analyzes the sentiment of financial text. It was built by further training the BERT language model on a large financial corpus and then fine-tuning it for financial sentiment classification. The Financial PhraseBank by Malo et al. (2014) and in-house JSL documents and annotations were used for fine-tuning.

Predicted Entities

positive, negative, neutral


How to use

from johnsnowlabs import nlp, finance

# Start a Spark session with the licensed Finance NLP library
spark = nlp.start()

# Converts raw text into Spark NLP's document annotation format
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Splits each document into tokens for the classifier
tokenizer = nlp.Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Loads the pre-trained financial sentiment classifier
sequenceClassifier_loaded = finance.BertForSequenceClassification.pretrained("finclf_bert_sentiment_phrasebank", "en", "finance/models") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = nlp.Pipeline(stages=[
    document_assembler,
    tokenizer,
    sequenceClassifier_loaded
])

# A simple example
example = spark.createDataFrame([["Stocks rallied and the British pound gained."]]).toDF("text")

result = pipeline.fit(example).transform(example)

# The predicted label is stored in the `class` column of the result DataFrame
result.select("text", "class.result").show()
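
For quick inference on individual strings, the fitted pipeline can also be wrapped in a LightPipeline (a minimal sketch using Spark NLP's standard LightPipeline API; the input sentence is illustrative):

# Avoids Spark DataFrame overhead for small, ad-hoc inputs
light_model = nlp.LightPipeline(pipeline.fit(example))

# annotate() returns a dict mapping each output column to its results
annotations = light_model.annotate("Operating profit fell sharply compared to last year.")
print(annotations["class"])  # a list containing the predicted label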

Results

+--------------------+----------+
|                text|    result|
+--------------------+----------+
|Stocks rallied an...|[positive]|
+--------------------+----------+
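
Each prediction also carries per-label confidence scores. A minimal sketch of pulling them out, assuming Spark NLP's standard annotation schema (result and metadata fields on the class annotations):

from pyspark.sql import functions as F

# Explode the class annotations and surface the label plus its scores
result.select(F.explode("class").alias("ann")) \
    .select(
        F.col("ann.result").alias("label"),
        F.col("ann.metadata").alias("scores")
    ).show(truncate=False)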

Model Information

Model Name: finclf_bert_sentiment_phrasebank
Type: finance
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [document, token]
Output Labels: [class]
Language: en
Size: 409.9 MB
Case sensitive: true
Max sentence length: 512
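
Case sensitivity and the maximum sentence length are standard transformer parameters; inputs longer than the maximum length are truncated. They can be verified on the loaded model (a sketch using getClasses(), which Spark NLP classifiers expose, and PySpark's generic getOrDefault for the raw params):

# Inspect the loaded classifier's configuration
print(sequenceClassifier_loaded.getClasses())                       # the labels the model can predict
print(sequenceClassifier_loaded.getOrDefault("caseSensitive"))      # True
print(sequenceClassifier_loaded.getOrDefault("maxSentenceLength"))  # 512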

References

In-house financial documents and Financial PhraseBank by Malo et al. (2014)

Benchmarking

       label  precision    recall  f1-score   support
    positive       0.76      0.89      0.82       253
    negative       0.87      0.86      0.87       133
     neutral       0.94      0.87      0.90       584
    accuracy         -         -       0.87       970
   macro-avg       0.86      0.87      0.86       970
weighted-avg       0.88      0.87      0.88       970