Financial Sentiment Analysis (DistilRoBERTa)

Description

This is a pre-trained NLP model for sentiment analysis of financial text. It was built by further training the DistilRoBERTa language model on a financial corpus, thereby fine-tuning it for financial sentiment classification. The Financial PhraseBank by Malo et al. (2014) and in-house JSL documents and annotations were used for fine-tuning.

Predicted Entities

positive, negative, neutral

How to use

documentAssembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = nlp.Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

classifier = nlp.RoBertaForSequenceClassification.pretrained("finclf_distilroberta_sentiment_analysis","en", "finance/models") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")


nlpPipeline = nlp.Pipeline(
      stages = [
          documentAssembler,
          tokenizer,
          classifier])
    

# A simple example
example = spark.createDataFrame([["Stocks rallied and the British pound gained."]]).toDF("text")

result = nlpPipeline.fit(example).transform(example)

# result is a DataFrame
result.select("text", "class.result").show()

Results

+--------------------+----------+
|                text|    result|
+--------------------+----------+
|Stocks rallied an...|[positive]|
+--------------------+----------+

Model Information

Model Name: finclf_distilroberta_sentiment_analysis
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [document, token]
Output Labels: [class]
Language: en
Size: 309.1 MB
Case sensitive: true
Max sentence length: 256

References

In-house financial documents and Financial PhraseBank by Malo et al. (2014)

Benchmarking

       label  precision    recall  f1-score   support
    positive       0.77      0.88      0.81       253
    negative       0.86      0.85      0.88       133
     neutral       0.93      0.86      0.90       584
    accuracy         -         -       0.86       970
   macro-avg       0.85      0.86      0.85       970
weighted-avg       0.87      0.86      0.87       970
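For reference, the two average rows are derived from the per-class rows: the macro average is the unweighted mean over the three classes, while the weighted average weights each class by its support. A small sketch of the computation (the table's cells are rounded, so recomputed figures can differ from the table in the last digit):

```python
# Per-class metrics from the benchmarking table: (precision, recall, support)
rows = {
    "positive": (0.77, 0.88, 253),
    "negative": (0.86, 0.85, 133),
    "neutral":  (0.93, 0.86, 584),
}

total_support = sum(s for _, _, s in rows.values())  # 970

# Macro average: plain mean over classes, ignoring class sizes
macro_precision = sum(p for p, _, _ in rows.values()) / len(rows)

# Weighted average: mean weighted by each class's support
weighted_recall = sum(r * s for _, r, s in rows.values()) / total_support

print(round(macro_precision, 2))  # 0.85
print(round(weighted_recall, 2))  # 0.86
```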