Description
This sentiment analysis text classifier has been trained on a collection of financial news articles and tweets labeled with three classes: Bullish, Bearish, and Neutral. The training dataset covers a wide range of financial topics, including stocks, bonds, currencies, and commodities.
Predicted Entities
Bearish, Bullish, Neutral
How to use
from johnsnowlabs import nlp, finance

# Start a Spark session with the licensed Finance NLP libraries attached
# (the johnsnowlabs library's nlp.start() is one way to do this)
spark = nlp.start()

document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = nlp.Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier_loaded = finance.BertForSequenceClassification.pretrained("finclf_bert_news_tweets_sentiment_analysis", "en", "finance/models") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = nlp.Pipeline(stages=[
    document_assembler,
    tokenizer,
    sequenceClassifier_loaded
])

# Generating an example
example = spark.createDataFrame([['''Operating profit , excluding non-recurring items , totaled EUR 0.2 mn , down from EUR 0.8 mn in the corresponding period in 2006 .''']]).toDF("text")

result = pipeline.fit(example).transform(example)

# Checking the results
result.select("text", "class.result").show(truncate=False)
Results
+----------------------------------------------------------------------------------------------------------------------------------+---------+
|text |result |
+----------------------------------------------------------------------------------------------------------------------------------+---------+
|Operating profit , excluding non-recurring items , totaled EUR 0.2 mn , down from EUR 0.8 mn in the corresponding period in 2006 .|[Bearish]|
+----------------------------------------------------------------------------------------------------------------------------------+---------+
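For quick inference on raw strings, the fitted pipeline can also be wrapped in Spark NLP's LightPipeline. The snippet below is a minimal sketch that reuses the pipeline and example objects defined above; the sample headline is a hypothetical input added here for illustration.

# Wrap the fitted pipeline for lightweight, in-memory annotation
light_model = nlp.LightPipeline(pipeline.fit(example))

# Hypothetical headline, used purely for illustration
annotations = light_model.annotate("Shares surged after the company raised its full-year guidance.")

print(annotations["class"])  # a list with the predicted label, e.g. ['Bullish']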
Model Information
Model Name: finclf_bert_news_tweets_sentiment_analysis
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [document, token]
Output Labels: [class]
Language: en
Size: 402.4 MB
Case sensitive: true
Max sentence length: 128
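The case sensitivity and maximum sentence length listed above map to parameters on the loaded annotator. The sketch below is an illustration, assuming the sequenceClassifier_loaded object from the How to use snippet; it shows how these settings and the label set can be inspected or set explicitly.

# Inspect the labels the classifier can emit (the three Predicted Entities; order may vary)
print(sequenceClassifier_loaded.getClasses())

# The model is case sensitive and truncates inputs to 128 word pieces; these
# setters only control preprocessing, they do not change the underlying BERT weights
sequenceClassifier_loaded.setCaseSensitive(True)
sequenceClassifier_loaded.setMaxSentenceLength(128)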
References
In-house dataset
Benchmarking
label          precision  recall  f1-score  support
Bearish        0.84       0.88    0.86      487
Bullish        0.87       0.91    0.89      872
Neutral        0.90       0.84    0.87      1001
accuracy       -          -       0.87      2360
macro-avg      0.87       0.88    0.87      2360
weighted-avg   0.87       0.87    0.87      2360
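The macro-avg and weighted-avg rows follow directly from the per-class scores and supports: the macro average is an unweighted mean over the three labels, while the weighted average weights each label by its support. The sketch below reproduces that arithmetic from the rounded per-class values reported above, so the recomputed numbers can differ from the table by about one hundredth.

# Recompute the summary rows from the per-class metrics in the table above
per_class = {
    # label: (precision, recall, f1-score, support)
    "Bearish": (0.84, 0.88, 0.86, 487),
    "Bullish": (0.87, 0.91, 0.89, 872),
    "Neutral": (0.90, 0.84, 0.87, 1001),
}

total_support = sum(s for *_, s in per_class.values())  # 2360

# Macro average: unweighted mean over the three classes
macro_f1 = sum(f1 for _, _, f1, _ in per_class.values()) / len(per_class)

# Weighted average: mean weighted by each class's support
weighted_f1 = sum(f1 * s for _, _, f1, s in per_class.values()) / total_support

print(f"macro f1 ~ {macro_f1:.2f}, weighted f1 ~ {weighted_f1:.2f}")
# Gives ~0.87 and ~0.88 from the rounded inputs; the table's weighted-avg of 0.87
# was computed from the unrounded per-class scores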