Description
This model performs sentiment analysis on financial tweets, classifying each text as Bullish, Bearish, or Neutral.
Predicted Entities
Bearish, Bullish, Neutral
How to use
from johnsnowlabs import nlp, finance

# Assumes a licensed Finance NLP installation; nlp.start() returns the Spark session
spark = nlp.start()

# Convert raw text into document annotations
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split each document into tokens
tokenizer = nlp.Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Load the pretrained sentiment classifier
sequence_classifier = finance.BertForSequenceClassification.pretrained("finclf_bert_twitter_financial_news_sentiment", "en", "finance/models") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = nlp.Pipeline(stages=[
    document_assembler,
    tokenizer,
    sequence_classifier
])

data = [
    ["""$MPLX $MPC - MPLX cut at Credit Suisse on potential dilution from Marathon strategic review https://t.co/0BFQy4ZU6W"""],
    ["""Biogen stock price target raised to $392 from $320 at Instinet"""],
    ["""Luckin Coffee shares halted in premarket; news pending https://t.co/6Kz4NwnNFN"""],
]

# No stage needs training, so the pipeline is fitted on an empty DataFrame
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = pipeline.fit(empty_data)
example = model.transform(spark.createDataFrame(data).toDF("text"))
example.select("text", "class.result").show(truncate=False)
Results
+-------------------------------------------------------------------------------------------------------------------+---------+
|text                                                                                                               |result   |
+-------------------------------------------------------------------------------------------------------------------+---------+
|$MPLX $MPC - MPLX cut at Credit Suisse on potential dilution from Marathon strategic review https://t.co/0BFQy4ZU6W|[Bearish]|
|Biogen stock price target raised to $392 from $320 at Instinet                                                     |[Bullish]|
|Luckin Coffee shares halted in premarket; news pending https://t.co/6Kz4NwnNFN                                     |[Neutral]|
+-------------------------------------------------------------------------------------------------------------------+---------+
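The result column holds an array of labels for each row, so downstream code usually has to flatten it before aggregating. The following is a minimal sketch of tallying the predictions in plain Python. The `rows` list is copied by hand from the output above rather than collected from Spark, the helper name `count_labels` is hypothetical and not part of Finance NLP, and the sketch assumes each row carries the labels for one document.

```python
from collections import Counter

# (text, class.result) pairs copied by hand from the output above.
# In a live session they could be obtained by collecting the selected
# columns to the driver; that step is assumed and not shown here.
rows = [
    ("$MPLX $MPC - MPLX cut at Credit Suisse on potential dilution from "
     "Marathon strategic review https://t.co/0BFQy4ZU6W", ["Bearish"]),
    ("Biogen stock price target raised to $392 from $320 at Instinet",
     ["Bullish"]),
    ("Luckin Coffee shares halted in premarket; news pending "
     "https://t.co/6Kz4NwnNFN", ["Neutral"]),
]


def count_labels(rows):
    """Count predicted labels, flattening each row's label array."""
    return Counter(label for _, labels in rows for label in labels)


counts = count_labels(rows)
for label in ("Bearish", "Bullish", "Neutral"):
    print(label, counts[label])
```

Using a `Counter` keeps the tally independent of label order and returns 0 for any label that never appears.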
Model Information
Model Name: finclf_bert_twitter_financial_news_sentiment
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [document, token]
Output Labels: [class]
Language: en
Size: 406.4 MB
Case sensitive: true
Max sentence length: 512
References
In-house annotations on financial reports
Benchmarking
label         precision  recall  f1-score  support
Bearish       0.80       0.72    0.76      379
Bullish       0.82       0.78    0.80      468
Neutral       0.90       0.94    0.92      1540
accuracy                         0.87      2387
macro-avg     0.84       0.81    0.83      2387
weighted-avg  0.87       0.87    0.87      2387
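The two average rows can be checked against the per-class rows: the macro average is the unweighted mean across the three classes, and the weighted average weights each class by its support. The sketch below recomputes both from the rounded figures reported here. The function names are illustrative, and because the inputs are already rounded to two decimals, such a recomputation could in general differ from the original report in the last digit.

```python
# Per-class metrics copied from the benchmark table:
# (precision, recall, f1-score, support)
per_class = {
    "Bearish": (0.80, 0.72, 0.76, 379),
    "Bullish": (0.82, 0.78, 0.80, 468),
    "Neutral": (0.90, 0.94, 0.92, 1540),
}


def macro_avg(metrics):
    """Unweighted mean of precision, recall and f1 across classes."""
    n = len(metrics)
    return tuple(
        round(sum(m[i] for m in metrics.values()) / n, 2) for i in range(3)
    )


def weighted_avg(metrics):
    """Mean of precision, recall and f1 weighted by class support."""
    total = sum(m[3] for m in metrics.values())
    return tuple(
        round(sum(m[i] * m[3] for m in metrics.values()) / total, 2)
        for i in range(3)
    )


total_support = sum(m[3] for m in per_class.values())
macro = macro_avg(per_class)
weighted = weighted_avg(per_class)

print("support     ", total_support)
print("macro-avg   ", macro)
print("weighted-avg", weighted)
```

The weighted figures sit closer to the Neutral row because that class accounts for 1540 of the 2387 test examples, which is also why the macro average is lower than the weighted one.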