Financial Twitter News Sentiment Analysis

Description

This model performs sentiment analysis on financial news tweets, classifying each text into one of three sentiment classes: Bullish, Bearish, or Neutral.

Predicted Entities

Bearish, Bullish, Neutral


How to use

from johnsnowlabs import nlp, finance

# Start a Spark session (assumes the johnsnowlabs library and a valid Finance NLP license are installed)
spark = nlp.start()

# Convert raw text into document annotations
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split each document into tokens
tokenizer = nlp.Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Load the pretrained sentiment classifier
sequence_classifier = finance.BertForSequenceClassification.pretrained("finclf_bert_twitter_financial_news_sentiment", "en", "finance/models") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = nlp.Pipeline(stages=[
    document_assembler,
    tokenizer,
    sequence_classifier
])

data = [
    ["""$MPLX $MPC - MPLX cut at Credit Suisse on potential dilution from Marathon strategic review https://t.co/0BFQy4ZU6W"""],
    ["""Biogen stock price target raised to $392 from $320 at Instinet"""],
    ["""Luckin Coffee shares halted in premarket; news pending https://t.co/6Kz4NwnNFN"""]
]

# The classifier is pretrained, so an empty DataFrame is enough to fit the pipeline
empty_data = spark.createDataFrame([[""]]).toDF("text")

model = pipeline.fit(empty_data)

# Run the fitted pipeline on the example tweets
example = model.transform(spark.createDataFrame(data).toDF("text"))

example.select("text", "class.result").show(truncate=False)

Results

+-------------------------------------------------------------------------------------------------------------------+---------+
|text                                                                                                               |result   |
+-------------------------------------------------------------------------------------------------------------------+---------+
|$MPLX $MPC - MPLX cut at Credit Suisse on potential dilution from Marathon strategic review https://t.co/0BFQy4ZU6W|[Bearish]|
|Biogen stock price target raised to $392 from $320 at Instinet                                                     |[Bullish]|
|Luckin Coffee shares halted in premarket; news pending https://t.co/6Kz4NwnNFN                                     |[Neutral]|
+-------------------------------------------------------------------------------------------------------------------+---------+
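
In the output above, each `class.result` value is a one-element array holding the predicted label. As a minimal sketch, assuming the rows have already been collected to the driver as (text, labels) pairs, the labels can be tallied in plain Python. The `rows` list below is copied by hand from the table above, and `count_labels` is a hypothetical helper, not part of the library:

```python
from collections import Counter

# Rows copied by hand from the Results table: (text, class.result) pairs.
rows = [
    ("$MPLX $MPC - MPLX cut at Credit Suisse on potential dilution from "
     "Marathon strategic review https://t.co/0BFQy4ZU6W", ["Bearish"]),
    ("Biogen stock price target raised to $392 from $320 at Instinet", ["Bullish"]),
    ("Luckin Coffee shares halted in premarket; news pending "
     "https://t.co/6Kz4NwnNFN", ["Neutral"]),
]


def count_labels(rows):
    """Hypothetical helper: count how often each label appears across all rows."""
    return Counter(label for _, labels in rows for label in labels)


counts = count_labels(rows)
for label in ("Bearish", "Bullish", "Neutral"):
    print(label, counts[label])
```

Iterating over the whole array rather than taking only its first element keeps the helper usable if a row ever carries more than one label.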

Model Information

Model Name:	finclf_bert_twitter_financial_news_sentiment
Compatibility:	Finance NLP 1.0.0+
License:	Licensed
Edition:	Official
Input Labels:	[document, token]
Output Labels:	[class]
Language:	en
Size:	406.4 MB
Case sensitive:	true
Max sentence length:	512

References

In-house annotations on financial reports

Benchmarking

label         precision  recall  f1-score  support
Bearish            0.80    0.72      0.76      379
Bullish            0.82    0.78      0.80      468
Neutral            0.90    0.94      0.92     1540
accuracy                             0.87     2387
macro-avg          0.84    0.81      0.83     2387
weighted-avg       0.87    0.87      0.87     2387
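
The two average rows follow from the per-class rows: the macro average is the unweighted mean over the three classes, while the weighted average weights each class by its support. The sketch below recomputes both from the rounded per-class figures in the table. The names `per_class`, `macro_avg`, and `weighted_avg` are illustrative, and because the inputs are already rounded, agreement is only expected to two decimals:

```python
# Per-class metrics copied from the table: (precision, recall, f1, support).
per_class = {
    "Bearish": (0.80, 0.72, 0.76, 379),
    "Bullish": (0.82, 0.78, 0.80, 468),
    "Neutral": (0.90, 0.94, 0.92, 1540),
}

SUPPORT = 3  # index of the support column in each tuple


def macro_avg(metric_index):
    """Unweighted mean of one metric across all classes."""
    values = [m[metric_index] for m in per_class.values()]
    return sum(values) / len(values)


def weighted_avg(metric_index):
    """Mean of one metric across classes, weighted by class support."""
    total = sum(m[SUPPORT] for m in per_class.values())
    return sum(m[metric_index] * m[SUPPORT] for m in per_class.values()) / total


total_support = sum(m[SUPPORT] for m in per_class.values())
macro = [round(macro_avg(i), 2) for i in range(3)]
weighted = [round(weighted_avg(i), 2) for i in range(3)]

print("support     ", total_support)
print("macro-avg   ", macro)
print("weighted-avg", weighted)
```

With these inputs the recomputed values agree with the macro-avg and weighted-avg rows of the table. Neutral accounts for 1540 of the 2387 test examples, which is why the weighted averages sit closer to the Neutral scores than the macro averages do.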
