Finance-related Tweets Classifier

Description

This is a Multiclass classification model which classifies financial tweets with one of the following topics: Company_or_Product_News, Stock_Movement, General_News_or_Opinion, Earnings, Macro, Fed_or_Central_Banks, Politics, Stock_Commentary, Financials, M&A_or_Investments, Legal_or_Regulation, Personnel_Change, Markets, Energy_or_Oil, Dividend, Analyst_Update, Treasuries_or_Corporate_Debt, Currencies.

Predicted Entities

Company_or_Product_News, Stock_Movement, General_News_or_Opinion, Earnings, Macro, Fed_or_Central_Banks, Politics, Stock_Commentary, Financials, M&A_or_Investments, Legal_or_Regulation, Personnel_Change, Markets, Energy_or_Oil, Dividend, Analyst_Update, Treasuries_or_Corporate_Debt, Currencies

Copy S3 URI

How to use

documentAssembler = nlp.DocumentAssembler() \
    .setInputCols(["text"]) \
    .setOutputCols("document")

tokenizer = nlp.Tokenizer() \
    .setInputCols("document") \
    .setOutputCol("token")

seq_classifier = finance.BertForSequenceClassification.pretrained("finclf_twitter_news", "en", "finance/models") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")
    
pipeline = nlp.Pipeline(stages=[documentAssembler, tokenizer, seq_classifier])

data = spark.createDataFrame([["Barclays believes earnings for these underperforming stocks may surprise Wall Street"]]).toDF("text")

result = pipeline.fit(data).transform(data)

Results

+----------------+
|          result|
+----------------+
|[Analyst_Update]|
+----------------+

Model Information

Model Name: finclf_twitter_news
Compatibility: Finance NLP 1.0.0+
License: Licensed
Edition: Official
Input Labels: [document, token]
Output Labels: [class]
Language: en
Size: 408.7 MB
Case sensitive: true
Max sentence length: 512

References

Train dataset available here

Benchmarking

label                         precision  recall  f1-score  support 
Analyst_Update                0.79       0.79    0.79      38      
Company_or_Product_News       0.71       0.78    0.74      112     
Currencies                    0.80       1.00    0.89      12      
Dividend                      1.00       0.94    0.97      31      
Earnings                      0.95       0.97    0.96      100     
Energy_or_Oil                 0.78       0.89    0.83      55      
Fed_or_Central_Banks          0.82       0.78    0.80      95      
Financials                    0.90       0.93    0.92      60      
General_News_or_Opinion       0.71       0.74    0.72      80      
Legal_or_Regulation           0.85       0.75    0.80      52      
M&A_or_Investments            0.85       0.90    0.87      49      
Macro                         0.81       0.70    0.75      84      
Markets                       0.91       0.84    0.87      49      
Personnel_Change              0.96       0.94    0.95      50      
Politics                      0.83       0.82    0.82      83      
Stock_Commentary              0.87       0.94    0.90      63      
Stock_Movement                0.94       0.90    0.92      89      
Treasuries_or_Corporate_Debt  0.80       0.73    0.76      33      
accuracy                      -          -       0.84      1135    
macro-avg                     0.85       0.85    0.85      1135    
weighted-avg                  0.84       0.84    0.84      1135