Description
This is a multiclass classification model that classifies financial tweets into one of the following 18 topics: Company_or_Product_News, Stock_Movement, General_News_or_Opinion, Earnings, Macro, Fed_or_Central_Banks, Politics, Stock_Commentary, Financials, M&A_or_Investments, Legal_or_Regulation, Personnel_Change, Markets, Energy_or_Oil, Dividend, Analyst_Update, Treasuries_or_Corporate_Debt, Currencies.
Predicted Entities
Company_or_Product_News, Stock_Movement, General_News_or_Opinion, Earnings, Macro, Fed_or_Central_Banks, Politics, Stock_Commentary, Financials, M&A_or_Investments, Legal_or_Regulation, Personnel_Change, Markets, Energy_or_Oil, Dividend, Analyst_Update, Treasuries_or_Corporate_Debt, Currencies
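Because the model is multiclass rather than multilabel, each tweet receives a single label from the list above. The sketch below illustrates that decision rule, assuming (as is typical for BERT sequence-classification heads) a softmax over one logit per class followed by an argmax. The logits are invented numbers, not output of this model. The label order inside the model is also an assumption, and `LABELS` and `predict_topic` are hypothetical names.

```python
import math

# The 18 topic labels listed above, in the order given in this card.
# Assumption: the model's internal label order may differ.
LABELS = [
    "Company_or_Product_News", "Stock_Movement", "General_News_or_Opinion",
    "Earnings", "Macro", "Fed_or_Central_Banks", "Politics", "Stock_Commentary",
    "Financials", "M&A_or_Investments", "Legal_or_Regulation", "Personnel_Change",
    "Markets", "Energy_or_Oil", "Dividend", "Analyst_Update",
    "Treasuries_or_Corporate_Debt", "Currencies",
]


def softmax(logits):
    """Numerically stable softmax: turns raw scores into probabilities summing to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_topic(logits):
    """Hypothetical helper: return the single most probable label and its probability."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], probs[best]


# Invented logits: every class scores 0.5 except Analyst_Update, which scores 4.0.
example_logits = [0.5] * len(LABELS)
example_logits[LABELS.index("Analyst_Update")] = 4.0
label, prob = predict_topic(example_logits)
print(label, round(prob, 3))
```

Only the highest-probability label is kept, which is why the pipeline output below contains one label per tweet.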
How to use
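```python
from johnsnowlabs import nlp, finance

spark = nlp.start()  # assumes a licensed Finance NLP installation

# Convert raw text into "document" annotations
documentAssembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")
# Split each document into tokens
tokenizer = nlp.Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")
# Pretrained BERT classifier that assigns one topic per tweet
seq_classifier = finance.BertForSequenceClassification.pretrained("finclf_twitter_news", "en", "finance/models") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")
pipeline = nlp.Pipeline(stages=[documentAssembler, tokenizer, seq_classifier])
data = spark.createDataFrame([["Barclays believes earnings for these underperforming stocks may surprise Wall Street"]]).toDF("text")
result = pipeline.fit(data).transform(data)
# Display the predicted topic for each input row
result.select("class.result").show()
```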
Results
+----------------+
|          result|
+----------------+
|[Analyst_Update]|
+----------------+
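Besides the label in `class.result`, Spark NLP sequence classifiers typically store per-label scores as strings in each annotation's `metadata`, alongside non-score keys such as `sentence`. That layout is an assumption here; check it against your own output (for example with `result.select("class.metadata").show(truncate=False)`). The sketch below ranks those scores for one annotation already collected into Python. The annotation dict, its three scores, and the `top_labels` helper are hypothetical, and a real annotation would list all 18 labels.

```python
# Topic labels from this card; used to separate class scores from other metadata keys.
LABELS = {
    "Company_or_Product_News", "Stock_Movement", "General_News_or_Opinion",
    "Earnings", "Macro", "Fed_or_Central_Banks", "Politics", "Stock_Commentary",
    "Financials", "M&A_or_Investments", "Legal_or_Regulation", "Personnel_Change",
    "Markets", "Energy_or_Oil", "Dividend", "Analyst_Update",
    "Treasuries_or_Corporate_Debt", "Currencies",
}

# Hypothetical element of the "class" column after collecting it to Python.
# The scores are invented for illustration, not real output of this model.
annotation = {
    "result": "Analyst_Update",
    "metadata": {
        "sentence": "0",
        "Analyst_Update": "0.91",
        "Earnings": "0.05",
        "Stock_Movement": "0.04",
    },
}


def top_labels(annotation, k=3):
    """Return up to k (label, score) pairs, highest score first."""
    scores = {
        key: float(value)
        for key, value in annotation["metadata"].items()
        if key in LABELS  # ignore non-score keys such as "sentence"
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


ranked = top_labels(annotation)
print(annotation["result"], ranked)
```

Filtering on the known label set, rather than on whether a value parses as a number, matters because `sentence` also holds a numeric string.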
Model Information
| Model Name: | finclf_twitter_news |
| Compatibility: | Finance NLP 1.0.0+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [document, token] |
| Output Labels: | [class] |
| Language: | en |
| Size: | 408.7 MB |
| Case sensitive: | true |
| Max sentence length: | 512 |
References
Train dataset available here
Benchmarking
| label | precision | recall | f1-score | support |
|---|---|---|---|---|
| Analyst_Update | 0.79 | 0.79 | 0.79 | 38 |
| Company_or_Product_News | 0.71 | 0.78 | 0.74 | 112 |
| Currencies | 0.80 | 1.00 | 0.89 | 12 |
| Dividend | 1.00 | 0.94 | 0.97 | 31 |
| Earnings | 0.95 | 0.97 | 0.96 | 100 |
| Energy_or_Oil | 0.78 | 0.89 | 0.83 | 55 |
| Fed_or_Central_Banks | 0.82 | 0.78 | 0.80 | 95 |
| Financials | 0.90 | 0.93 | 0.92 | 60 |
| General_News_or_Opinion | 0.71 | 0.74 | 0.72 | 80 |
| Legal_or_Regulation | 0.85 | 0.75 | 0.80 | 52 |
| M&A_or_Investments | 0.85 | 0.90 | 0.87 | 49 |
| Macro | 0.81 | 0.70 | 0.75 | 84 |
| Markets | 0.91 | 0.84 | 0.87 | 49 |
| Personnel_Change | 0.96 | 0.94 | 0.95 | 50 |
| Politics | 0.83 | 0.82 | 0.82 | 83 |
| Stock_Commentary | 0.87 | 0.94 | 0.90 | 63 |
| Stock_Movement | 0.94 | 0.90 | 0.92 | 89 |
| Treasuries_or_Corporate_Debt | 0.80 | 0.73 | 0.76 | 33 |
| accuracy | - | - | 0.84 | 1135 |
| macro-avg | 0.85 | 0.85 | 0.85 | 1135 |
| weighted-avg | 0.84 | 0.84 | 0.84 | 1135 |
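As a sanity check, the summary rows can be recomputed from the 18 per-class rows. This assumes the usual scikit-learn-style definitions: the macro average is the unweighted mean over classes, and the weighted average weights each class by its support. For single-label classification, accuracy equals support-weighted recall, because recall times support is the number of correctly classified examples in that class. The per-class values are rounded to two decimals, so recomputed aggregates agree only to that precision. `REPORT` and `f1` are illustrative names.

```python
# Per-class (precision, recall, f1, support), copied from the benchmarking table.
REPORT = {
    "Analyst_Update": (0.79, 0.79, 0.79, 38),
    "Company_or_Product_News": (0.71, 0.78, 0.74, 112),
    "Currencies": (0.80, 1.00, 0.89, 12),
    "Dividend": (1.00, 0.94, 0.97, 31),
    "Earnings": (0.95, 0.97, 0.96, 100),
    "Energy_or_Oil": (0.78, 0.89, 0.83, 55),
    "Fed_or_Central_Banks": (0.82, 0.78, 0.80, 95),
    "Financials": (0.90, 0.93, 0.92, 60),
    "General_News_or_Opinion": (0.71, 0.74, 0.72, 80),
    "Legal_or_Regulation": (0.85, 0.75, 0.80, 52),
    "M&A_or_Investments": (0.85, 0.90, 0.87, 49),
    "Macro": (0.81, 0.70, 0.75, 84),
    "Markets": (0.91, 0.84, 0.87, 49),
    "Personnel_Change": (0.96, 0.94, 0.95, 50),
    "Politics": (0.83, 0.82, 0.82, 83),
    "Stock_Commentary": (0.87, 0.94, 0.90, 63),
    "Stock_Movement": (0.94, 0.90, 0.92, 89),
    "Treasuries_or_Corporate_Debt": (0.80, 0.73, 0.76, 33),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


rows = list(REPORT.values())
support_total = sum(s for _, _, _, s in rows)

# Macro average: every class counts equally.
macro_f1 = sum(f for _, _, f, _ in rows) / len(rows)
# Weighted average: each class counts in proportion to its support.
weighted_f1 = sum(f * s for _, _, f, s in rows) / support_total
# Accuracy = correctly classified / total = support-weighted recall.
accuracy = sum(r * s for _, r, _, s in rows) / support_total

print(support_total, round(macro_f1, 2), round(weighted_f1, 2), round(accuracy, 2))
```

For example, the Currencies row follows from the F1 formula: 2 × 0.80 × 1.00 / (0.80 + 1.00) ≈ 0.89.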