Description
This model analyzes the sentiment of financial text. It was built by further pre-training the BERT language model on a large financial corpus and then fine-tuning it for financial sentiment classification. The Financial PhraseBank by Malo et al. (2014) and in-house JSL documents and annotations were used for fine-tuning.
Predicted Entities
positive, negative, neutral
How to use
# Requires the johnsnowlabs library and a valid Finance NLP license
from johnsnowlabs import nlp, finance

spark = nlp.start()

document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = nlp.Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequence_classifier = finance.BertForSequenceClassification.pretrained("finclf_bert_sentiment_phrasebank", "en", "finance/models") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = nlp.Pipeline(stages=[
    document_assembler,
    tokenizer,
    sequence_classifier
])

# Fit the pipeline on a simple example and transform it
example = spark.createDataFrame([["Stocks rallied and the British pound gained."]]).toDF("text")
result = pipeline.fit(example).transform(example)

# result is a Spark DataFrame holding the predicted class per row
result.select("text", "class.result").show()
Results
+--------------------+----------+
| text| result|
+--------------------+----------+
|Stocks rallied an...|[positive]|
+--------------------+----------+
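Beyond the predicted label, each class annotation carries metadata; a minimal sketch for inspecting it, assuming the classifier stores per-label confidence scores in the annotation metadata as is standard for Spark NLP sequence classifiers:

from pyspark.sql import functions as F

# Explode the "class" annotations and show the label next to its metadata
result.select(F.explode("class").alias("prediction")) \
      .select(
          F.col("prediction.result").alias("label"),
          F.col("prediction.metadata").alias("scores")
      ) \
      .show(truncate=False)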
Model Information
| Model Name: | finclf_bert_sentiment_phrasebank |
| Type: | finance |
| Compatibility: | Finance NLP 1.0.0+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [document, token] |
| Output Labels: | [class] |
| Language: | en |
| Size: | 409.9 MB |
| Case sensitive: | true |
| Max sentence length: | 512 |
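The case sensitivity and maximum sentence length above correspond to annotator parameters that can be set explicitly when loading the model. A minimal sketch, assuming the standard Spark NLP setters on BertForSequenceClassification:

# Sketch: inputs longer than maxSentenceLength are truncated before classification;
# setCaseSensitive(True) matches the cased model in the table above
classifier = finance.BertForSequenceClassification.pretrained(
        "finclf_bert_sentiment_phrasebank", "en", "finance/models") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class") \
    .setCaseSensitive(True) \
    .setMaxSentenceLength(512)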
References
In-house financial documents and Financial PhraseBank by Malo et al. (2014)
Benchmarking
| label | precision | recall | f1-score | support |
| positive | 0.76 | 0.89 | 0.82 | 253 |
| negative | 0.87 | 0.86 | 0.87 | 133 |
| neutral | 0.94 | 0.87 | 0.90 | 584 |
| accuracy | - | - | 0.87 | 970 |
| macro-avg | 0.86 | 0.87 | 0.86 | 970 |
| weighted-avg | 0.88 | 0.87 | 0.88 | 970 |
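As a sanity check, the macro and weighted averages follow directly from the per-label rows; a small worked example using the F1 scores and support counts from the table (figures rounded as shown there):

# Per-label F1 scores and support counts taken from the benchmarking table
f1 = {"positive": 0.82, "negative": 0.87, "neutral": 0.90}
support = {"positive": 253, "negative": 133, "neutral": 584}

# Macro average: unweighted mean over the three labels
macro_f1 = sum(f1.values()) / len(f1)                      # ~0.86

# Weighted average: mean weighted by each label's support
total = sum(support.values())                              # 970
weighted_f1 = sum(f1[k] * support[k] for k in f1) / total  # ~0.88

print(round(macro_f1, 2), round(weighted_f1, 2))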