Description
This is a Lithuanian sentiment analysis text classifier that predicts whether a text expresses a positive or a negative emotion.
Predicted Entities
POS,NEG
How to use
# Test classifier in Spark NLP pipeline
# Assumes the johnsnowlabs package is installed with a valid Finance NLP license
from johnsnowlabs import nlp, finance

spark = nlp.start()

# Wrap the raw text column into Spark NLP documents
document_assembler = nlp.DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

# Split each document into tokens
tokenizer = nlp.Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load newly trained classifier
sequenceClassifier_loaded = finance.BertForSequenceClassification.pretrained("finclf_bert_sentiment_analysis", "lt", "finance/models") \
    .setInputCols(['document', 'token']) \
    .setOutputCol('class')

pipeline = nlp.Pipeline(stages=[
    document_assembler,
    tokenizer,
    sequenceClassifier_loaded
])

# Generating example
example = spark.createDataFrame([["Pagalbos parašiuto laukiantis verslas priemones vertina teigiamai tik yra keli „jeigu“"]]).toDF("text")
result = pipeline.fit(example).transform(example)

# Checking results
result.select("text", "class.result").show(truncate=False)
Results
+--------------------------------------------------------------------------------------+------+
|text                                                                                  |result|
+--------------------------------------------------------------------------------------+------+
|Pagalbos parašiuto laukiantis verslas priemones vertina teigiamai tik yra keli „jeigu“|[POS] |
+--------------------------------------------------------------------------------------+------+
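The `result` column holds an array of labels (here `[POS]`), one per classified document. As a minimal sketch, assuming the selected rows have been collected to the driver as `(text, result)` pairs, a small helper can flatten that array into a single label. The names `first_label` and `rows` are hypothetical and not part of Spark NLP.

```python
from collections import Counter


def first_label(result, default=None):
    """Return the first label of a `class.result` array, or `default` if it is empty."""
    return result[0] if result else default


# Hypothetical stand-in for the rows returned by
# result.select("text", "class.result").collect()
rows = [
    ("Pagalbos parašiuto laukiantis verslas priemones vertina teigiamai tik yra keli „jeigu“", ["POS"]),
]

labels = [first_label(result) for _, result in rows]
label_counts = Counter(labels)

print(labels)        # one flat label per input text
print(label_counts)  # how many texts fall under each label
```

With larger inputs it is usually preferable to do this flattening inside Spark rather than collecting rows to the driver.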
Model Information
| Model Name: | finclf_bert_sentiment_analysis |
| Compatibility: | Finance NLP 1.0.0+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [document, token] |
| Output Labels: | [class] |
| Language: | lt |
| Size: | 406.6 MB |
| Case sensitive: | true |
| Max sentence length: | 128 |
References
An in-house augmented version of this dataset, with the NEU tag removed.
Benchmarking
label         precision  recall  f1-score  support
NEG           0.80       0.76    0.78      509
POS           0.90       0.92    0.91      1167
accuracy      -          -       0.87      1676
macro-avg     0.85       0.84    0.84      1676
weighted-avg  0.87       0.87    0.87      1676
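The summary rows can be cross-checked against the per-class rows. The sketch below assumes the usual definitions: the macro average is the unweighted mean over classes, the weighted average is the support-weighted mean, and accuracy in single-label classification equals support-weighted recall. Because the inputs are already rounded to two decimals, the recomputed values match the table only to within rounding. All variable and function names are illustrative.

```python
# Per-class rows copied from the benchmark table: (precision, recall, f1, support)
per_class = {
    "NEG": (0.80, 0.76, 0.78, 509),
    "POS": (0.90, 0.92, 0.91, 1167),
}

total_support = sum(row[3] for row in per_class.values())


def macro(metric_index):
    """Unweighted mean of one metric over all classes."""
    return sum(row[metric_index] for row in per_class.values()) / len(per_class)


def weighted(metric_index):
    """Support-weighted mean of one metric over all classes."""
    return sum(row[metric_index] * row[3] for row in per_class.values()) / total_support


macro_avg = tuple(macro(i) for i in range(3))        # precision, recall, f1
weighted_avg = tuple(weighted(i) for i in range(3))  # precision, recall, f1
accuracy = weighted(1)  # support-weighted recall equals overall accuracy

print("support     :", total_support)
print("macro-avg   :", [f"{v:.3f}" for v in macro_avg])
print("weighted-avg:", [f"{v:.3f}" for v in weighted_avg])
print("accuracy    :", f"{accuracy:.3f}")
```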