Description
This is a Lithuanian sentiment analysis text classifier that predicts whether a text expresses a positive or a negative emotion.
Predicted Entities
POS, NEG
How to use
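# Test classifier in Spark NLP pipeline
# Assumes the johnsnowlabs package is installed with a valid Finance NLP license
from johnsnowlabs import nlp, finance
spark = nlp.start()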
document_assembler = nlp.DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')
tokenizer = nlp.Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')
# Load the pretrained classifier
sequenceClassifier_loaded = finance.BertForSequenceClassification.pretrained("finclf_bert_sentiment_analysis", "lt", "finance/models") \
    .setInputCols(['document', 'token']) \
    .setOutputCol('class')
pipeline = nlp.Pipeline(stages=[
    document_assembler,
    tokenizer,
    sequenceClassifier_loaded
])
# Generating example
example = spark.createDataFrame([["Pagalbos parašiuto laukiantis verslas priemones vertina teigiamai  tik yra keli „jeigu“"]]).toDF("text")
result = pipeline.fit(example).transform(example)
# Checking results
result.select("text", "class.result").show(truncate=False)
Results
+---------------------------------------------------------------------------------------+------+
|text                                                                                   |result|
+---------------------------------------------------------------------------------------+------+
|Pagalbos parašiuto laukiantis verslas priemones vertina teigiamai  tik yra keli „jeigu“|[POS] |
+---------------------------------------------------------------------------------------+------+
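The `class.result` column is an array, so each row carries a list such as `[POS]` rather than a bare string. The sketch below shows one way to flatten those lists and count the labels once the rows are on the driver. It assumes one label per document, which is what the table above suggests, and it uses hypothetical placeholder rows shaped like the output of `result.select("text", "class.result").collect()`; the helper name `document_label` is illustrative and not part of any library.

```python
from collections import Counter

# Hypothetical rows, shaped like result.select("text", "class.result").collect()
rows = [
    {"text": "first document", "result": ["POS"]},
    {"text": "second document", "result": ["NEG"]},
    {"text": "third document", "result": ["POS"]},
]

def document_label(labels):
    """Return the first label in the array, or None if the array is empty."""
    return labels[0] if labels else None

# One label per document, then a simple tally over the whole batch
labels = [document_label(row["result"]) for row in rows]
counts = Counter(labels)
print(counts)
```

With real Spark rows, `row["result"]` can be indexed the same way, so the helper should carry over unchanged.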
Model Information
| Model Name: | finclf_bert_sentiment_analysis | 
| Compatibility: | Finance NLP 1.0.0+ | 
| License: | Licensed | 
| Edition: | Official | 
| Input Labels: | [document, token] | 
| Output Labels: | [class] | 
| Language: | lt | 
| Size: | 406.6 MB | 
| Case sensitive: | true | 
| Max sentence length: | 128 | 
References
An in-house augmented version of this dataset, with the NEU tag removed.
Benchmarking
       label    precision    recall  f1-score   support
         NEG       0.80      0.76      0.78       509
         POS       0.90      0.92      0.91      1167
    accuracy         -         -       0.87      1676
   macro-avg       0.85      0.84      0.84      1676
weighted-avg       0.87      0.87      0.87      1676
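As a sanity check on how the aggregate rows relate to the per-class rows, the sketch below recomputes the F1 scores and the macro and weighted averages from the precision, recall and support values in the table. The inputs are the rounded figures shown above, so the results are approximate; all variable names are illustrative.

```python
# Per-class figures copied from the benchmark table: (precision, recall, support)
per_class = {
    "NEG": (0.80, 0.76, 509),
    "POS": (0.90, 0.92, 1167),
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

f1_scores = {label: f1(p, r) for label, (p, r, _) in per_class.items()}
total = sum(support for _, _, support in per_class.values())

# Macro average: unweighted mean over classes
macro_f1 = sum(f1_scores.values()) / len(f1_scores)
# Weighted average: each class weighted by its support
weighted_f1 = sum(f1_scores[label] * per_class[label][2] for label in per_class) / total
# Support-weighted recall equals overall accuracy
weighted_recall = sum(r * support for _, r, support in per_class.values()) / total

for label, score in f1_scores.items():
    print(f"{label}: f1={score:.2f}")
print(f"macro f1={macro_f1:.2f}, weighted f1={weighted_f1:.2f}, accuracy={weighted_recall:.2f}")
```

The macro average treats both classes equally, while the weighted average leans toward POS because it has more than twice the support of NEG.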