Sentiment Analysis for Thai (sentiment_jager_use)

Description

Classifies the sentiment of Thai reviews as positive or negative. When the predicted sentiment probability falls below a customizable threshold (default 0.6), the document is labeled neutral. The model is trained on multilingual UniversalSentenceEncoder sentence embeddings and uses a deep learning (DL) classifier to predict the sentiment.
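
The threshold rule described above can be sketched in plain Python. This is only an illustration of the idea, not the library's internal code: the function name `label_with_threshold`, the score dictionary, and the choice to treat a score exactly equal to the threshold as confident are all assumptions made for the example.

```python
from typing import Dict


def label_with_threshold(
    scores: Dict[str, float],
    threshold: float = 0.6,      # default taken from the description above
    fallback: str = "neutral",   # label used when the model is not confident
) -> str:
    """Return the highest-scoring label, or `fallback` if its score is below `threshold`."""
    best_label = max(scores, key=scores.get)
    # Assumption: a score equal to the threshold still counts as confident.
    return best_label if scores[best_label] >= threshold else fallback


print(label_with_threshold({"positive": 0.91, "negative": 0.09}))  # positive
print(label_with_threshold({"positive": 0.55, "negative": 0.45}))  # neutral
```

In Spark NLP itself the threshold is a parameter of the annotator; as far as we know, `SentimentDLModel` exposes it through `setThreshold`, with `setThresholdLabel` controlling the fallback label.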

Predicted Entities

positive, negative, neutral


How to use

Use this model in a pipeline together with the pretrained multilingual UniversalSentenceEncoder annotator tfhub_use_multi_lg.

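import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import UniversalSentenceEncoder, SentimentDLModel

spark = sparknlp.start()  # start (or reuse) a Spark session with Spark NLP loaded

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Multilingual sentence embeddings that the classifier expects as input
use = UniversalSentenceEncoder.pretrained("tfhub_use_multi_lg", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence_embeddings")

# Thai sentiment classifier
sentimentdl = SentimentDLModel.pretrained("sentiment_jager_use", "th") \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("sentiment")

pipeline = Pipeline(stages=[document_assembler, use, sentimentdl])

example = spark.createDataFrame([['เเพ้ตอนnctโผล่มาตลอดเลยค่ะเเอด5555555']], ["text"])
result = pipeline.fit(example).transform(example)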

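import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.embeddings.UniversalSentenceEncoder
import com.johnsnowlabs.nlp.annotators.classifier.dl.SentimentDLModel
import org.apache.spark.ml.Pipeline
import spark.implicits._  // needed for toDF; assumes an active SparkSession named spark

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val use = UniversalSentenceEncoder.pretrained("tfhub_use_multi_lg", "xx")
    .setInputCols(Array("document"))
    .setOutputCol("sentence_embeddings")

val sentimentdl = SentimentDLModel.pretrained("sentiment_jager_use", "th")
    .setInputCols(Array("sentence_embeddings"))
    .setOutputCol("sentiment")

val pipeline = new Pipeline().setStages(Array(document_assembler, use, sentimentdl))

val data = Seq("เเพ้ตอนnctโผล่มาตลอดเลยค่ะเเอด5555555").toDF("text")
val result = pipeline.fit(data).transform(data)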

import nlu

text = ["""เเพ้ตอนnctโผล่มาตลอดเลยค่ะเเอด5555555"""]
sentiment_df = nlu.load('th.classify.sentiment').predict(text)
sentiment_df

Results

+-------------------------------------+----------+
|text                                 |result    |
+-------------------------------------+----------+
|เเพ้ตอนnctโผล่มาตลอดเลยค่ะเเอด5555555  |[positive]|
+-------------------------------------+----------+

Model Information

Model Name:	sentiment_jager_use
Compatibility:	Spark NLP 2.7.1+
License:	Open Source
Edition:	Official
Input Labels:	[sentence_embeddings]
Output Labels:	[sentiment]
Language:	th

Data Source

The model was trained on the custom corpus from Jager V3.

Benchmarking

| sentiment    | precision | recall | f1-score | support |
|--------------|-----------|--------|----------|---------|
| negative     | 0.94      | 0.99   | 0.96     | 82      |
| positive     | 0.97      | 0.87   | 0.92     | 38      |
| accuracy     |           |        | 0.95     | 120     |
| macro avg    | 0.96      | 0.93   | 0.94     | 120     |
| weighted avg | 0.95      | 0.95   | 0.95     | 120     |
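
As a rough cross-check of the table, the F1 scores and their averages can be recomputed from the per-class rows. The sketch below uses only the rounded values printed above, so small deviations from figures computed on the raw predictions are expected; the helper names (`f1`, `macro_avg`, `weighted_avg`) are made up for this example.

```python
# Per-class rows copied from the benchmark table: (precision, recall, f1, support)
rows = {
    "negative": (0.94, 0.99, 0.96, 82),
    "positive": (0.97, 0.87, 0.92, 38),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_avg(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


def weighted_avg(values, weights):
    """Mean over classes, weighted by each class's support."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


supports = [row[3] for row in rows.values()]
table_f1 = [row[2] for row in rows.values()]

# F1 recomputed from the rounded precision and recall of each class
f1_scores = {name: f1(p, r) for name, (p, r, _, _) in rows.items()}
macro_f1 = macro_avg(table_f1)
weighted_f1 = weighted_avg(table_f1, supports)

for name, score in f1_scores.items():
    print(f"{name}: f1 = {score:.2f}")        # negative: 0.96, positive: 0.92
print(f"macro avg f1 = {macro_f1:.2f}")        # 0.94
print(f"weighted avg f1 = {weighted_f1:.2f}")  # 0.95
```

The weighted average sits closer to the negative-class score because 82 of the 120 test examples are negative.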
