Description
Analyze sentiment in reviews by classifying them as positive or negative. When the predicted sentiment probability falls below a customizable threshold (default 0.6), the document is labeled neutral instead. The model is trained on multilingual UniversalSentenceEncoder sentence embeddings and uses a deep learning (DL) classifier to predict the sentiment.
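
The neutral fallback can be sketched in plain Python. This is a minimal illustration, not Spark NLP's internal code: the function name `label_with_threshold` and the probability dictionaries are hypothetical, and the sketch assumes the threshold is compared against the highest class probability. In Spark NLP itself the threshold is a parameter of the model (set through `setThreshold` on `SentimentDLModel`).

```python
def label_with_threshold(probabilities, threshold=0.6, fallback="neutral"):
    """Return the most probable label, or `fallback` when its probability is below `threshold`."""
    # Pick the class with the highest probability.
    best_label = max(probabilities, key=probabilities.get)
    # Fall back to "neutral" when the classifier is not confident enough.
    if probabilities[best_label] < threshold:
        return fallback
    return best_label


# Hypothetical class probabilities, for illustration only.
confident = label_with_threshold({"positive": 0.91, "negative": 0.09})
uncertain = label_with_threshold({"positive": 0.55, "negative": 0.45})
print(confident, uncertain)
```

Lowering the threshold makes the neutral label rarer; raising it sends more borderline documents to neutral.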
Predicted Entities
positive, negative, neutral
How to use
Use this model in a pipeline after the pretrained multilingual UniversalSentenceEncoder annotator tfhub_use_multi_lg.
...
use = UniversalSentenceEncoder.pretrained("tfhub_use_multi_lg", "xx") \
    .setInputCols(["document"]) \
    .setOutputCol("sentence_embeddings")
sentimentdl = SentimentDLModel.pretrained("sentiment_jager_use", "th") \
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("sentiment")
pipeline = Pipeline(stages=[document_assembler, use, sentimentdl])
example = spark.createDataFrame(pd.DataFrame({'text': ["เเพ้ตอนnctโผล่มาตลอดเลยค่ะเเอด5555555"]}))
result = pipeline.fit(example).transform(example)
...
val use = UniversalSentenceEncoder.pretrained("tfhub_use_multi_lg", "xx")
  .setInputCols(Array("document"))
  .setOutputCol("sentence_embeddings")
val sentimentdl = SentimentDLModel.pretrained("sentiment_jager_use", "th")
  .setInputCols(Array("sentence_embeddings"))
  .setOutputCol("sentiment")
val pipeline = new Pipeline().setStages(Array(document_assembler, use, sentimentdl))
// Build a one-row DataFrame with a "text" column (toDS requires spark.implicits._)
val data = Seq("เเพ้ตอนnctโผล่มาตลอดเลยค่ะเเอด5555555").toDS.toDF("text")
val result = pipeline.fit(data).transform(data)
import nlu
text = ["""เเพ้ตอนnctโผล่มาตลอดเลยค่ะเเอด5555555"""]
sentiment_df = nlu.load('th.classify.sentiment').predict(text)
sentiment_df
Results
+-------------------------------------+----------+
|text                                 |result    |
+-------------------------------------+----------+
|เเพ้ตอนnctโผล่มาตลอดเลยค่ะเเอด5555555   |[positive]|
+-------------------------------------+----------+
Model Information
| Model Name:    | sentiment_jager_use   |
| Compatibility: | Spark NLP 2.7.1+      |
| License:       | Open Source           |
| Edition:       | Official              |
| Input Labels:  | [sentence_embeddings] |
| Output Labels: | [sentiment]           |
| Language:      | th                    |
Data Source
The model was trained on the custom corpus from Jager V3.
Benchmarking
| sentiment | precision | recall | f1-score | support |
|--------------|-----------|--------|----------|---------|
| negative | 0.94 | 0.99 | 0.96 | 82 |
| positive | 0.97 | 0.87 | 0.92 | 38 |
| accuracy | | | 0.95 | 120 |
| macro avg | 0.96 | 0.93 | 0.94 | 120 |
| weighted avg | 0.95 | 0.95 | 0.95 | 120 |
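
As a rough check on how the summary rows relate to the per-class rows, the F1 scores and their averages can be recomputed from the precision, recall, and support values in the table. This sketch assumes the standard definitions (F1 as the harmonic mean of precision and recall, macro average as the unweighted mean, weighted average as the support-weighted mean). Because the inputs are already rounded to two decimals, the results are approximate.

```python
# Per-class rows copied from the benchmark table: (precision, recall, support)
per_class = {
    "negative": (0.94, 0.99, 82),
    "positive": (0.97, 0.87, 38),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1_scores = {label: f1(p, r) for label, (p, r, _) in per_class.items()}
total_support = sum(support for _, _, support in per_class.values())

# Macro average: every class counts equally.
macro_f1 = sum(f1_scores.values()) / len(f1_scores)
# Weighted average: every class counts in proportion to its support.
weighted_f1 = sum(f1_scores[label] * per_class[label][2] for label in per_class) / total_support

for label, score in f1_scores.items():
    print(f"{label}: {score:.2f}")
print(f"macro avg: {macro_f1:.2f}")
print(f"weighted avg: {weighted_f1:.2f}")
```

Rounded to two decimals, these values match the f1-score column above: 0.96 and 0.92 per class, 0.94 macro, and 0.95 weighted.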