Sentiment Analysis of Tweets (sentimentdl_use_twitter)

Description

Classify sentiment in tweets as negative or positive using Universal Sentence Encoder embeddings.

Predicted Entities

positive, negative


How to use

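import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import UniversalSentenceEncoder, SentimentDLModel

# Start (or reuse) a Spark session with Spark NLP loaded
spark = sparknlp.start()

document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")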

use = UniversalSentenceEncoder.pretrained('tfhub_use', lang="en") \
    .setInputCols(["document"])\
    .setOutputCol("sentence_embeddings")

classifier = SentimentDLModel.pretrained('sentimentdl_use_twitter')\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("sentiment")

nlp_pipeline = Pipeline(stages=[document_assembler,
                                use,
                                classifier
                                ])

# Every stage is pretrained, so fitting on an empty DataFrame only assembles the pipeline
l_model = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))

annotations = l_model.fullAnnotate([
    "im meeting up with one of my besties tonight! Cant wait!!  - GIRL TALK!!",
    "is upset that he can't update his Facebook by texting it... and might cry as a result  School today also. Blah!",
])

Results

|    | document                                                                                                         | sentiment   |
|---:|:---------------------------------------------------------------------------------------------------------------- |:------------|
|  0 | im meeting up with one of my besties tonight! Cant wait!!  - GIRL TALK!!                                         | positive    |
|  1 | is upset that he can't update his Facebook by texting it... and might cry as a result  School today also. Blah!  | negative    |
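
The two-column view above can be derived from the `fullAnnotate` output. The following is a minimal sketch, assuming each element of `annotations` is a dict keyed by output column (`document`, `sentiment`) whose values are lists of annotation objects exposing a `result` attribute. The helper name `to_rows` is hypothetical, and the `SimpleNamespace` objects are stand-ins so the snippet runs without a Spark session.

```python
from types import SimpleNamespace


def to_rows(annotations):
    """Collect (document text, sentiment label) pairs from fullAnnotate-style output."""
    rows = []
    for ann in annotations:
        text = ann["document"][0].result
        # A document may carry no prediction; fall back to None rather than fail.
        label = ann["sentiment"][0].result if ann["sentiment"] else None
        rows.append((text, label))
    return rows


# Stand-in objects mimicking the assumed shape (not real Spark NLP output).
fake = [
    {"document": [SimpleNamespace(result="Cant wait!!")],
     "sentiment": [SimpleNamespace(result="positive")]},
    {"document": [SimpleNamespace(result="Blah!")],
     "sentiment": [SimpleNamespace(result="negative")]},
]

for text, label in to_rows(fake):
    print(f"{label:>8} | {text}")
```

With the pipeline from "How to use", `to_rows(annotations)` would be called on the real output in place of `fake`.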

Model Information

Model Name: sentimentdl_use_twitter
Compatibility: Spark NLP 2.7.1+
License: Open Source
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [sentiment]
Language: en
Dependencies: tfhub_use

Data Source

Trained on the Sentiment140 dataset, which comprises 1.6M tweets: https://www.kaggle.com/kazanova/sentiment140

Benchmarking

loss: 7930.071 - acc: 0.80694044 - val_acc: 80.00508 - batches: 16000
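
The log line reports `acc` and `val_acc` on what look like different scales. The sketch below parses the line and puts both on a 0-1 scale. It assumes `val_acc` is a percentage, which the source does not state. The helper name `parse_log_line` is hypothetical.

```python
def parse_log_line(line):
    """Split a 'key: value - key: value' log line into a dict of floats."""
    metrics = {}
    for part in line.split(" - "):
        key, value = part.split(":")
        metrics[key.strip()] = float(value)
    return metrics


line = "loss: 7930.071 - acc: 0.80694044 - val_acc: 80.00508 - batches: 16000"
m = parse_log_line(line)

# Assumption: val_acc is a percentage, so divide by 100 to compare it with acc.
val_acc_fraction = m["val_acc"] / 100
gap = m["acc"] - val_acc_fraction

print(f"train acc {m['acc']:.4f}, val acc {val_acc_fraction:.4f}, gap {gap:.4f}")
```

Under that assumption, training and validation accuracy differ by less than one percentage point.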