sparknlp.annotator.sentiment.vivekn_sentiment

Contains classes for ViveknSentiment.

Module Contents

Classes

ViveknSentimentApproach

Trains a sentiment analyser inspired by the algorithm by Vivek Narayanan.

ViveknSentimentModel

Sentiment analyser inspired by the algorithm by Vivek Narayanan.

class ViveknSentimentApproach

Trains a sentiment analyser inspired by the algorithm by Vivek Narayanan.

The analyzer requires sentence boundaries to score text in context, and tokenization to make sure tokens fall within those bounds. The requirements of these upstream annotators must be met as well.

The training data needs to consist of a column for normalized text and a label column (either "positive" or "negative").

For extended examples of usage, see the Spark NLP Workshop.

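====================== ======================
Input Annotation types Output Annotation type
====================== ======================
TOKEN, DOCUMENT        SENTIMENT
====================== ======================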

Parameters:
sentimentCol

Column with the sentiment result of every row. Must be ‘positive’ or ‘negative’.

pruneCorpus

Removes infrequent scenarios from scope. The higher the value, the better the performance. Defaults to 1.

References

The algorithm is based on the paper “Fast and accurate sentiment classification using an enhanced Naive Bayes model”.

https://github.com/vivekn/sentiment/
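As a rough illustration of the Naive Bayes idea behind the referenced paper, the sketch below trains a plain multinomial Naive Bayes classifier with add-one smoothing on a few labeled sentences. It is not the Spark NLP implementation and it omits the enhancements described in the paper; the names `tokenize`, `train_naive_bayes`, and `classify` are hypothetical helpers introduced only for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters and apostrophes (a simplifying assumption).
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(samples):
    """Count tokens and documents per label from (text, label) pairs."""
    token_counts = {}       # label -> Counter of token frequencies
    doc_counts = Counter()  # label -> number of training documents
    for text, label in samples:
        doc_counts[label] += 1
        token_counts.setdefault(label, Counter()).update(tokenize(text))
    vocabulary = set()
    for counts in token_counts.values():
        vocabulary.update(counts)
    return {"token_counts": token_counts,
            "doc_counts": doc_counts,
            "vocabulary": vocabulary}


def classify(model, text):
    """Return the label with the highest smoothed log-probability."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocabulary"])
    best_label, best_score = None, -math.inf
    for label, counts in model["token_counts"].items():
        score = math.log(model["doc_counts"][label] / total_docs)
        total_tokens = sum(counts.values())
        for token in tokenize(text):
            if token not in model["vocabulary"]:
                continue  # ignore tokens never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((counts[token] + 1) / (total_tokens + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


TRAINING = [
    ("I really liked this movie!", "positive"),
    ("The cast was horrible", "negative"),
    ("Never going to watch this again or recommend it to anyone", "negative"),
    ("It's a waste of time", "negative"),
    ("I loved the protagonist", "positive"),
    ("The music was really really good", "positive"),
]

model = train_naive_bayes(TRAINING)
print(classify(model, "I loved the music"))
print(classify(model, "What a waste of time"))
```

With so little training data the predictions depend heavily on which words happen to overlap, so this is only meant to show the mechanics of counting and scoring.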

Examples

>>> import sparknlp
>>> spark = sparknlp.start()
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> document = DocumentAssembler() \
...     .setInputCol("text") \
...     .setOutputCol("document")
>>> token = Tokenizer() \
...     .setInputCols(["document"]) \
...     .setOutputCol("token")
>>> normalizer = Normalizer() \
...     .setInputCols(["token"]) \
...     .setOutputCol("normal")
>>> vivekn = ViveknSentimentApproach() \
...     .setInputCols(["document", "normal"]) \
...     .setSentimentCol("train_sentiment") \
...     .setOutputCol("result_sentiment")
>>> finisher = Finisher() \
...     .setInputCols(["result_sentiment"]) \
...     .setOutputCols("final_sentiment")
>>> pipeline = Pipeline().setStages([document, token, normalizer, vivekn, finisher])
>>> training = spark.createDataFrame([
...     ("I really liked this movie!", "positive"),
...     ("The cast was horrible", "negative"),
...     ("Never going to watch this again or recommend it to anyone", "negative"),
...     ("It's a waste of time", "negative"),
...     ("I loved the protagonist", "positive"),
...     ("The music was really really good", "positive")
... ]).toDF("text", "train_sentiment")
>>> pipelineModel = pipeline.fit(training)
>>> data = spark.createDataFrame([
...     ["I recommend this movie"],
...     ["Dont waste your time!!!"]
... ]).toDF("text")
>>> result = pipelineModel.transform(data)
>>> result.select("final_sentiment").show(truncate=False)
+---------------+
|final_sentiment|
+---------------+
|[positive]     |
|[negative]     |
+---------------+
sentimentCol
pruneCorpus
importantFeatureRatio
unimportantFeatureStep
featureLimit
setSentimentCol(self, value)

Sets column with the sentiment result of every row.

Must be either ‘positive’ or ‘negative’.

Parameters:
value : str

Name of the column
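Because the sentiment column accepts only two values, it can help to check the labels before training. The following is a minimal sketch in plain Python; `ALLOWED_LABELS` and `find_invalid_labels` are hypothetical names for this example and are not part of Spark NLP.

```python
ALLOWED_LABELS = {"positive", "negative"}


def find_invalid_labels(rows, label_index=1):
    """Return the rows whose label is neither 'positive' nor 'negative'."""
    return [row for row in rows if row[label_index] not in ALLOWED_LABELS]


rows = [
    ("I really liked this movie!", "positive"),
    ("The cast was horrible", "negative"),
    ("It was fine, I guess", "neutral"),  # not an accepted label
]

invalid = find_invalid_labels(rows)
print(invalid)
```

Rows reported here would need to be relabeled or dropped before they are passed to `spark.createDataFrame`.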

setPruneCorpus(self, value)

Sets the threshold for removing infrequent scenarios from scope, by default 1.

The higher the value, the better the performance.

Parameters:
value : int

The frequency
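A short sketch of how the setter might be used when configuring the annotator is shown below. The helper `build_vivekn_approach` is a hypothetical wrapper for this example; the column names are the ones used in the class example and are assumptions about your pipeline. Calling the helper requires a running Spark session (for instance from `sparknlp.start()`).

```python
def build_vivekn_approach(prune_corpus=1):
    # Imported lazily so that defining this helper does not need Spark NLP
    # to be importable; calling it does need an active Spark session.
    from sparknlp.annotator import ViveknSentimentApproach

    return ViveknSentimentApproach() \
        .setInputCols(["document", "normal"]) \
        .setSentimentCol("train_sentiment") \
        .setOutputCol("result_sentiment") \
        .setPruneCorpus(prune_corpus)
```

The returned annotator can be placed in a `Pipeline` in the same position as `vivekn` in the class example.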

class ViveknSentimentModel(classname='com.johnsnowlabs.nlp.annotators.sda.vivekn.ViveknSentimentModel', java_model=None)

Sentiment analyser inspired by the algorithm by Vivek Narayanan.

This is the instantiated model of the ViveknSentimentApproach. For training your own model, please see the documentation of that class.

The analyzer requires sentence boundaries to score text in context, and tokenization to make sure tokens fall within those bounds. The requirements of these upstream annotators must be met as well.

For extended examples of usage, see the Spark NLP Workshop.

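====================== ======================
Input Annotation types Output Annotation type
====================== ======================
TOKEN, DOCUMENT        SENTIMENT
====================== ======================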

Parameters:
None

References

The algorithm is based on the paper “Fast and accurate sentiment classification using an enhanced Naive Bayes model”.

https://github.com/vivekn/sentiment/

name = 'ViveknSentimentModel'
importantFeatureRatio
unimportantFeatureStep
featureLimit
static pretrained(name='sentiment_vivekn', lang='en', remote_loc=None)

Downloads and loads a pretrained model.

Parameters:
name : str, optional

Name of the pretrained model, by default “sentiment_vivekn”

lang : str, optional

Language of the pretrained model, by default “en”

remote_loc : str, optional

Optional remote address of the resource, by default None. If not set, Spark NLP's repositories are used.

Returns:
ViveknSentimentModel

The restored model
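The pretrained model could be placed in a pipeline along the lines of the sketch below, which mirrors the stages of the ViveknSentimentApproach example. The helper name `build_pretrained_sentiment_pipeline` and the column names are assumptions made for this example. Calling it requires a running Spark session and network access to download the model.

```python
def build_pretrained_sentiment_pipeline():
    # Imported lazily so that defining this helper does not need Spark NLP
    # to be importable; calling it does need an active Spark session.
    from pyspark.ml import Pipeline
    from sparknlp.base import DocumentAssembler, Finisher
    from sparknlp.annotator import Tokenizer, Normalizer, ViveknSentimentModel

    document = DocumentAssembler() \
        .setInputCol("text") \
        .setOutputCol("document")
    token = Tokenizer() \
        .setInputCols(["document"]) \
        .setOutputCol("token")
    normalizer = Normalizer() \
        .setInputCols(["token"]) \
        .setOutputCol("normal")
    # Uses the default name ("sentiment_vivekn") and language ("en").
    vivekn = ViveknSentimentModel.pretrained() \
        .setInputCols(["document", "normal"]) \
        .setOutputCol("result_sentiment")
    finisher = Finisher() \
        .setInputCols(["result_sentiment"]) \
        .setOutputCols("final_sentiment")
    return Pipeline().setStages([document, token, normalizer, vivekn, finisher])
```

After `spark = sparknlp.start()`, the pipeline can be applied to a DataFrame `data` with a "text" column via `build_pretrained_sentiment_pipeline().fit(data).transform(data)`.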