`sparknlp.annotator.sentiment.vivekn_sentiment`#

Contains classes for ViveknSentiment.

Module Contents#

Classes#

`ViveknSentimentApproach`	Trains a sentiment analyser inspired by the algorithm by Vivek Narayanan.
`ViveknSentimentModel`	Sentiment analyser inspired by the algorithm by Vivek Narayanan.

class ViveknSentimentApproach

Trains a sentiment analyser inspired by the algorithm by Vivek Narayanan.

The analyzer requires sentence boundaries to give a score in context. Tokenization is needed to make sure tokens fall within those bounds. Any annotators that these inputs depend on (transitive requirements) must also be part of the pipeline.

The training data must contain a column of normalized text and a label column whose values are either "positive" or "negative".
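As a rough illustration of the label requirement, the sketch below checks plain Python rows before they would be turned into a DataFrame. The helper name `validate_training_rows` and the `(text, label)` tuple layout are assumptions made for this example; they are not part of Spark NLP.

```python
# Labels accepted in the training label column, as stated above.
ALLOWED_LABELS = {"positive", "negative"}


def validate_training_rows(rows):
    """Return the rows unchanged if every label is allowed, else raise ValueError.

    Hypothetical helper: each row is assumed to be a (text, label) tuple.
    """
    for index, (_text, label) in enumerate(rows):
        if label not in ALLOWED_LABELS:
            raise ValueError(f"row {index}: unexpected label {label!r}")
    return rows


rows = [
    ("I really liked this movie!", "positive"),
    ("The cast was horrible", "negative"),
]
checked = validate_training_rows(rows)
```

Running a check like this before `spark.createDataFrame` surfaces a mislabeled row early instead of during training.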

For extended examples of usage, see the Examples.

Input Annotation types	Output Annotation type
`TOKEN, DOCUMENT`	`SENTIMENT`

Parameters:

sentimentCol: Column with the sentiment label of every row. Must be ‘positive’ or ‘negative’.
pruneCorpus: Removes infrequent scenarios from scope. The higher the value, the better the performance. Defaults to 1.
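To make the idea behind `pruneCorpus` more concrete, here is a conceptual sketch of frequency-based pruning in plain Python. It assumes that entries seen no more than the threshold number of times are dropped; the exact rule Spark NLP applies is not spelled out in this page, and the function name `prune_counts` is hypothetical.

```python
from collections import Counter


def prune_counts(counts, min_frequency=1):
    """Keep only entries seen more than `min_frequency` times.

    Assumed rule for illustration only, not Spark NLP's implementation.
    """
    return Counter(
        {token: n for token, n in counts.items() if n > min_frequency}
    )


tokens = "good good great bad good great awful".split()
counts = Counter(tokens)            # good: 3, great: 2, bad: 1, awful: 1
pruned = prune_counts(counts, min_frequency=1)
print(pruned)                       # only "good" and "great" remain
```

Dropping rare entries shrinks the feature table, which is one way a higher threshold could improve speed and reduce noise.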

References

The algorithm is based on the paper “Fast and accurate sentiment classification using an enhanced Naive Bayes model”.

vivekn/sentiment
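For readers unfamiliar with Naive Bayes, the following is a minimal sketch of a plain multinomial Naive Bayes classifier with add-one smoothing. It only illustrates the general technique the paper builds on: it omits the paper's enhancements and is not the Spark NLP implementation. All function names here are hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(examples):
    """Count tokens per label for a list of (tokens, label) pairs."""
    token_counts = {}
    label_counts = Counter()
    vocabulary = set()
    for tokens, label in examples:
        label_counts[label] += 1
        token_counts.setdefault(label, Counter()).update(tokens)
        vocabulary.update(tokens)
    return token_counts, label_counts, vocabulary


def classify(tokens, model):
    """Return the label with the highest smoothed log-probability."""
    token_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        counts = token_counts[label]
        # Add-one (Laplace) smoothing over the training vocabulary.
        denominator = sum(counts.values()) + len(vocabulary)
        score = math.log(doc_count / total_docs)
        for token in tokens:
            score += math.log((counts[token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


examples = [
    ("i really liked this movie".split(), "positive"),
    ("the cast was horrible".split(), "negative"),
    ("i loved the protagonist".split(), "positive"),
    ("it is a waste of time".split(), "negative"),
]
model = train_naive_bayes(examples)
print(classify("i liked the protagonist".split(), model))  # positive
print(classify("horrible waste".split(), model))           # negative
```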

Examples

>>> import sparknlp
>>> from sparknlp.base import *
>>> from sparknlp.annotator import *
>>> from pyspark.ml import Pipeline
>>> document = DocumentAssembler() \
...     .setInputCol("text") \
...     .setOutputCol("document")
>>> token = Tokenizer() \
...     .setInputCols(["document"]) \
...     .setOutputCol("token")
>>> normalizer = Normalizer() \
...     .setInputCols(["token"]) \
...     .setOutputCol("normal")
>>> vivekn = ViveknSentimentApproach() \
...     .setInputCols(["document", "normal"]) \
...     .setSentimentCol("train_sentiment") \
...     .setOutputCol("result_sentiment")
>>> finisher = Finisher() \
...     .setInputCols(["result_sentiment"]) \
...     .setOutputCols("final_sentiment")
>>> pipeline = Pipeline().setStages([document, token, normalizer, vivekn, finisher])
>>> training = spark.createDataFrame([
...     ("I really liked this movie!", "positive"),
...     ("The cast was horrible", "negative"),
...     ("Never going to watch this again or recommend it to anyone", "negative"),
...     ("It's a waste of time", "negative"),
...     ("I loved the protagonist", "positive"),
...     ("The music was really really good", "positive")
... ]).toDF("text", "train_sentiment")
>>> pipelineModel = pipeline.fit(training)
>>> data = spark.createDataFrame([
...     ["I recommend this movie"],
...     ["Dont waste your time!!!"]
... ]).toDF("text")
>>> result = pipelineModel.transform(data)
>>> result.select("final_sentiment").show(truncate=False)
+---------------+
|final_sentiment|
+---------------+
|[positive]     |
|[negative]     |
+---------------+

setSentimentCol(value)

Sets column with the sentiment result of every row.

Must be either ‘positive’ or ‘negative’.

Parameters:

value (str): Name of the column

setPruneCorpus(value)

Sets the removal of infrequent scenarios from scope, by default 1.

The higher the value, the better the performance.

Parameters:

value (int): The frequency

class ViveknSentimentModel(classname='com.johnsnowlabs.nlp.annotators.sda.vivekn.ViveknSentimentModel', java_model=None)

Sentiment analyser inspired by the algorithm by Vivek Narayanan.

This is the instantiated model of the ViveknSentimentApproach. For training your own model, please see the documentation of that class.

The analyzer requires sentence boundaries to give a score in context. Tokenization is needed to make sure tokens fall within those bounds. Any annotators that these inputs depend on (transitive requirements) must also be part of the pipeline.

For extended examples of usage, see the Examples.

Input Annotation types	Output Annotation type
`TOKEN, DOCUMENT`	`SENTIMENT`

Parameters:

None

References

The algorithm is based on the paper “Fast and accurate sentiment classification using an enhanced Naive Bayes model”.

vivekn/sentiment

static pretrained(name='sentiment_vivekn', lang='en', remote_loc=None)

Downloads and loads a pretrained model.

Parameters:

name (str, optional): Name of the pretrained model, by default “sentiment_vivekn”
lang (str, optional): Language of the pretrained model, by default “en”
remote_loc (str, optional): Optional remote address of the resource, by default None. Will use Spark NLP's repositories otherwise.

Returns:

ViveknSentimentModel: The restored model
