SparkNLP - Examples


Scala examples

In this section, we present several example use cases covering both training and prediction with Spark NLP in Scala. Please see our Annotators page for reference.

Vivekn Sentiment Analysis

In the following example, we walk through sentiment analysis training and prediction using Spark NLP annotators, LightPipelines, and Spark ML Pipelines.

The ViveknSentimentApproach annotator implements Vivek Narayanan's sentiment algorithm. It can be trained either from a column in the training dataset whose rows are labeled 'positive' or 'negative', or from one folder of positive texts and one folder of negative texts. By using n-grams and negation handling on word sequences, this statistical model can achieve high accuracy when properly trained.

In this use case, we pass Spark datasets to fit() for training and to transform() for prediction. Since we are dealing with small amounts of data, we also put LightPipelines into practice.

Take me to code!

Python notebooks

In this section, we present several example use cases covering both training and prediction with Spark NLP in Python (PySpark). Please see our Annotators page for reference.

Vivekn Sentiment Analysis

In the following example, we walk through sentiment analysis training and prediction using Spark NLP annotators.

The ViveknSentimentApproach annotator implements Vivek Narayanan's sentiment algorithm. It can be trained either from a column in the training dataset whose rows are labeled 'positive' or 'negative', or from one folder of positive texts and one folder of negative texts. By using n-grams and negation handling on word sequences, this statistical model can achieve high accuracy when properly trained.
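To make negation handling and n-gram features concrete, here is a minimal plain-Python sketch of a naive Bayes sentiment classifier in the spirit of Narayanan's approach. It is not Spark NLP's implementation: the tokenizer, the negation word list, the `not_` prefix, and the toy training data are all assumptions for illustration.

```python
import math
import re
from collections import Counter

# Assumed negation cues and punctuation; not the annotator's actual lists.
NEGATIONS = {"not", "no", "never", "n't", "cannot"}
PUNCT = {".", ",", "!", "?", ";", ":"}


def tokenize(text):
    # Lowercase and split "n't" and punctuation into separate tokens.
    text = text.lower().replace("n't", " n't")
    return re.findall(r"n't|[a-z']+|[.,!?;:]", text)


def mark_negation(tokens):
    # Prefix tokens after a negation cue with "not_" until the next punctuation.
    out, negated = [], False
    for tok in tokens:
        if tok in PUNCT:
            negated = False
            out.append(tok)
        elif tok in NEGATIONS:
            negated = True
            out.append(tok)
        else:
            out.append("not_" + tok if negated else tok)
    return out


def features(text):
    # Unigrams plus bigrams over the negation-marked tokens.
    toks = [t for t in mark_negation(tokenize(text)) if t not in PUNCT]
    return toks + [a + " " + b for a, b in zip(toks, toks[1:])]


class NaiveBayesSentiment:
    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)
        self.feat_counts = {lab: Counter() for lab in self.doc_counts}
        for text, lab in zip(texts, labels):
            self.feat_counts[lab].update(features(text))
        self.vocab = set().union(*self.feat_counts.values())
        return self

    def predict(self, text):
        total = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for lab in sorted(self.feat_counts):
            counts = self.feat_counts[lab]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[lab] / total)
            for f in features(text):
                if f in self.vocab:  # ignore features never seen in training
                    # Add-one (Laplace) smoothed log-likelihood.
                    score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best, best_score = lab, score
        return best


model = NaiveBayesSentiment().fit(
    ["great movie", "wonderful film", "not good movie", "boring film"],
    ["positive", "positive", "negative", "negative"],
)
print(model.predict("great film"))  # positive
print(model.predict("not great"))   # negative
```

On this toy data, "not great" comes out negative because negation marking turns "great" into the unseen feature "not_great", leaving only the negative-leaning "not" feature.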

Spark can be leveraged during training by using the ReadAs.Dataset setting; during prediction, Spark is used by default.

We also include a spell checker in this pipeline, which corrects our sentences to improve sentiment analysis accuracy.
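The pipeline's spell checker is a Spark NLP annotator. As a swapped-in illustration of how a corrector of this kind can work, the sketch below follows Peter Norvig's well-known edit-distance method: keep known words, otherwise pick the most frequent known word one edit away, then two edits away. The word list is a hypothetical stand-in for a real corpus.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    # Every string one delete, transpose, replace, or insert away from word.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def make_corrector(corpus_words):
    freq = Counter(corpus_words)

    def correct(word):
        # Known words win; otherwise try edit distance 1, then 2.
        if word in freq:
            return word
        candidates = {w for w in edits1(word) if w in freq}
        if not candidates:
            candidates = {w2 for w1 in edits1(word) for w2 in edits1(w1) if w2 in freq}
        # Most frequent candidate; ties are broken deterministically by the word.
        return max(candidates, key=lambda w: (freq[w], w)) if candidates else word

    return correct


correct = make_corrector("this movie is really good and the acting is good".split())
print(correct("moive"))  # movie
print(correct("gud"))    # good
```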

Take me to notebook!

Rule-based Sentiment Analysis

In the following example, we walk through a simple use case for our straightforward SentimentDetector annotator.

This annotator works on top of a list of labeled entries, each of which can carry one of the following labels:

  • positive
  • negative
  • revert
  • increment
  • decrement
Each labeled entry contributes to the sentiment score given to the text.
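The scoring idea can be sketched in plain Python. The lexicon, the polarity weights (+1 and -1), and the modifier multipliers (revert x -1, increment x 2, decrement x 0.5) below are assumptions chosen for illustration, not SentimentDetector's actual defaults.

```python
# Hypothetical lexicon; the notebook reads its labeled entries from a file.
LEXICON = {
    "good": "positive", "great": "positive",
    "bad": "negative", "awful": "negative",
    "not": "revert", "very": "increment", "barely": "decrement",
}
# Assumed weights and multipliers, chosen for illustration only.
POLARITY = {"positive": 1.0, "negative": -1.0}
MODIFIER = {"revert": -1.0, "increment": 2.0, "decrement": 0.5}


def score(text):
    total, pending = 0.0, 1.0
    for word in text.lower().split():
        tag = LEXICON.get(word.strip(".,!?"))
        if tag in MODIFIER:
            # Modifiers stack and apply to the next polarity word.
            pending *= MODIFIER[tag]
        elif tag in POLARITY:
            total += pending * POLARITY[tag]
            pending = 1.0
    return total


def classify(text):
    s = score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"


print(score("The movie was very good"))  # 2.0
print(classify("awful, not great"))      # negative
```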

Take me to notebook!

CRF Named Entity Recognition

In the following example, we walk through training and prediction with a Conditional Random Fields (CRF) NER model.

This challenging annotator requires the user either to provide a labeled dataset during the fit() stage or to train on external CoNLL 2003 resources. It may optionally use an external set of word embeddings and a list of additional entities to extract.
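CoNLL 2003 files list one token per line with four space-separated columns (word, POS tag, chunk tag, NER tag), with blank lines between sentences and `-DOCSTART-` lines between documents. A minimal reader sketch, assuming exactly that layout; the sample lines are illustrative, and the function name is hypothetical.

```python
def read_conll2003(lines):
    """Group CoNLL 2003 lines into sentences of (token, pos, ner) triples."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            # Blank lines and document markers close the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        # Assumes exactly four space-separated columns per token line.
        token, pos, _chunk, ner = line.split()
        current.append((token, pos, ner))
    if current:
        sentences.append(current)
    return sentences


sample = """-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O
. . O O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
"""
sentences = read_conll2003(sample.splitlines())
print(len(sentences))  # 2
print(sentences[1])    # [('Peter', 'NNP', 'B-PER'), ('Blackburn', 'NNP', 'I-PER')]
```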

The CRF annotator also requires part-of-speech tags, so we add a POS tagger to the same pipeline. We could also use our special RecursivePipeline, which tells Spark NLP's NER CRF approach to use the same pipeline for tagging external resources.
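A linear-chain CRF scores each token through features of the token and its neighbours, which is why POS tags have to be available in the same pipeline. The function below sketches a typical feature extractor; the feature names and choices are illustrative assumptions, not the feature set of Spark NLP's NER CRF.

```python
def token_features(sentence, i):
    """Features for token i of a sentence given as (word, pos) pairs."""
    word, pos = sentence[i]
    feats = {
        "bias": 1.0,
        "word.lower=" + word.lower(): 1.0,
        "word.istitle": float(word.istitle()),
        "word.isupper": float(word.isupper()),
        "word.suffix3=" + word[-3:]: 1.0,
        "pos=" + pos: 1.0,
    }
    # Neighbouring POS tags give the CRF context beyond the current word.
    if i > 0:
        feats["prev.pos=" + sentence[i - 1][1]] = 1.0
    else:
        feats["BOS"] = 1.0  # beginning of sentence
    if i < len(sentence) - 1:
        feats["next.pos=" + sentence[i + 1][1]] = 1.0
    else:
        feats["EOS"] = 1.0  # end of sentence
    return feats


def sentence_features(sentence):
    return [token_features(sentence, i) for i in range(len(sentence))]


sent = [("Peter", "NNP"), ("Blackburn", "NNP"), ("spoke", "VBD")]
feats = sentence_features(sent)
print(sorted(k for k in feats[1] if "pos" in k))
# ['next.pos=VBD', 'pos=NNP', 'prev.pos=NNP']
```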

Take me to notebook!

CNN Deep Learning NER

In the following example, we walk through training and prediction with a convolutional neural network (CNN) + LSTM NER model. This annotator is implemented on top of TensorFlow.

This annotator takes a set of word embedding vectors, a CoNLL training dataset, and a validation dataset. We include our own predefined TensorFlow graphs, but all layers are trained during the fit() stage.
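Before a BiLSTM can process sentences in batches, tokens are typically mapped to their embedding vectors and sequences are padded to a common length, with a mask marking the real positions. A minimal sketch, assuming embeddings stored in a plain dictionary and a zero vector for unknown words; `embed_batch` is a hypothetical helper, not the annotator's actual input pipeline.

```python
def embed_batch(sentences, embeddings, dim):
    """Map token lists to a zero-padded batch of vectors plus a 0/1 mask."""
    max_len = max(len(sent) for sent in sentences)
    batch, mask = [], []
    for sent in sentences:
        # Unknown words fall back to a zero vector (an assumed OOV policy).
        vectors = [list(embeddings.get(tok.lower(), [0.0] * dim)) for tok in sent]
        pad = max_len - len(sent)
        batch.append(vectors + [[0.0] * dim for _ in range(pad)])
        # 1 marks a real token, 0 marks padding the model should ignore.
        mask.append([1] * len(sent) + [0] * pad)
    return batch, mask


# Hypothetical 2-dimensional embeddings.
emb = {"eu": [0.1, 0.2], "rejects": [0.3, 0.4]}
batch, mask = embed_batch([["EU", "rejects", "lamb"], ["Peter"]], emb, dim=2)
print(mask)  # [[1, 1, 1], [1, 0, 0]]
```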

DL NER computes several BiLSTM layers to extract entities automatically, and it leverages batch-based distributed calls to native TensorFlow libraries during prediction.
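NER models of this kind usually emit one tag per token in a BIO (IOB) scheme, such as B-PER, I-PER, or O, and entities are then assembled from consecutive tags. Assuming such tags, a minimal decoding sketch (the helper name is hypothetical) looks like this:

```python
def bio_to_spans(tokens, tags):
    """Group tokens tagged B-X / I-X / O into (entity_text, X) chunks."""
    spans, words, label = [], [], None
    for tok, tag in zip(tokens, tags):
        # B-X always starts an entity; I-X starts one if the type changes.
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if words and (starts_new or tag == "O"):
            # Close the entity collected so far.
            spans.append((" ".join(words), label))
            words, label = [], None
        if starts_new:
            words, label = [tok], tag[2:]
        elif tag.startswith("I-"):
            words.append(tok)
    if words:
        spans.append((" ".join(words), label))
    return spans


tokens = ["Peter", "Blackburn", "visited", "New", "York", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]
print(bio_to_spans(tokens, tags))  # [('Peter Blackburn', 'PER'), ('New York', 'LOC')]
```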

Take me to notebook!

Simple Text Matching

In the following example, we walk through our straightforward Text Matcher.

This annotator takes a list of sentences from a text file and looks them up in the target dataset.

This annotator is an AnnotatorModel and does not require training.
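Conceptually, the lookup amounts to finding every occurrence of each phrase's token sequence in the target text. The sketch below does this with a naive token-window scan; the regex tokenizer, the `find_phrases` name, and the phrase list are assumptions, and Spark NLP's matcher may tokenize and index differently.

```python
import re


def find_phrases(text, phrases):
    """Return sorted (start, end, phrase) token spans where each phrase occurs."""
    tokens = re.findall(r"\w+", text.lower())
    matches = []
    for phrase in phrases:
        target = re.findall(r"\w+", phrase.lower())
        if not target:
            continue  # skip empty lines from the phrase file
        n = len(target)
        # Naive scan: compare the phrase against every window of n tokens.
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == target:
                matches.append((i, i + n, phrase))
    return sorted(matches)


# Phrases as they might be read from a text file, one per line (hypothetical).
phrases = "Spark NLP\nApache Spark\n".splitlines()
print(find_phrases("The Spark NLP library runs on Apache Spark.", phrases))
# [(1, 3, 'Spark NLP'), (6, 8, 'Apache Spark')]
```

The scan costs roughly one comparison per phrase per token position; a trie over the phrase tokens would scale better for long phrase lists.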

Take me to notebook!

Assertion Status with LogReg

In the following example, we walk through a negation annotator based on logistic regression (LogReg), which identifies whether a scenario is happening or not.

This machine learning AnnotatorApproach takes a set of word embedding vectors and trains on them alone. At prediction time, it labels each example in the dataset as negated or not negated.
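The idea of training a negation classifier on embedding vectors alone can be sketched with plain-Python logistic regression: average each example's word vectors and fit weights by batch gradient descent. The 2-dimensional embeddings, labels, learning rate, and epoch count are hypothetical values for illustration, not what the annotator uses.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def average(vectors):
    # Element-wise mean of the word vectors in one example.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train_logreg(xs, ys, lr=0.5, epochs=500):
    """Batch gradient descent on the logistic (log) loss."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b


def is_negated(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5


# Hypothetical 2-d word embeddings; dimension 0 loosely tracks negation cues.
EMB = {"no": [1.0, 0.0], "denies": [0.9, 0.1], "has": [0.0, 1.0],
       "reports": [0.1, 0.9], "fever": [0.5, 0.5], "pain": [0.5, 0.5]}
examples = [("no fever", 1), ("denies pain", 1), ("has fever", 0), ("reports pain", 0)]
xs = [average([EMB[t] for t in text.split()]) for text, _ in examples]
w, b = train_logreg(xs, [y for _, y in examples])
print(is_negated(w, b, average([EMB["no"], EMB["pain"]])))        # True
print(is_negated(w, b, average([EMB["reports"], EMB["fever"]])))  # False
```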

Take me to notebook!

Deep Learning Assertion Status

In the following example, we walk through a deep-learning-based annotator, which identifies whether a scenario is happening or not.

This AnnotatorApproach uses precomputed TensorFlow graphs and learns from a set of word embedding vectors; everything else is handled by Spark NLP. The TensorFlow graphs can be redesigned if needed.

Take me to notebook!

Retrieving Pretrained models

In the following example, we walk through different use cases of our newest resource downloader.

Some of our models can be retrieved through their AnnotatorModel class, for example PerceptronModel for retrieving a POS model. For pipelines, we provide the predesigned Basic and Advanced pipelines, and we also allow downloading other models and pipelines by name.

These components can then be plugged seamlessly into further pipelines.

Take me to notebook!