In this section, we present several example use cases for both training and running predictions with Spark NLP in Python (PySpark). Please refer to our Annotators page for reference.
In the following example, we walk through Sentiment Analysis training and prediction using Spark NLP annotators.
The ViveknSentimentApproach annotator implements Vivek Narayanan's algorithm, trained either from a column in the training dataset whose rows are labelled 'positive' or 'negative', or from one folder of positive texts and another of negative texts. Using n-grams and negation sequences, this statistical model can achieve high accuracy if trained properly.
Spark can be leveraged during training by using the ReadAs.Dataset setting; Spark is used during prediction by default.
We also include a spell checker in this pipeline, which corrects our sentences for better Sentiment Analysis accuracy.
In the following example, we walk through a simple use case for our straightforward SentimentDetector annotator.
This annotator works on top of a list of labeled sentences, which can have any of the following features
In the following example, we walk through training and prediction with a Conditional Random Fields NER model.
This challenging annotator requires the user to provide either a labeled dataset during the fit() stage, or external CoNLL 2003 resources to train on. It may optionally use an external word embeddings set and a list of additional entities to extract.
The CRF annotator also requires Part-of-Speech tags, so we add those in the same Pipeline. We could also use our special RecursivePipeline, which tells Spark NLP's NER CRF approach to use the same pipeline for tagging external resources.
In the following example, we walk through training and prediction with a Convolutional Neural Network + LSTM NER model. This annotator is implemented on top of TensorFlow.
This annotator takes a series of word embedding vectors, a CoNLL training dataset, and a validation dataset. We include our own predefined TensorFlow graphs, but all layers are trained during the fit() stage.
DL NER computes several BiLSTM layers to generate entity extraction automatically, and it leverages batch-based distributed calls to native TensorFlow libraries during prediction.
In the following example, we walk through our straightforward TextMatcher annotator.
This annotator takes a list of phrases from a text file and looks them up in the target dataset.
This annotator is an AnnotatorModel and does not require training.
In the following example, we walk through a negation LogReg-based annotator, which can identify whether a scenario is happening or not.
This machine learning AnnotatorApproach takes a set of word embedding vectors and trains on them alone. Prediction over a negated/not-negated dataset returns the appropriate result.
In the following example, we walk through a Deep Learning based annotator, which can identify whether a scenario is happening or not.
This AnnotatorApproach utilizes precomputed TensorFlow graphs and learns from a series of word embedding vectors; the rest is handled by Spark NLP. TensorFlow graphs may be redesigned if needed.
In the following example, we walk through different use cases of our newest resource downloader.
Some of our models can be retrieved through their AnnotatorModel class, such as PerceptronModel for retrieving a POS model. For Pipelines, we have predesigned Basic and Advanced Pipelines, but we also allow downloading other models and Pipelines by name.
Such components may then be injected seamlessly into further pipelines, and so on.