SparkNLP - FAQs & Additional Resources


What are the requirements of SparkNLP?

The library runs on top of Spark and requires nothing else. Make sure you have a working Spark environment and supply SparkNLP as a jar on the Spark JVM classpath.
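One common way to put a jar on the classpath is the `--jars` option of `spark-submit`. The sketch below only assembles such a command line; the jar path and the application file name are hypothetical placeholders, not locations defined by SparkNLP.

```python
import shlex
from typing import List


def spark_submit_command(app: str, jars: List[str]) -> List[str]:
    """Build a spark-submit argument list that ships the given jars."""
    # --jars takes a comma-separated list; Spark adds each jar to the
    # driver and executor classpaths.
    return ["spark-submit", "--jars", ",".join(jars), app]


# Hypothetical paths: substitute the jar and application you actually use.
cmd = spark_submit_command("my_app.py", ["/opt/jars/spark-nlp.jar"])
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` or pasted into a shell once the placeholder paths are replaced.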

Does SparkNLP rely on any NLP library?

No. SparkNLP is self-contained, and all algorithms are developed within the code base.

What do I need to learn in order to use the library?

Either Scala or Python, and beyond that mostly Spark and SparkML. SparkNLP uses the same logic and syntax as any other machine learning transformer in Spark and can be included in the same pipelines, so a review of the examples should be enough to get going.

What are annotator types?

Each annotator has a type that may be shared with other annotators. When an annotator requires another annotator by type, you can pass in inputCols the output column of any annotator that has that type. For instance, Normalizer and SpellChecker are both token type annotators, so either or both may be used as input for a Sentiment Analysis model.
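The type-matching rule can be illustrated with a small toy model. This is not SparkNLP code: the `Stage` class, the `wiring_problems` helper, and the column names are made up for illustration, and the sentiment stage is assumed to require only a token input to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Stage:
    """Toy stand-in for an annotator; not a SparkNLP class."""
    name: str
    output_col: str
    output_type: str
    input_cols: Tuple[str, ...] = ()
    required_types: Tuple[str, ...] = ()


def wiring_problems(stages: List[Stage]) -> List[str]:
    """Check that each input column exists upstream and has a required type."""
    col_types: Dict[str, str] = {}
    problems: List[str] = []
    for stage in stages:
        supplied = set()
        for col in stage.input_cols:
            col_type = col_types.get(col)
            if col_type is None:
                problems.append(f"{stage.name}: no upstream column '{col}'")
            elif col_type not in stage.required_types:
                problems.append(f"{stage.name}: '{col}' has type '{col_type}'")
            else:
                supplied.add(col_type)
        for missing in sorted(set(stage.required_types) - supplied):
            problems.append(f"{stage.name}: missing input of type '{missing}'")
        # Record this stage's output so later stages can consume it.
        col_types[stage.output_col] = stage.output_type
    return problems


base = [
    Stage("document_assembler", "document", "document"),
    Stage("tokenizer", "token", "token", ("document",), ("document",)),
    Stage("normalizer", "normalized", "token", ("token",), ("token",)),
    Stage("spell_checker", "spell", "token", ("token",), ("token",)),
]


def sentiment(*input_cols: str) -> Stage:
    # Assumption for this sketch: sentiment needs one or more token inputs.
    return Stage("sentiment", "sentiment", "sentiment", input_cols, ("token",))


# Either token-typed column, or both, satisfies the requirement.
print(wiring_problems(base + [sentiment("normalized")]))
print(wiring_problems(base + [sentiment("spell")]))
print(wiring_problems(base + [sentiment("normalized", "spell")]))
# A document-typed column does not.
print(wiring_problems(base + [sentiment("document")]))
```

The point of the sketch is that wiring is decided by the type carried by a column, not by which annotator produced it.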

Can I save trained models or pipelines?

Yes, the same way you would do it for any other Spark ML component.
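As a minimal sketch, assuming the standard Spark ML persistence interface (`write()`, `overwrite()`, `save()` and the `load()` class method available on `Pipeline` and `PipelineModel`), a save and reload round trip could look like the helper below. The helper name and the path in the usage comment are hypothetical.

```python
def save_and_reload(stage, path: str):
    """Persist a Spark ML Pipeline or PipelineModel and read it back."""
    # write() returns a writer; overwrite() lets it replace an existing path.
    stage.write().overwrite().save(path)
    # load() is a class method, so the same class that was saved reads it back.
    return type(stage).load(path)


# Usage with a fitted pipeline (requires a running Spark session):
#   model = pipeline.fit(training_df)
#   restored = save_and_reload(model, "/tmp/sparknlp-model")
```

The helper imports nothing from Spark because it relies only on the methods of the object passed in.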

Can I contribute?

Yes! Any kind of contribution is welcome: feedback, ideas, management, documentation, testing, corpora for training and testing, development, or even code review. Refer to the contribute page for more information.

Additional Resources

Browse our collection of videos and blog posts to deepen your knowledge of and experience with SparkNLP.


Natural Language Understanding at Scale with Spark Native NLP, Spark ML & TensorFlow with Alex Thomas


Building a Natural Language Processing Library for Apache Spark

In this O'Reilly Data Show Podcast, Ben Lorica spoke with David Talby of Pacific.AI about a new NLP library for Spark, and why model development starts after a model gets deployed to production…


Introducing the Natural Language Processing Library for Apache Spark

By David Talby | October 19, 2017