sparknlp.base

Contains all the basic components to create a Spark NLP Pipeline.

This module contains basic transformers and extensions to the Spark Pipeline interface. The extensions are LightPipeline and RecursivePipeline, which offer additional functionality.

Classes

Chunk2Doc

Converts a CHUNK type column back into DOCUMENT.

Doc2Chunk

Converts DOCUMENT type annotations into CHUNK type with the contents of a chunkCol.

DocumentAssembler

Prepares data into a format that is processable by Spark NLP.

EmbeddingsFinisher

Extracts embeddings from Annotations into a more easily usable form.

Finisher

Converts annotation results into a format that is easier to use.

GraphFinisher

Helper class to convert the knowledge graph from GraphExtraction into a generic format, such as RDF.

HasRecursiveFit

Properties for the implementation of the RecursivePipeline.

HasRecursiveTransform

Properties for the implementation of the RecursivePipeline.

LightPipeline

Creates a LightPipeline from a Spark PipelineModel.

RecursivePipeline

Recursive pipelines are Spark NLP specific pipelines that allow a Spark ML Pipeline to know about itself on every Pipeline Stage task.

RecursivePipelineModel

Fitted RecursivePipeline.

TokenAssembler

This transformer reconstructs a DOCUMENT type annotation from tokens, usually after they have been normalized, lemmatized, spell checked, etc., so that the document annotation can be used in further annotators.
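
The idea behind reassembling a document from tokens can be illustrated with a small, library-free sketch. It is a conceptual illustration only, not the TokenAssembler implementation: the `Annotation` dataclass is a hypothetical, simplified stand-in for a Spark NLP annotation, `assemble_document` is a made-up helper name, and joining tokens with single spaces is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    """Hypothetical, simplified stand-in for a Spark NLP annotation."""
    annotator_type: str  # e.g. "token" or "document"
    begin: int           # index of the first character
    end: int             # index of the last character (inclusive)
    result: str          # the annotated text


def assemble_document(tokens: List[Annotation]) -> Annotation:
    """Join token results, in offset order, into one document annotation."""
    ordered = sorted(tokens, key=lambda t: t.begin)
    # Assumption of this sketch: tokens are separated by a single space.
    text = " ".join(t.result for t in ordered)
    # Offsets are recomputed for the new text; the end index is inclusive.
    return Annotation("document", 0, max(len(text) - 1, 0), text)


# Tokens as they might look after normalization and lemmatization
# of the text "The cats sat".
tokens = [
    Annotation("token", 0, 2, "the"),
    Annotation("token", 4, 7, "cat"),
    Annotation("token", 9, 11, "sit"),
]
document = assemble_document(tokens)
print(document.result)
```

The resulting annotation has the "document" type again, so in this simplified model it could be passed to any step that expects document input rather than tokens.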