sparknlp_jsl.text_to_documents_columns

Module Contents

Classes

TextToDocumentsColumns

Converts a DataFrame with columns of string texts into a DataFrame with columns of Document annotations.

class TextToDocumentsColumns(spark: pyspark.sql.SparkSession, columns: List[str])

Converts a DataFrame with columns of string texts into a DataFrame with columns of Document annotations.

Parameters:
  • spark (SparkSession) – The current SparkSession.

  • columns (List[str]) – The names of the columns to convert.

columns
instance
spark
toDocumentsColumns(df: pyspark.sql.DataFrame)

Converts a DataFrame with columns of string texts into a DataFrame with columns of Document annotations.
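A minimal usage sketch, based only on the signatures documented on this page. The helper name convert_text_columns is hypothetical, and the sketch assumes that toDocumentsColumns returns the converted DataFrame and that a licensed Spark NLP for Healthcare session is already running. Whether the converted columns replace the original text columns or are added alongside them is not stated here.

```python
def convert_text_columns(spark, df, text_columns):
    """Convert the given string columns of df to Document annotation columns.

    spark        -- an active SparkSession (assumed to be already started)
    df           -- a pyspark.sql.DataFrame containing the string columns
    text_columns -- names of the string columns to convert
    """
    # Imported inside the function so that this helper can be defined
    # even where the licensed sparknlp_jsl package is not installed.
    from sparknlp_jsl.text_to_documents_columns import TextToDocumentsColumns

    # Constructor arguments follow the documented signature: (spark, columns).
    converter = TextToDocumentsColumns(spark, list(text_columns))

    # Assumption: the method returns the converted DataFrame.
    return converter.toDocumentsColumns(df)
```

A call such as convert_text_columns(spark, df, ["text"]) would then pass a single column name, wrapped in a list as the columns parameter expects.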

toDocumentsColumnsWithId(df: pyspark.sql.DataFrame, id_column: str)

Converts a DataFrame with columns of string texts into a DataFrame with columns of Document annotations whose metadata includes the id taken from id_column.
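To illustrate what "id in the metadata" could look like, the following plain-Python sketch models a single Document annotation for one row. It is not the library's implementation: DocumentAnnotation and to_document_with_id are hypothetical names, the metadata key "id" is an assumption, and only the general Spark NLP convention of a "document" annotation with inclusive begin and end offsets is relied on.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DocumentAnnotation:
    """Simplified stand-in for a Spark NLP annotation of type 'document'."""
    result: str                      # the full text of the document
    begin: int                       # offset of the first character
    end: int                         # offset of the last character (inclusive)
    annotator_type: str = "document"
    metadata: Dict[str, str] = field(default_factory=dict)


def to_document_with_id(text: str, row_id: str) -> List[DocumentAnnotation]:
    # Annotation columns hold arrays, so the single document is wrapped in a list.
    # Assumption: the row id is stored under the metadata key "id".
    return [
        DocumentAnnotation(
            result=text,
            begin=0,
            end=len(text) - 1,
            metadata={"id": str(row_id)},
        )
    ]


# One input row: "note_id" plays the role of id_column, "text" is the column to convert.
row = {"note_id": "n-17", "text": "Patient reports mild headache."}
docs = to_document_with_id(row["text"], row["note_id"])
print(docs[0].metadata)
```

Keeping the id in the annotation metadata lets downstream results be traced back to the source row without carrying a separate join key through the pipeline.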