sparknlp.functions.map_annotations

sparknlp.functions.map_annotations(f, output_type: pyspark.sql.types.DataType)[source]

Creates a Spark UDF to map over an Annotator’s results.

Parameters

f : function
    The function to be applied over the results

output_type : pyspark.sql.types.DataType
    Output type of the data

Returns

pyspark.sql.functions.udf
    Spark UserDefinedFunction (udf)

Examples

>>> from sparknlp.pretrained import PretrainedPipeline
>>> explain_document_pipeline = PretrainedPipeline("explain_document_dl")
>>> data = spark.createDataFrame([["U.N. official Ekeus heads for Baghdad."]]).toDF("text")
>>> result = explain_document_pipeline.transform(data)

The output type must be provided to tell Spark the expected type of the resulting column. Since the function returns an array of annotations, we use Annotation.arrayType() here.

>>> from typing import List
>>> from pyspark.sql import Row
>>> from sparknlp.functions import *
>>> def nnp_tokens(annotations: List[Row]):
...     return list(
...         filter(lambda annotation: annotation.result == 'NNP', annotations)
...     )
>>> result.select(
...     map_annotations(nnp_tokens, Annotation.arrayType())('pos').alias("nnp")
... ).selectExpr("explode(nnp) as nnp").show(truncate=False)
+-----------------------------------------+
|nnp                                      |
+-----------------------------------------+
|[pos, 0, 2, NNP, [word -> U.N], []]      |
|[pos, 14, 18, NNP, [word -> Ekeus], []]  |
|[pos, 30, 36, NNP, [word -> Baghdad], []]|
+-----------------------------------------+
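Conceptually, map_annotations wraps the supplied function in a Spark UDF that is applied to each row's annotation array. The per-row behavior can be sketched in plain Python without a Spark session; the Annotation namedtuple below is a stand-in for sparknlp's Annotation class, used here only for illustration:

```python
from collections import namedtuple

# Hypothetical stand-in for sparknlp.annotation.Annotation (illustration only).
Annotation = namedtuple(
    "Annotation",
    ["annotator_type", "begin", "end", "result", "metadata", "embeddings"],
)

def nnp_tokens(annotations):
    # Keep only annotations whose part-of-speech tag is NNP (proper noun).
    return [a for a in annotations if a.result == "NNP"]

# One row's worth of 'pos' annotations, mirroring the example sentence.
pos_row = [
    Annotation("pos", 0, 2, "NNP", {"word": "U.N"}, []),
    Annotation("pos", 5, 12, "JJ", {"word": "official"}, []),
    Annotation("pos", 14, 18, "NNP", {"word": "Ekeus"}, []),
]

# map_annotations applies f row by row; for this row the result is:
filtered = nnp_tokens(pos_row)
print([a.metadata["word"] for a in filtered])  # ['U.N', 'Ekeus']
```

In the real pipeline, Spark serializes each row's annotations into Row objects, applies the function, and deserializes the returned list back into the declared output type.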