sparknlp.functions.map_annotations
sparknlp.functions.map_annotations(f, output_type: pyspark.sql.types.DataType)
Creates a Spark UDF to map over an Annotator’s results.
Parameters
- f : function
  The function to be applied over the results.
- output_type : pyspark.sql.types.DataType
  Output type of the data.
Returns
- pyspark.sql.functions.udf
  Spark UserDefinedFunction (udf).
Examples
>>> from sparknlp.pretrained import PretrainedPipeline
>>> explain_document_pipeline = PretrainedPipeline("explain_document_dl")
>>> data = spark.createDataFrame([["U.N. official Ekeus heads for Baghdad."]]).toDF("text")
>>> result = explain_document_pipeline.transform(data)
The array type must be provided to tell Spark the expected output type of the mapped column. Here we use an Annotation array.
>>> from sparknlp.functions import *
>>> def nnp_tokens(annotations: List[Row]):
...     return list(
...         filter(lambda annotation: annotation.result == 'NNP', annotations)
...     )
>>> result.select(
...     map_annotations(nnp_tokens, Annotation.arrayType())('pos').alias("nnp")
... ).selectExpr("explode(nnp) as nnp").show(truncate=False)
+-----------------------------------------+
|nnp                                      |
+-----------------------------------------+
|[pos, 0, 2, NNP, [word -> U.N], []]      |
|[pos, 14, 18, NNP, [word -> Epeus], []]  |
|[pos, 30, 36, NNP, [word -> Baghdad], []]|
+-----------------------------------------+
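The function passed to map_annotations receives a plain Python list of annotation rows and returns whatever matches the declared output type. The filtering logic above can be tried without a Spark session; this sketch uses a hypothetical namedtuple stand-in for the annotation rows (real rows carry additional fields such as metadata and embeddings):

```python
from collections import namedtuple

# Hypothetical lightweight stand-in for the annotation rows that
# map_annotations passes to f; not the real sparknlp Annotation class.
Annotation = namedtuple("Annotation", ["annotatorType", "begin", "end", "result"])

def nnp_tokens(annotations):
    # Keep only annotations whose result is the POS tag 'NNP',
    # mirroring the filter used in the doctest above.
    return [a for a in annotations if a.result == "NNP"]

pos = [
    Annotation("pos", 0, 2, "NNP"),
    Annotation("pos", 4, 11, "JJ"),
    Annotation("pos", 14, 18, "NNP"),
]
print([a.result for a in nnp_tokens(pos)])  # -> ['NNP', 'NNP']
```

Inside the UDF, the same function runs once per row of the annotation column, so it should handle an empty list gracefully (the list comprehension above does).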