sparknlp.functions.map_annotations_cols

sparknlp.functions.map_annotations_cols(dataframe: pyspark.sql.dataframe.DataFrame, f, columns: list, output_column: str, annotatyon_type: str, output_type: pyspark.sql.types.DataType = ArrayType(StructType(List(StructField(annotatorType, StringType, false), StructField(begin, IntegerType, false), StructField(end, IntegerType, false), StructField(result, StringType, false), StructField(metadata, MapType(StringType, StringType, true), false), StructField(embeddings, ArrayType(FloatType, true), false))), true))

Maps a function over multiple columns of Annotation results using a Spark UDF and stores the result in a new column.

Parameters
dataframe : DataFrame

Input DataFrame

f : function

Function to apply to the annotations of the input columns

columns : list

Names of the input columns

output_column : str

Name of the output column

annotatyon_type : str

Annotator type of the output column

output_type : DataType, optional

Output type, by default Annotation.arrayType()

Returns
pyspark.sql.DataFrame

Transformed DataFrame
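
To make the role of `f` concrete, the following is a pure-Python model of the mapping step, not the library's implementation. It assumes that the annotations of the listed columns are concatenated in column order into a single list before `f` is called, which is inferred from the example output in this page (the "pos" rows precede the "ner" rows). The names `Ann` and `apply_over_columns` are hypothetical and exist only for this illustration.

```python
from collections import namedtuple
from itertools import chain

# Hypothetical, reduced stand-in for an Annotation (assumption: only the
# fields needed for this illustration are modelled).
Ann = namedtuple("Ann", ["annotator_type", "begin", "end", "result"])


def apply_over_columns(row, f, columns):
    """Concatenate the annotation lists of `columns` in order, then call f."""
    combined = list(chain.from_iterable(row[c] for c in columns))
    return f(combined)


# One "row" with two annotation columns, modelled as a plain dict.
row = {
    "pos": [Ann("pos", 0, 2, "NNP"), Ann("pos", 3, 3, ".")],
    "ner": [Ann("named_entity", 0, 2, "B-ORG"), Ann("named_entity", 3, 3, "O")],
}

tags = apply_over_columns(row, lambda anns: [a.result for a in anns], ["pos", "ner"])
print(tags)  # ['NNP', '.', 'B-ORG', 'O']
```

Under this assumption, `f` cannot tell which column an annotation came from unless it inspects the annotation itself, so any per-column logic has to rely on the annotation's own fields.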

Examples

>>> from sparknlp.pretrained import PretrainedPipeline
>>> from sparknlp.functions import *
>>> explain_document_pipeline = PretrainedPipeline("explain_document_dl")
>>> data = spark.createDataFrame([["U.N. official Ekeus heads for Baghdad."]]).toDF("text")
>>> result = explain_document_pipeline.transform(data)
>>> chunks_df = map_annotations_cols(
...     result,
...     lambda x: [
...         Annotation("tag", a.begin, a.end, a.result, a.metadata, a.embeddings)
...         for a in x
...     ],
...     ["pos", "ner"],
...     "tags",
...     "chunk"
... )
>>> chunks_df.selectExpr("explode(tags)").show(truncate=False)
+-------------------------------------------+
|col                                        |
+-------------------------------------------+
|[tag, 0, 2, NNP, [word -> U.N], []]        |
|[tag, 3, 3, ., [word -> .], []]            |
|[tag, 5, 12, JJ, [word -> official], []]   |
|[tag, 14, 18, NNP, [word -> Ekeus], []]    |
|[tag, 20, 24, VBZ, [word -> heads], []]    |
|[tag, 26, 28, IN, [word -> for], []]       |
|[tag, 30, 36, NNP, [word -> Baghdad], []]  |
|[tag, 37, 37, ., [word -> .], []]          |
|[tag, 0, 2, B-ORG, [word -> U.N], []]      |
|[tag, 3, 3, O, [word -> .], []]            |
|[tag, 5, 12, O, [word -> official], []]    |
|[tag, 14, 18, B-PER, [word -> Ekeus], []]  |
|[tag, 20, 24, O, [word -> heads], []]      |
|[tag, 26, 28, O, [word -> for], []]        |
|[tag, 30, 36, B-LOC, [word -> Baghdad], []]|
|[tag, 37, 37, O, [word -> .], []]          |
+-------------------------------------------+
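
The lambda in the example can also be written as a named function, which is easier to test without a Spark session. The sketch below assumes that `f` receives a plain list of objects exposing `begin`, `end`, `result`, `metadata` and `embeddings`; `FakeAnnotation` and `keep_entities` are hypothetical names introduced here, and the filtering rule (dropping results equal to "O") is only an illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for sparknlp's Annotation so the function can be
# exercised without Spark; the field order mirrors the constructor call
# used in the example above.
FakeAnnotation = namedtuple(
    "FakeAnnotation",
    ["annotator_type", "begin", "end", "result", "metadata", "embeddings"],
)


def keep_entities(annotations, factory=FakeAnnotation):
    """Relabel annotations as "tag" and drop those whose result is "O"."""
    return [
        factory("tag", a.begin, a.end, a.result, a.metadata, a.embeddings)
        for a in annotations
        if a.result != "O"
    ]


sample = [
    FakeAnnotation("named_entity", 0, 2, "B-ORG", {"word": "U.N"}, []),
    FakeAnnotation("named_entity", 3, 3, "O", {"word": "."}, []),
    FakeAnnotation("pos", 14, 18, "NNP", {"word": "Ekeus"}, []),
]

kept = keep_entities(sample)
print([a.result for a in kept])  # ['B-ORG', 'NNP']
```

With Spark NLP available, the same function could presumably be passed as `lambda x: keep_entities(x, Annotation)` in place of the lambda shown in the example.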