sparknlp.functions.filter_by_annotations_col

sparknlp.functions.filter_by_annotations_col(dataframe, f, column)

Applies a filter over a column of Annotations.

Parameters
dataframe : DataFrame

Input DataFrame

f : function

Filter function applied to the values of the column; it should return True for the rows to keep

column : str

Name of the column

Returns
pyspark.sql.DataFrame

Filtered DataFrame

Examples

>>> from sparknlp.pretrained import PretrainedPipeline
>>> from sparknlp.annotation import Annotation
>>> from sparknlp.functions import *
>>> explain_document_pipeline = PretrainedPipeline("explain_document_dl")
>>> data = spark.createDataFrame([["U.N. official Ekeus heads for Baghdad."]]).toDF("text")
>>> result = explain_document_pipeline.transform(data)
>>> def filter_pos(annotation: Annotation):
...     return annotation.result == "NNP"
>>> filter_by_annotations_col(
...     explode_annotations_col(result, "pos", "pos"), filter_pos, "pos"
... ).select("pos").show(truncate=False)
+-----------------------------------------+
|pos                                      |
+-----------------------------------------+
|[pos, 0, 2, NNP, [word -> U.N], []]      |
|[pos, 14, 18, NNP, [word -> Ekeus], []]  |
|[pos, 30, 36, NNP, [word -> Baghdad], []]|
+-----------------------------------------+
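
Because the filter function is an ordinary Python callable, its logic can be checked without a Spark session. The following is a minimal sketch under stated assumptions: `FakeAnnotation` is a hypothetical stand-in defined here (it is not part of Spark NLP), the list comprehension only imitates row filtering, and the row tagged `NN` is made up for contrast.

```python
from collections import namedtuple

# Hypothetical stand-in for an annotation; it only mirrors the fields
# visible in the example output above and is not a Spark NLP class.
FakeAnnotation = namedtuple(
    "FakeAnnotation", ["annotator_type", "begin", "end", "result", "metadata"]
)


def filter_pos(annotation):
    # Same predicate as in the example: keep proper nouns only.
    return annotation.result == "NNP"


rows = [
    FakeAnnotation("pos", 0, 2, "NNP", {"word": "U.N"}),
    # Illustrative row: the "NN" tag is an assumption, not pipeline output.
    FakeAnnotation("pos", 5, 12, "NN", {"word": "official"}),
    FakeAnnotation("pos", 14, 18, "NNP", {"word": "Ekeus"}),
]

# Imitates the row-level filter: keep rows where the predicate is true.
kept = [row for row in rows if filter_pos(row)]
kept_words = [row.metadata["word"] for row in kept]
print(kept_words)  # ['U.N', 'Ekeus']
```

Testing the predicate in isolation like this can catch mistakes, such as a misspelled tag, before running the full pipeline.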