sparknlp.functions.filter_by_annotations_col
- sparknlp.functions.filter_by_annotations_col(dataframe, f, column)
Applies a filter over a column of Annotations, keeping only the rows for which the filter function returns True.
- Parameters
- dataframe : DataFrame
Input DataFrame
- f : function
Filter function, called with the content of the column; it should return a boolean
- column : str
Name of the column to filter on
- Returns
pyspark.sql.DataFrame
Filtered DataFrame
Examples
>>> from sparknlp.pretrained import PretrainedPipeline
>>> from sparknlp.annotation import Annotation
>>> from sparknlp.functions import *
>>> explain_document_pipeline = PretrainedPipeline("explain_document_dl")
>>> data = spark.createDataFrame([["U.N. official Ekeus heads for Baghdad."]]).toDF("text")
>>> result = explain_document_pipeline.transform(data)
>>> # Keep only annotations tagged as proper nouns (NNP)
>>> def filter_pos(annotation: Annotation):
...     return annotation.result == "NNP"
>>> filter_by_annotations_col(
...     explode_annotations_col(result, "pos", "pos"), filter_pos, "pos"
... ).select("pos").show(truncate=False)
+-----------------------------------------+
|pos                                      |
+-----------------------------------------+
|[pos, 0, 2, NNP, [word -> U.N], []]      |
|[pos, 14, 18, NNP, [word -> Ekeus], []]  |
|[pos, 30, 36, NNP, [word -> Baghdad], []]|
+-----------------------------------------+
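The row-wise behaviour seen in the example can be sketched without Spark. This is only an illustration of the predicate semantics, assuming one annotation per row (as after explode_annotations_col). FakeAnnotation is a hypothetical stand-in, not the real sparknlp Annotation class, and the tags on the non-NNP rows are made up for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class FakeAnnotation:
    """Hypothetical stand-in for an annotation; not the real sparknlp class."""
    annotator_type: str
    begin: int
    end: int
    result: str
    metadata: dict = field(default_factory=dict)


def filter_pos(annotation):
    # Same predicate as in the example: keep proper nouns only.
    return annotation.result == "NNP"


# One annotation per row. The NN and VBZ tags are illustrative assumptions.
rows = [
    FakeAnnotation("pos", 0, 2, "NNP", {"word": "U.N"}),
    FakeAnnotation("pos", 5, 12, "NN", {"word": "official"}),
    FakeAnnotation("pos", 14, 18, "NNP", {"word": "Ekeus"}),
    FakeAnnotation("pos", 20, 24, "VBZ", {"word": "heads"}),
    FakeAnnotation("pos", 30, 36, "NNP", {"word": "Baghdad"}),
]

# A row survives only if the predicate returns True for its annotation.
kept = [row for row in rows if filter_pos(row)]
kept_words = [row.metadata["word"] for row in kept]
print(kept_words)
```

The predicate is evaluated once per row and must return a boolean; rows for which it returns False are dropped.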