
class DocumentFiltererByNER extends AnnotatorModel[DocumentFiltererByNER] with HasSimpleAnnotate[DocumentFiltererByNER] with WhiteAndBlackListParams with CheckLicense

Filters documents by the entity fields of their chunks, using a white list and a black list. The white list is the list of entities that are allowed to pass the filter; the black list is the list of entities that are not allowed to pass. Matching is case sensitive by default; if caseSensitive is set to false, matching is case insensitive. If outputAsDocument is set to true, the output is a single document with all sentences joined. The joinString parameter is the delimiter inserted between annotation results when they are combined into a single result.

The input annotators are expected to be of type DOCUMENT and CHUNK. The output annotation type is DOCUMENT.
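
The outputAsDocument and joinString behaviour described above can be modelled in a few lines of plain Scala. This is only a minimal sketch, not the annotator's implementation: the JoinSketch object and its combine function are hypothetical names, sentences are reduced to plain strings, and returning an empty result for an empty input is an assumption.

```scala
object JoinSketch {
  // Hypothetical model: each element of `sentences` stands for the result
  // of one DOCUMENT annotation that passed the filter.
  def combine(
      sentences: Seq[String],
      outputAsDocument: Boolean = false, // documented default: false
      joinString: String = " "           // documented default: " "
  ): Seq[String] =
    if (sentences.isEmpty) Seq.empty     // assumption: nothing to emit
    else if (outputAsDocument) Seq(sentences.mkString(joinString)) // one joined document
    else sentences                       // one output per sentence
}
```

With outputAsDocument left at false the sentences come back unchanged; with it set to true they are collapsed into one string separated by joinString.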

Note

A document may contain multiple chunks. If any chunk in the document is in the white list, the document passes the filter. The white list takes priority over the black list.
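
The rule in this note can be sketched as a small predicate. This is a hedged illustration in plain Scala, not the library's code: Chunk, DocumentFilterSketch and documentPasses are hypothetical names. Only the first branch (a white-listed chunk lets the document pass, even if another chunk is black-listed) comes from the note; the other two branches are assumptions about cases the note does not cover.

```scala
object DocumentFilterSketch {
  // Hypothetical stand-in for a CHUNK annotation; only the entity label matters here.
  final case class Chunk(entity: String)

  private def norm(s: String, caseSensitive: Boolean): String =
    if (caseSensitive) s else s.toLowerCase

  def documentPasses(
      chunks: Seq[Chunk],
      whiteList: Seq[String] = Seq.empty,
      blackList: Seq[String] = Seq.empty,
      caseSensitive: Boolean = true
  ): Boolean = {
    val white  = whiteList.map(norm(_, caseSensitive)).toSet
    val black  = blackList.map(norm(_, caseSensitive)).toSet
    val labels = chunks.map(c => norm(c.entity, caseSensitive))

    if (labels.exists(white.contains)) true // stated in the note: white list wins
    else if (white.nonEmpty) false          // assumption: a white list was set but nothing matched
    else !labels.exists(black.contains)     // assumption: only a black list applies
  }
}
```

For a document with PROBLEM and TREATMENT chunks, a white list of PROBLEM lets it pass even when TREATMENT is black-listed, which is the priority rule stated above.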

Linear Supertypes
CheckLicense, WhiteAndBlackListParams, HasSimpleAnnotate[DocumentFiltererByNER], AnnotatorModel[DocumentFiltererByNER], CanBeLazy, RawAnnotator[DocumentFiltererByNER], HasOutputAnnotationCol, HasInputAnnotationCols, HasOutputAnnotatorType, ParamsAndFeaturesWritable, HasFeatures, DefaultParamsWritable, MLWritable, Model[DocumentFiltererByNER], Transformer, PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new DocumentFiltererByNER()
  2. new DocumentFiltererByNER(uid: String)

Type Members

  1. type AnnotationContent = Seq[Row]
    Attributes
    protected
    Definition Classes
    AnnotatorModel
  2. type AnnotatorType = String
    Definition Classes
    HasOutputAnnotatorType

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T
    Attributes
    protected
    Definition Classes
    Params
  4. def $$[T](feature: StructFeature[T]): T
    Attributes
    protected
    Definition Classes
    HasFeatures
  5. def $$[K, V](feature: MapFeature[K, V]): Map[K, V]
    Attributes
    protected
    Definition Classes
    HasFeatures
  6. def $$[T](feature: SetFeature[T]): Set[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  7. def $$[T](feature: ArrayFeature[T]): Array[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  8. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  9. def _transform(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): DataFrame
    Attributes
    protected
    Definition Classes
    AnnotatorModel
  10. def afterAnnotate(dataset: DataFrame): DataFrame
    Attributes
    protected
    Definition Classes
    AnnotatorModel
  11. def annotate(annotations: Seq[Annotation]): Seq[Annotation]
    Definition Classes
    DocumentFiltererByNER → HasSimpleAnnotate
  12. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  13. def beforeAnnotate(dataset: Dataset[_]): Dataset[_]
    Attributes
    protected
    Definition Classes
    AnnotatorModel
  14. val blackList: StringArrayParam

    If defined, the list of entities to ignore. The rest will be processed. Entries should not include the IOB prefix of the label. Default: Array()

    Definition Classes
    WhiteAndBlackListParams
  15. val caseSensitive: BooleanParam

    Determines whether the white-listed and black-listed entities are matched case sensitively. Default: true

    Definition Classes
    WhiteAndBlackListParams
  16. final def checkSchema(schema: StructType, inputAnnotatorType: String): Boolean
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  17. def checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String]): Unit
    Definition Classes
    CheckLicense
  18. def checkValidScope(scope: String): Unit
    Definition Classes
    CheckLicense
  19. def checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  20. def checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  21. final def clear(param: Param[_]): DocumentFiltererByNER.this.type
    Definition Classes
    Params
  22. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  23. def copy(extra: ParamMap): DocumentFiltererByNER
    Definition Classes
    RawAnnotator → Model → Transformer → PipelineStage → Params
  24. def copyValues[T <: Params](to: T, extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  25. final def defaultCopy[T <: Params](extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  26. def dfAnnotate: UserDefinedFunction
    Definition Classes
    HasSimpleAnnotate
  27. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  28. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  29. def evaluateFilter(filter: String): Boolean

    Filter annotations by blackList and whiteList, taking into account the caseSensitive param.

    Attributes
    protected
    Definition Classes
    WhiteAndBlackListParams
  30. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  31. def explainParams(): String
    Definition Classes
    Params
  32. def extraValidate(structType: StructType): Boolean
    Attributes
    protected
    Definition Classes
    RawAnnotator
  33. def extraValidateMsg: String
    Attributes
    protected
    Definition Classes
    RawAnnotator
  34. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  35. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  36. val features: ArrayBuffer[Feature[_, _, _]]
    Definition Classes
    HasFeatures
  37. def filterByEntityField(annotation: Annotation): Boolean

    Filters an annotation by blackList and whiteList, taking into account the caseSensitive param. The value compared is annotation.metadata.getOrElse("entity", annotation.metadata.getOrElse("identifier", "")).toString, that is, the "entity" metadata field, falling back to "identifier" and then to the empty string.

    returns

    Boolean

    Attributes
    protected
    Definition Classes
    WhiteAndBlackListParams
  38. def filterByEntityField(annotations: Seq[Annotation]): Seq[Annotation]

    Filters annotations by blackList and whiteList, taking into account the caseSensitive param. The value compared for each annotation is annotation.metadata.getOrElse("entity", annotation.metadata.getOrElse("identifier", "")).toString

    Attributes
    protected
    Definition Classes
    WhiteAndBlackListParams
  39. def filterByWhiteAndBlackList(annotation: Annotation): Boolean

    Filters an annotation by blackList and whiteList, taking into account the caseSensitive param. The value compared is annotation.result

    returns

    Boolean

    Attributes
    protected
    Definition Classes
    WhiteAndBlackListParams
  40. def filterByWhiteAndBlackList(annotations: Seq[Annotation]): Seq[Annotation]

    Filters annotations by blackList and whiteList, taking into account the caseSensitive param. The value compared for each annotation is annotation.result

    Attributes
    protected
    Definition Classes
    WhiteAndBlackListParams
  41. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  42. def get[T](feature: StructFeature[T]): Option[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  43. def get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  44. def get[T](feature: SetFeature[T]): Option[Set[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  45. def get[T](feature: ArrayFeature[T]): Option[Array[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  46. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  47. def getBlackList: Array[String]

    Gets blackList param

    Definition Classes
    WhiteAndBlackListParams
  48. def getCaseSensitive: Boolean

    Gets caseSensitive param

    Definition Classes
    WhiteAndBlackListParams
  49. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  50. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  51. def getInputCols: Array[String]
    Definition Classes
    HasInputAnnotationCols
  52. def getJoinString: String

    Get joinString param.

  53. def getLazyAnnotator: Boolean
    Definition Classes
    CanBeLazy
  54. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  55. def getOutputAsDocument: Boolean

    Get whether to return all sentences joined into a single document.

  56. final def getOutputCol: String
    Definition Classes
    HasOutputAnnotationCol
  57. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  58. def getWhiteList: Array[String]

    Gets whiteList param

    Definition Classes
    WhiteAndBlackListParams
  59. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  60. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  61. def hasParent: Boolean
    Definition Classes
    Model
  62. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  63. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  64. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  65. val inputAnnotatorTypes: Array[String]
    Definition Classes
    DocumentFiltererByNER → HasInputAnnotationCols
  66. final val inputCols: StringArrayParam
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  67. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  68. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  69. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  70. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  71. def isValueInList(value: String, list: Array[String]): Boolean
    Attributes
    protected
    Definition Classes
    WhiteAndBlackListParams
  72. def isWhiteListAndBlacklistEmpty: Boolean
    Attributes
    protected
    Definition Classes
    WhiteAndBlackListParams
  73. val joinString: Param[String]

    The string inserted as a delimiter between the results of annotations when they are combined into a single result, which happens when outputAsDocument is set to true. Default: " "

  74. val lazyAnnotator: BooleanParam
    Definition Classes
    CanBeLazy
  75. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  76. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  77. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  78. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  79. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  80. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  81. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  82. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  83. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  84. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  85. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  86. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  87. def msgHelper(schema: StructType): String
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  88. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  89. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  90. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  91. def onWrite(path: String, spark: SparkSession): Unit
    Attributes
    protected
    Definition Classes
    ParamsAndFeaturesWritable
  92. val optionalInputAnnotatorTypes: Array[String]
    Definition Classes
    HasInputAnnotationCols
  93. val outputAnnotatorType: AnnotatorType
    Definition Classes
    DocumentFiltererByNER → HasOutputAnnotatorType
  94. val outputAsDocument: BooleanParam

    Whether to return all sentences joined into a single document. Default: false.

  95. final val outputCol: Param[String]
    Attributes
    protected
    Definition Classes
    HasOutputAnnotationCol
  96. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  97. var parent: Estimator[DocumentFiltererByNER]
    Definition Classes
    Model
  98. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  99. def set[T](feature: StructFeature[T], value: T): DocumentFiltererByNER.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  100. def set[K, V](feature: MapFeature[K, V], value: Map[K, V]): DocumentFiltererByNER.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  101. def set[T](feature: SetFeature[T], value: Set[T]): DocumentFiltererByNER.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  102. def set[T](feature: ArrayFeature[T], value: Array[T]): DocumentFiltererByNER.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  103. final def set(paramPair: ParamPair[_]): DocumentFiltererByNER.this.type
    Attributes
    protected
    Definition Classes
    Params
  104. final def set(param: String, value: Any): DocumentFiltererByNER.this.type
    Attributes
    protected
    Definition Classes
    Params
  105. final def set[T](param: Param[T], value: T): DocumentFiltererByNER.this.type
    Definition Classes
    Params
  106. def setAllowList(list: String*): DocumentFiltererByNER.this.type
    Definition Classes
    WhiteAndBlackListParams
  107. def setAllowList(list: Array[String]): DocumentFiltererByNER.this.type
    Definition Classes
    WhiteAndBlackListParams
  108. def setBlackList(list: String*): DocumentFiltererByNER.this.type
    Definition Classes
    WhiteAndBlackListParams
  109. def setBlackList(list: Array[String]): DocumentFiltererByNER.this.type

    Sets the list of entities to ignore. The rest will be processed. Entries should not include the IOB prefix of the label. Default: Array()

    Definition Classes
    WhiteAndBlackListParams
  110. def setCaseSensitive(value: Boolean): DocumentFiltererByNER.this.type

    Sets whether the white-listed and black-listed entities are matched case sensitively. Default: true

    Definition Classes
    WhiteAndBlackListParams
  111. def setDefault[T](feature: StructFeature[T], value: () ⇒ T): DocumentFiltererByNER.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  112. def setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): DocumentFiltererByNER.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  113. def setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): DocumentFiltererByNER.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  114. def setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): DocumentFiltererByNER.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  115. final def setDefault(paramPairs: ParamPair[_]*): DocumentFiltererByNER.this.type
    Attributes
    protected
    Definition Classes
    Params
  116. final def setDefault[T](param: Param[T], value: T): DocumentFiltererByNER.this.type
    Attributes
    protected[org.apache.spark.ml]
    Definition Classes
    Params
  117. def setDenyList(list: String*): DocumentFiltererByNER.this.type
    Definition Classes
    WhiteAndBlackListParams
  118. def setDenyList(list: Array[String]): DocumentFiltererByNER.this.type
    Definition Classes
    WhiteAndBlackListParams
  119. final def setInputCols(value: String*): DocumentFiltererByNER.this.type
    Definition Classes
    HasInputAnnotationCols
  120. def setInputCols(value: Array[String]): DocumentFiltererByNER.this.type
    Definition Classes
    HasInputAnnotationCols
  121. def setJoinString(value: String): DocumentFiltererByNER.this.type

    Set the string that will be inserted between results of annotations when combining them into a single result if outputAsDocument is set to true. Default: " "

  122. def setLazyAnnotator(value: Boolean): DocumentFiltererByNER.this.type
    Definition Classes
    CanBeLazy
  123. def setOutputAsDocument(mode: Boolean): DocumentFiltererByNER.this.type

    Set whether to return all sentences joined into a single document. Default: false.

  124. final def setOutputCol(value: String): DocumentFiltererByNER.this.type
    Definition Classes
    HasOutputAnnotationCol
  125. def setParent(parent: Estimator[DocumentFiltererByNER]): DocumentFiltererByNER
    Definition Classes
    Model
  126. def setWhiteList(list: String*): DocumentFiltererByNER.this.type
    Definition Classes
    WhiteAndBlackListParams
  127. def setWhiteList(list: Array[String]): DocumentFiltererByNER.this.type

    Sets the list of entities to process. The rest will be ignored. Entries should not include the IOB prefix of the label. Default: Array()

    Definition Classes
    WhiteAndBlackListParams
  128. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  129. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  130. final def transform(dataset: Dataset[_]): DataFrame
    Definition Classes
    AnnotatorModel → Transformer
  131. def transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" )
  132. def transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" ) @varargs()
  133. final def transformSchema(schema: StructType): StructType
    Definition Classes
    RawAnnotator → PipelineStage
  134. def transformSchema(schema: StructType, logging: Boolean): StructType
    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  135. val uid: String
    Definition Classes
    DocumentFiltererByNER → Identifiable
  136. def validate(schema: StructType): Boolean
    Attributes
    protected
    Definition Classes
    RawAnnotator
  137. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  138. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  139. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  140. val whiteList: StringArrayParam

    If defined, the list of entities to process. The rest will be ignored. Entries should not include the IOB prefix of the label. Default: Array()

    Definition Classes
    WhiteAndBlackListParams
  141. def wrapColumnMetadata(col: Column): Column
    Attributes
    protected
    Definition Classes
    RawAnnotator
  142. def write: MLWriter
    Definition Classes
    ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable
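
The metadata lookup quoted in the filterByEntityField entries above can be tried in isolation. The sketch below is a minimal illustration in plain Scala: EntityFieldSketch and entityField are hypothetical names, and a plain Map[String, String] stands in for an annotation's metadata; only the getOrElse expression itself is taken from the documentation.

```scala
object EntityFieldSketch {
  // Hypothetical helper: `metadata` stands in for annotation.metadata.
  // Prefers the "entity" key, then "identifier", then the empty string.
  def entityField(metadata: Map[String, String]): String =
    metadata.getOrElse("entity", metadata.getOrElse("identifier", "")).toString
}
```

A chunk whose metadata has both keys is matched on its "entity" value; one that only carries "identifier" is matched on that instead.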
