VectorDBPostProcessor

Companion object VectorDBPostProcessor

class VectorDBPostProcessor extends AnnotatorModel[VectorDBPostProcessor] with HasSimpleAnnotate[VectorDBPostProcessor] with CheckLicense with HasFeatures

VectorDBPostProcessor is used to filter and sort the annotations from the com.johnsnowlabs.nlp.annotators.resolution.VectorDBModel.

Linear Supertypes

CheckLicense, HasSimpleAnnotate[VectorDBPostProcessor], AnnotatorModel[VectorDBPostProcessor], CanBeLazy, RawAnnotator[VectorDBPostProcessor], HasOutputAnnotationCol, HasInputAnnotationCols, HasOutputAnnotatorType, ParamsAndFeaturesWritable, HasFeatures, DefaultParamsWritable, MLWritable, Model[VectorDBPostProcessor], Transformer, PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

new VectorDBPostProcessor()
new VectorDBPostProcessor(uid: String)
uid
a unique identifier for the instantiated AnnotatorModel

Type Members

type AnnotationContent = Seq[Row]

Attributes
protected
Definition Classes
AnnotatorModel
type AnnotatorType = String

Definition Classes
HasOutputAnnotatorType

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def $[T](param: Param[T]): T

Attributes
protected
Definition Classes
Params
def $$[T](feature: StructFeature[T]): T

Attributes
protected
Definition Classes
HasFeatures
def $$[K, V](feature: MapFeature[K, V]): Map[K, V]

Attributes
protected
Definition Classes
HasFeatures
def $$[T](feature: SetFeature[T]): Set[T]

Attributes
protected
Definition Classes
HasFeatures
def $$[T](feature: ArrayFeature[T]): Array[T]

Attributes
protected
Definition Classes
HasFeatures
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def _transform(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): DataFrame

Attributes
protected
Definition Classes
AnnotatorModel
def afterAnnotate(dataset: DataFrame): DataFrame

Attributes
protected
Definition Classes
AnnotatorModel
val allowZeroContentAfterFiltering: BooleanParam
Whether to allow zero annotations after filtering. If set to true, the output may contain no annotations when all of them are filtered out. If set to false, the annotator tries to keep at least one annotation in the output. Default: false
final def annotate(annotations: Seq[Annotation]): Seq[Annotation]

Definition Classes
VectorDBPostProcessor → HasSimpleAnnotate
final def asInstanceOf[T0]: T0

Definition Classes
Any
def beforeAnnotate(dataset: Dataset[_]): Dataset[_]

Attributes
protected
Definition Classes
AnnotatorModel
val caseSensitive: BooleanParam
Whether the criteria of the string operators are case sensitive. For example, if set to false, the operator "equals" matches "John" with "john". Default: false
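The effect of caseSensitive on the string operators can be illustrated with a minimal sketch in plain Scala. StringOperatorSketch and matches are hypothetical names, not part of the library, and lowercasing both sides is an assumption about how case-insensitive matching is done. The "regex" operator is left out because its matching semantics are not described here.

```scala
import java.util.Locale

// Hypothetical helper for illustration only; not the library's implementation.
object StringOperatorSketch {
  def matches(metadataValue: String, operator: String, value: String, caseSensitive: Boolean): Boolean = {
    // Assumption: case-insensitive comparison lowercases both operands.
    def norm(s: String): String = if (caseSensitive) s else s.toLowerCase(Locale.ROOT)
    val left = norm(metadataValue)
    val right = norm(value)
    operator match {
      case "equals"       => left == right
      case "not_equals"   => left != right
      case "contains"     => left.contains(right)
      case "not_contains" => !left.contains(right)
      case other =>
        throw new IllegalArgumentException(s"Operator not covered by this sketch: $other")
    }
  }
}
```

With caseSensitive set to false, "equals" accepts "John" against "john"; with it set to true, the same comparison is rejected.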
final def checkSchema(schema: StructType, inputAnnotatorType: String): Boolean

Attributes
protected
Definition Classes
HasInputAnnotationCols
def checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String]): Unit

Definition Classes
CheckLicense
def checkValidScope(scope: String): Unit

Definition Classes
CheckLicense
def checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean): Unit

Definition Classes
CheckLicense
def checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean): Unit

Definition Classes
CheckLicense
final def clear(param: Param[_]): VectorDBPostProcessor.this.type

Definition Classes
Params
def clone(): AnyRef

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
def copy(extra: ParamMap): VectorDBPostProcessor

Definition Classes
RawAnnotator → Model → Transformer → PipelineStage → Params
def copyValues[T <: Params](to: T, extra: ParamMap): T

Attributes
protected
Definition Classes
Params
final def defaultCopy[T <: Params](extra: ParamMap): T

Attributes
protected
Definition Classes
Params
def dfAnnotate: UserDefinedFunction

Definition Classes
HasSimpleAnnotate
val diversityThreshold: FloatParam
Threshold used by the "diversity_by_threshold" filter, which selects annotations by the distance between the sorted annotations. diversityThreshold must be greater than 0. Default: 0.01f
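One possible reading of this filter is sketched below in plain Scala. It is an assumption, not the library's confirmed behavior: annotations are sorted by distance, the first is kept, and each later one is kept only if its distance is at least diversityThreshold away from the last kept one. DiversityThresholdSketch and keepDiverse are hypothetical names.

```scala
// Hypothetical sketch; the actual selection rule of the library may differ.
object DiversityThresholdSketch {
  def keepDiverse(distances: Seq[Double], threshold: Double): Vector[Double] = {
    require(threshold > 0, "threshold must be greater than 0")
    // Sort ascending, then walk the list keeping only sufficiently separated values.
    val sorted = distances.sortWith(_ < _)
    sorted.foldLeft(Vector.empty[Double]) { (kept, d) =>
      if (kept.isEmpty || d - kept.last >= threshold) kept :+ d else kept
    }
  }
}
```

Under this reading, near-duplicate results whose distances differ by less than the threshold collapse to the closest one.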
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def explainParam(param: Param[_]): String

Definition Classes
Params
def explainParams(): String

Definition Classes
Params
def extraValidate(structType: StructType): Boolean

Attributes
protected
Definition Classes
RawAnnotator
def extraValidateMsg: String

Attributes
protected
Definition Classes
RawAnnotator
final def extractParamMap(): ParamMap

Definition Classes
Params
final def extractParamMap(extra: ParamMap): ParamMap

Definition Classes
Params
val features: ArrayBuffer[Feature[_, _, _]]

Definition Classes
HasFeatures
val filterBy: Param[String]
Selects and prioritizes the filter options. Options can be given as a comma-separated string such as "metadata, diversity_by_threshold"; the filters are applied in the order given. Options:
- "metadata": Filter by metadata fields. The metadataCriteria parameter should be set.
- "diversity_by_threshold": Filter by the distance between the sorted annotations. The diversityThreshold parameter sets the threshold.
Default: "metadata"
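How an ordered, comma-separated option string might be turned into a filter chain is sketched below in plain Scala. FilterBySketch, parse, and applyInOrder are hypothetical names, and the filter functions passed in are placeholders; only the two option names come from the description above.

```scala
// Hypothetical sketch of parsing filterBy and applying filters in the given order.
object FilterBySketch {
  val knownOptions: Set[String] = Set("metadata", "diversity_by_threshold")

  // Split on commas, trim whitespace, and reject unknown option names.
  def parse(filterBy: String): List[String] = {
    val options = filterBy.split(",").map(_.trim).filter(_.nonEmpty).toList
    val unknown = options.filterNot(knownOptions)
    require(unknown.isEmpty, s"Unknown filterBy option(s): ${unknown.mkString(", ")}")
    options
  }

  // Apply each named filter to the output of the previous one.
  def applyInOrder[A](items: Seq[A], options: Seq[String], filters: Map[String, Seq[A] => Seq[A]]): Seq[A] =
    options.foldLeft(items)((acc, name) => filters(name)(acc))
}
```

Because each filter receives the output of the previous one, "metadata, diversity_by_threshold" and "diversity_by_threshold, metadata" can produce different results.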
def filterByDate(metadataStrValue: String, criteria: MetadataCriteria, value: String): Boolean
def filterByDiversityThreshold(annotations: ListBuffer[Annotation]): ListBuffer[Annotation]
def filterByFloat(metadataStrValue: String, criteria: MetadataCriteria, value: String): Boolean
def filterByInt(metadataStrValue: String, criteria: MetadataCriteria, value: String): Boolean
def filterByString(metadataStrValue: String, criteria: MetadataCriteria, value: String): Boolean
def finalize(): Unit

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def get[T](feature: StructFeature[T]): Option[T]

Attributes
protected
Definition Classes
HasFeatures
def get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]

Attributes
protected
Definition Classes
HasFeatures
def get[T](feature: SetFeature[T]): Option[Set[T]]

Attributes
protected
Definition Classes
HasFeatures
def get[T](feature: ArrayFeature[T]): Option[Array[T]]

Attributes
protected
Definition Classes
HasFeatures
final def get[T](param: Param[T]): Option[T]

Definition Classes
Params
def getAllowZeroContentAfterFiltering: Boolean
Get allowZeroContentAfterFiltering param
def getCaseSensitive: Boolean
Get caseSensitive param
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
Annotations
@native()
final def getDefault[T](param: Param[T]): Option[T]

Definition Classes
Params
def getDiversityThreshold: Float
Get diversityThreshold param
def getFilterBy: String
Get filterBy param
def getInputCols: Array[String]

Definition Classes
HasInputAnnotationCols
def getLazyAnnotator: Boolean

Definition Classes
CanBeLazy
def getMaxTopKAfterFiltering: Int
Get maxTopKAfterFiltering param
def getMetadataCriteria: Array[MetadataCriteria]
Get metadataCriteria param
final def getOrDefault[T](param: Param[T]): T

Definition Classes
Params
final def getOutputCol: String

Definition Classes
HasOutputAnnotationCol
def getParam(paramName: String): Param[Any]

Definition Classes
Params
def getSortBy: String
Get sortBy param
final def hasDefault[T](param: Param[T]): Boolean

Definition Classes
Params
def hasParam(paramName: String): Boolean

Definition Classes
Params
def hasParent: Boolean

Definition Classes
Model
def hashCode(): Int

Definition Classes
AnyRef → Any
Annotations
@native()
def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean

Attributes
protected
Definition Classes
Logging
def initializeLogIfNecessary(isInterpreter: Boolean): Unit

Attributes
protected
Definition Classes
Logging
val inputAnnotatorTypes: Array[String]
Input annotator types: VECTOR_SIMILARITY_RANKINGS

Definition Classes
VectorDBPostProcessor → HasInputAnnotationCols
final val inputCols: StringArrayParam

Attributes
protected
Definition Classes
HasInputAnnotationCols
final def isDefined(param: Param[_]): Boolean

Definition Classes
Params
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
final def isSet(param: Param[_]): Boolean

Definition Classes
Params
def isTraceEnabled(): Boolean

Attributes
protected
Definition Classes
Logging
val lazyAnnotator: BooleanParam

Definition Classes
CanBeLazy
def log: Logger

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logName: String

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
val maxTopKAfterFiltering: IntParam
Maximum number of annotations to return after filtering. If more than maxTopKAfterFiltering annotations remain after filtering, only the top maxTopKAfterFiltering are kept. maxTopKAfterFiltering must be greater than 0. Default: 20
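The interaction between the top-K cap and allowZeroContentAfterFiltering can be sketched as follows in plain Scala. PostFilterSketch and finish are hypothetical names. Falling back to the first original annotation when everything is filtered out is an assumption; the description only says the annotator tries to keep at least one annotation.

```scala
// Hypothetical sketch of the final step after filtering; not the library's code.
object PostFilterSketch {
  def finish[A](original: Seq[A], filtered: Seq[A], allowZero: Boolean, maxTopK: Int): Seq[A] = {
    require(maxTopK > 0, "maxTopKAfterFiltering must be greater than 0")
    // Assumption: when nothing survives and empty output is not allowed,
    // fall back to the first annotation of the unfiltered input.
    val candidates = if (filtered.isEmpty && !allowZero) original.take(1) else filtered
    // Keep at most maxTopK annotations, preserving their order.
    candidates.take(maxTopK)
  }
}
```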
val metadataCriteria: StructFeature[Array[MetadataCriteria]]
Criteria used to filter the annotations by metadata fields, given as an array of MetadataCriteria. Default: Array.empty
def msgHelper(schema: StructType): String

Attributes
protected
Definition Classes
HasInputAnnotationCols
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
Annotations
@native()
final def notifyAll(): Unit

Definition Classes
AnyRef
Annotations
@native()
def onWrite(path: String, spark: SparkSession): Unit

Attributes
protected
Definition Classes
ParamsAndFeaturesWritable
val optionalInputAnnotatorTypes: Array[String]

Definition Classes
HasInputAnnotationCols
val outputAnnotatorType: String
outputAnnotatorType: VECTOR_SIMILARITY_RANKINGS

Definition Classes
VectorDBPostProcessor → HasOutputAnnotatorType
final val outputCol: Param[String]

Attributes
protected
Definition Classes
HasOutputAnnotationCol
lazy val params: Array[Param[_]]

Definition Classes
Params
var parent: Estimator[VectorDBPostProcessor]

Definition Classes
Model
def save(path: String): Unit

Definition Classes
MLWritable
Annotations
@Since( "1.6.0" ) @throws( ... )
def set[T](feature: StructFeature[T], value: T): VectorDBPostProcessor.this.type

Attributes
protected
Definition Classes
HasFeatures
def set[K, V](feature: MapFeature[K, V], value: Map[K, V]): VectorDBPostProcessor.this.type

Attributes
protected
Definition Classes
HasFeatures
def set[T](feature: SetFeature[T], value: Set[T]): VectorDBPostProcessor.this.type

Attributes
protected
Definition Classes
HasFeatures
def set[T](feature: ArrayFeature[T], value: Array[T]): VectorDBPostProcessor.this.type

Attributes
protected
Definition Classes
HasFeatures
final def set(paramPair: ParamPair[_]): VectorDBPostProcessor.this.type

Attributes
protected
Definition Classes
Params
final def set(param: String, value: Any): VectorDBPostProcessor.this.type

Attributes
protected
Definition Classes
Params
final def set[T](param: Param[T], value: T): VectorDBPostProcessor.this.type

Definition Classes
Params
def setAllowZeroContentAfterFiltering(value: Boolean): VectorDBPostProcessor.this.type
Set the allowZeroContentAfterFiltering parameter. If set to true, the output may contain no annotations when all of them are filtered out. If set to false, the annotator tries to keep at least one annotation in the output. Default: false
def setCaseSensitive(value: Boolean): VectorDBPostProcessor.this.type
Set whether the criteria of the string operators are case sensitive. For example, if set to false, the operator "equals" matches "John" with "john". Default: false
def setDefault[T](feature: StructFeature[T], value: () ⇒ T): VectorDBPostProcessor.this.type

Attributes
protected
Definition Classes
HasFeatures
def setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): VectorDBPostProcessor.this.type

Attributes
protected
Definition Classes
HasFeatures
def setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): VectorDBPostProcessor.this.type

Attributes
protected
Definition Classes
HasFeatures
def setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): VectorDBPostProcessor.this.type

Attributes
protected
Definition Classes
HasFeatures
final def setDefault(paramPairs: ParamPair[_]*): VectorDBPostProcessor.this.type

Attributes
protected
Definition Classes
Params
final def setDefault[T](param: Param[T], value: T): VectorDBPostProcessor.this.type

Attributes
protected[org.apache.spark.ml]
Definition Classes
Params
def setDiversityThreshold(value: Float): VectorDBPostProcessor.this.type
Set the diversityThreshold parameter. The "diversity_by_threshold" filter selects annotations by the distance between the sorted annotations. diversityThreshold must be greater than 0. Default: 0.01f
def setFilterBy(value: String): VectorDBPostProcessor.this.type
Set the filterBy parameter. Options can be given as a comma-separated string such as "metadata, diversity_by_threshold"; the filters are applied in the order given. Options:
- "metadata": Filter by metadata fields. The metadataCriteria parameter should be set.
- "diversity_by_threshold": Filter by the distance between the sorted annotations. The diversityThreshold parameter sets the threshold.
Default: "metadata"
final def setInputCols(value: String*): VectorDBPostProcessor.this.type

Definition Classes
HasInputAnnotationCols
def setInputCols(value: Array[String]): VectorDBPostProcessor.this.type

Definition Classes
HasInputAnnotationCols
def setLazyAnnotator(value: Boolean): VectorDBPostProcessor.this.type

Definition Classes
CanBeLazy
def setMaxTopKAfterFiltering(value: Int): VectorDBPostProcessor.this.type
Set the maxTopKAfterFiltering parameter. If more than maxTopKAfterFiltering annotations remain after filtering, only the top maxTopKAfterFiltering are kept. maxTopKAfterFiltering must be greater than 0. Default: 20
def setMetadataCriteria(value: Array[MetadataCriteria]): VectorDBPostProcessor.this.type
Set the metadataCriteria parameter, given as an array of MetadataCriteria. Default: Array.empty
def setMetadataCriteriaAsStr(value: String): VectorDBPostProcessor.this.type
Set the metadataCriteria parameter as a string. The metadataCriteria param is a list of dictionaries. A dictionary should contain the following keys:
- field: The metadata field to filter on.
- fieldType: The type of the field. Options: string, int, float, date.
- operator: The operator to apply. Options: equals, not_equals, greater_than, greater_than_or_equals, less_than, less_than_or_equals, contains, not_contains, regex.
- value: The value to compare against.
- matchMode: The match mode to apply. Options: any, all, none.
- matchValues: The values to compare against when matchMode is used.
- dateFormats: The date formats used to parse a date metadata field.
- converterFallback: What to do when a cast exception is hit. Options: filter, not_filter, error.
Notes:
- field, fieldType, and operator are required. Other keys are optional.
- If fieldType is string, the supported operators are: equals, not_equals, contains, not_contains, regex.
- If fieldType is int, float, or date, the supported operators are: equals, not_equals, greater_than, greater_than_or_equals, less_than, less_than_or_equals.
- If matchMode and matchValues are not set, value must be set.
- If value is set, matchMode and matchValues are ignored.
- If fieldType is date, dateFormats must be set.
- matchMode and matchValues must be set together.
- If converterFallback is set to error, the filter throws an error when a cast exception is hit. Default: error.
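The rules in the notes above can be checked mechanically. The following plain-Scala sketch encodes them as a validator. CriterionSketch and CriteriaRulesSketch are hypothetical types for illustration; they are not the library's MetadataCriteria class, and the library may validate differently or report errors in another way.

```scala
// Hypothetical stand-in for one criteria dictionary; field names follow the keys above.
final case class CriterionSketch(
  field: String,
  fieldType: String,
  operator: String,
  value: Option[String] = None,
  matchMode: Option[String] = None,
  matchValues: Option[Seq[String]] = None,
  dateFormats: Option[Seq[String]] = None,
  converterFallback: String = "error"
)

object CriteriaRulesSketch {
  private val stringOps = Set("equals", "not_equals", "contains", "not_contains", "regex")
  private val orderedOps = Set("equals", "not_equals", "greater_than",
    "greater_than_or_equals", "less_than", "less_than_or_equals")
  private val fallbacks = Set("filter", "not_filter", "error")

  // Returns one message per violated rule; an empty result means the criterion is consistent.
  def violations(c: CriterionSketch): Seq[String] = {
    val allowedOps: Option[Set[String]] = c.fieldType match {
      case "string"                 => Some(stringOps)
      case "int" | "float" | "date" => Some(orderedOps)
      case _                        => None
    }
    val checks: Seq[(Boolean, String)] = Seq(
      (allowedOps.isEmpty, s"unknown fieldType: ${c.fieldType}"),
      (allowedOps.exists(ops => !ops.contains(c.operator)),
        s"operator ${c.operator} is not supported for fieldType ${c.fieldType}"),
      (c.matchMode.isDefined != c.matchValues.isDefined,
        "matchMode and matchValues must be set together"),
      (c.value.isEmpty && c.matchMode.isEmpty && c.matchValues.isEmpty,
        "value must be set when matchMode and matchValues are not set"),
      (c.fieldType == "date" && c.dateFormats.forall(_.isEmpty),
        "dateFormats must be set when fieldType is date"),
      (!fallbacks.contains(c.converterFallback),
        s"unknown converterFallback: ${c.converterFallback}")
    )
    checks.collect { case (true, message) => message }
  }
}
```

A criterion such as field "publish_date" with fieldType "date" and operator "greater_than" passes only when both value and dateFormats are supplied; the field name here is an invented example.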
final def setOutputCol(value: String): VectorDBPostProcessor.this.type

Definition Classes
HasOutputAnnotationCol
def setParent(parent: Estimator[VectorDBPostProcessor]): VectorDBPostProcessor

Definition Classes
Model
def setSortBy(value: String): VectorDBPostProcessor.this.type
Set the sortBy parameter. Options:
- "ascending": Sort by ascending order of distance.
- "descending": Sort by descending order of distance.
- "lost_in_the_middle": Sort with the lost-in-the-middle ranker. For example, 5 annotations with distances [1, 2, 3, 4, 5] are ordered as [1, 3, 5, 4, 2].
- "diversity": Sort with the diversity ranker. The annotations are sorted by distance and the first annotation is selected; each following annotation is the one with the maximum average distance from the annotations already selected.
Default: "ascending"
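The "lost_in_the_middle" ordering in the example above can be reproduced with a short plain-Scala sketch. LostInTheMiddleSketch and reorder are hypothetical names. The rule used here (even positions of the sorted list first, then odd positions reversed) is an assumption chosen because it yields [1, 3, 5, 4, 2] for the documented input; the library may be implemented differently.

```scala
// Hypothetical sketch that reproduces the documented example ordering.
object LostInTheMiddleSketch {
  def reorder[A](items: Seq[A])(distance: A => Double): Seq[A] = {
    // Sort ascending by distance (sortWith is stable).
    val sorted = items.sortWith((a, b) => distance(a) < distance(b))
    // Even positions go to the front in order; odd positions go to the back, reversed.
    val (evenPos, oddPos) = sorted.zipWithIndex.partition { case (_, i) => i % 2 == 0 }
    evenPos.map(_._1) ++ oddPos.map(_._1).reverse
  }
}
```

The closest annotations end up at the two ends of the list and the farthest ones in the middle.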
val sortBy: Param[String]
Selects the sorting option. Options:
- "ascending": Sort by ascending order of distance.
- "descending": Sort by descending order of distance.
- "lost_in_the_middle": Sort with the lost-in-the-middle ranker. For example, 5 annotations with distances [1, 2, 3, 4, 5] are ordered as [1, 3, 5, 4, 2].
- "diversity": Sort with the diversity ranker. The annotations are sorted by distance and the first annotation is selected; each following annotation is the one with the maximum average distance from the annotations already selected.
Default: "ascending"
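The greedy selection described for the "diversity" option can be sketched in plain Scala as below. DiversityRankerSketch and rank are hypothetical names. How the library measures the distance between two annotations is not stated here, so the sketch takes it as a caller-supplied gap function, and breaking ties in favor of the closer annotation is also an assumption.

```scala
// Hypothetical sketch of a greedy maximum-average-distance ranker.
object DiversityRankerSketch {
  def rank[A](items: Seq[A])(distance: A => Double)(gap: (A, A) => Double): Vector[A] = {
    // Start from the annotations sorted by distance to the query.
    val sorted = items.sortWith((a, b) => distance(a) < distance(b)).toVector
    if (sorted.isEmpty) Vector.empty
    else {
      var selected = Vector(sorted.head) // the closest annotation is always first
      var remaining = sorted.tail
      while (remaining.nonEmpty) {
        // Average gap between a candidate and everything selected so far.
        def avgGap(c: A): Double = selected.map(s => gap(c, s)).sum / selected.size
        // Pick the candidate with the largest average gap; ties keep the earlier one.
        val best = remaining.indices.reduceLeft { (i, j) =>
          if (avgGap(remaining(j)) > avgGap(remaining(i))) j else i
        }
        selected = selected :+ remaining(best)
        remaining = remaining.patch(best, Nil, 1)
      }
      selected
    }
  }
}
```

In practice the gap function would compare the annotations themselves, for example their embeddings; the absolute difference of scalar distances is only a convenient stand-in for experimenting with the ordering.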
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
Identifiable → AnyRef → Any
final def transform(dataset: Dataset[_]): DataFrame

Definition Classes
AnnotatorModel → Transformer
def transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame

Definition Classes
Transformer
Annotations
@Since( "2.0.0" )
def transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame

Definition Classes
Transformer
Annotations
@Since( "2.0.0" ) @varargs()
final def transformSchema(schema: StructType): StructType

Definition Classes
RawAnnotator → PipelineStage
def transformSchema(schema: StructType, logging: Boolean): StructType

Attributes
protected
Definition Classes
PipelineStage
Annotations
@DeveloperApi()
val uid: String

Definition Classes
VectorDBPostProcessor → Identifiable
def validate(schema: StructType): Boolean

Attributes
protected
Definition Classes
RawAnnotator
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
def wrapColumnMetadata(col: Column): Column

Attributes
protected
Definition Classes
RawAnnotator
def write: MLWriter

Definition Classes
ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable
