Spark NLP 6.0.4 ScalaDoc - com.johnsnowlabs.nlp.annotators.matcher.TextMatcherInternalParams

final def !=(arg0: Any): Boolean

Definition Classes: AnyRef → Any

final def ##(): Int

Definition Classes: AnyRef → Any

final def $[T](param: Param[T]): T

Attributes: protected
Definition Classes: Params

def $$[T](feature: StructFeature[T]): T

Attributes: protected
Definition Classes: HasFeatures

def $$[K, V](feature: MapFeature[K, V]): Map[K, V]

Attributes: protected
Definition Classes: HasFeatures

def $$[T](feature: SetFeature[T]): Set[T]

Attributes: protected
Definition Classes: HasFeatures

def $$[T](feature: ArrayFeature[T]): Array[T]

Attributes: protected
Definition Classes: HasFeatures

final def ==(arg0: Any): Boolean

Definition Classes: AnyRef → Any

final def asInstanceOf[T0]: T0

Definition Classes: Any

def cartesianTokenVariants(tokens: Seq[Annotation], lemmaDictionary: Map[String, String]): Seq[Seq[String]]

Attributes: protected

val caseSensitive: BooleanParam

Whether matching is case sensitive (Default: true)

val cleanKeywords: StringArrayParam

A parameter defining additional keywords to be removed during text processing, in addition to the standard stopwords.

These keywords are appended to the default stopwords list and will be excluded from the text when cleanStopWords is enabled.

By default, this parameter is an empty array, meaning no additional keywords are filtered unless specified.
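
The interaction between the stop word list, cleanKeywords, and cleanStopWords can be sketched in plain Scala without any Spark NLP dependency. The helper `removeStopWords`, the sample `defaultStopWords` list, and the lower-case comparison are assumptions made for illustration; the library's actual filtering may differ.

```scala
object CleanKeywordsSketch {
  // Hypothetical stand-in for the default stop word list.
  val defaultStopWords: Seq[String] = Seq("the", "of", "a")

  // Drops tokens found in defaultStopWords ++ cleanKeywords when cleanStopWords is true.
  // Lower-casing before comparison is an assumption of this sketch.
  def removeStopWords(
      tokens: Seq[String],
      cleanKeywords: Seq[String],
      cleanStopWords: Boolean = true): Seq[String] = {
    if (!cleanStopWords) tokens
    else {
      val blocked = (defaultStopWords ++ cleanKeywords).map(_.toLowerCase).toSet
      tokens.filterNot(t => blocked.contains(t.toLowerCase))
    }
  }
}
```

With cleanStopWords disabled, the extra keywords have no effect in this sketch, which mirrors the condition stated above.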

val cleanStopWords: BooleanParam

Parameter indicating whether to clean stop words during text processing. Defaults to true.

final def clear(param: Param[_]): TextMatcherInternalParams.this.type

Definition Classes: Params

def clone(): AnyRef

Attributes: protected[lang]
Definition Classes: AnyRef
Annotations: @throws( ... ) @native()

def copyValues[T <: Params](to: T, extra: ParamMap): T

Attributes: protected
Definition Classes: Params

final def defaultCopy[T <: Params](extra: ParamMap): T

Attributes: protected
Definition Classes: Params

val enableLemmatizer: BooleanParam

A Boolean parameter that controls whether lemmatization should be applied during text processing.

Lemmatization is the process of reducing words to their base or dictionary form (lemma). When this parameter is set to true:

- The incoming tokens (words from the input text) are lemmatized.

- The predefined entities (the terms you want to match against) are also lemmatized.

This allows for more flexible and accurate matching. For example, words like "running", "ran", or "runs" will all be reduced to "run", and can match consistently even if the exact form in the text differs.

Default value is false, meaning lemmatization is disabled unless explicitly turned on.
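
A minimal sketch of lemma-based matching, assuming a small hand-written dictionary. The object `LemmaMatchSketch`, its `lemmaDict` contents, and the whole-sequence comparison are hypothetical and only illustrate the idea that both sides are lemmatized before they are compared.

```scala
object LemmaMatchSketch {
  // Hypothetical lemma dictionary: surface form -> lemma.
  val lemmaDict: Map[String, String] =
    Map("running" -> "run", "ran" -> "run", "runs" -> "run")

  // Looks each token up in the dictionary; unknown tokens stay unchanged.
  def lemmatize(tokens: Seq[String]): Seq[String] =
    tokens.map(t => lemmaDict.getOrElse(t, t))

  // Both the text tokens and the entity tokens are lemmatized before comparison.
  def matches(textTokens: Seq[String], entityTokens: Seq[String]): Boolean =
    lemmatize(textTokens) == lemmatize(entityTokens)
}
```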

val enableStemmer: BooleanParam

A Boolean parameter that controls whether stemming should be applied during text processing.

Stemming reduces words to their root forms (e.g., "running", "runs", and "runner" → "run"). This can help match different word forms more effectively in tasks such as keyword matching and entity recognition.

When this parameter is set to true, stemming is applied in addition to the original form:

- Input tokens are matched both in their original and stemmed forms.

- Target entities can also be matched using their stemmed forms.

This does not replace original matching — it complements it. Matching is performed using both the original and processed (stemmed) versions to improve recall and flexibility.

Default value is false.
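
The "original plus stemmed" behaviour can be sketched as a comparison of variant sets. The `stem` function below is a toy suffix stripper standing in for a real stemmer; it and the other names are assumptions for illustration, not the library's algorithm.

```scala
object StemMatchSketch {
  // Toy suffix stripper (assumption): removes one known suffix if enough of the word remains.
  def stem(word: String): String =
    Seq("ning", "ing", "s")
      .find(s => word.endsWith(s) && word.length > s.length + 2)
      .map(s => word.dropRight(s.length))
      .getOrElse(word)

  // Each token contributes both its original and its stemmed form.
  def variants(token: String): Set[String] = Set(token, stem(token))

  // Two tokens match when their variant sets overlap, so exact matches still succeed.
  def tokenMatches(textToken: String, entityToken: String): Boolean =
    variants(textToken).intersect(variants(entityToken)).nonEmpty
}
```

Because the original form is always kept in the variant set, enabling stemming in this sketch can only add matches, never remove them.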

final def eq(arg0: AnyRef): Boolean

Definition Classes: AnyRef

def equals(arg0: Any): Boolean

Definition Classes: AnyRef → Any

val excludePunctuation: BooleanParam

A parameter indicating whether punctuation marks should be removed during text processing.

When set to true, most punctuation characters will be excluded from the processed text. This is typically used to clean text by removing non-word characters.

Defaults to true, meaning punctuation is removed unless explicitly disabled. Some characters may be preserved if specifically handled by other parameters (e.g., safe keywords).
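
As a rough illustration, punctuation removal could look like the sketch below. The description does not say which characters are removed, so the use of the `\p{Punct}` character class (ASCII punctuation) is an assumption, and `PunctuationSketch.clean` is a hypothetical helper.

```scala
object PunctuationSketch {
  // Removes ASCII punctuation when excludePunctuation is true; otherwise returns the text unchanged.
  // The exact character set used by the library is not documented here; \p{Punct} is an assumption.
  def clean(text: String, excludePunctuation: Boolean = true): String =
    if (excludePunctuation) text.replaceAll("\\p{Punct}", "") else text
}
```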

val excludeRegexPatterns: StringArrayParam

A parameter specifying regular expression patterns used to exclude matching chunks during text processing.

Each string in this array is a regex pattern. If a detected chunk matches any of these patterns, it will be discarded and excluded from the final output.

This is useful for removing unwanted matches based on pattern rules (e.g., specific codes, formats, or noise). By default, this parameter is empty, meaning no chunks are dropped based on regex.
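
A minimal sketch of the exclusion step, assuming each pattern must match the whole chunk (via `String.matches`). Whether the library uses whole-chunk or partial matching is not stated, and `filterChunks` is a hypothetical helper.

```scala
object ExcludeRegexSketch {
  // Keeps only chunks that match none of the exclusion patterns.
  // Whole-chunk matching is an assumption of this sketch.
  def filterChunks(chunks: Seq[String], excludeRegexPatterns: Seq[String]): Seq[String] =
    chunks.filterNot(chunk => excludeRegexPatterns.exists(p => chunk.matches(p)))
}
```

With an empty pattern list nothing is dropped, which corresponds to the default described above.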

def explainParam(param: Param[_]): String

Definition Classes: Params

def explainParams(): String

Definition Classes: Params

final def extractParamMap(): ParamMap

Definition Classes: Params

final def extractParamMap(extra: ParamMap): ParamMap

Definition Classes: Params

val features: ArrayBuffer[Feature[_, _, _]]

Definition Classes: HasFeatures

def finalize(): Unit

Attributes: protected[lang]
Definition Classes: AnyRef
Annotations: @throws( classOf[java.lang.Throwable] )

def get[T](feature: StructFeature[T]): Option[T]

Attributes: protected
Definition Classes: HasFeatures

def get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]

Attributes: protected
Definition Classes: HasFeatures

def get[T](feature: SetFeature[T]): Option[Set[T]]

Attributes: protected
Definition Classes: HasFeatures

def get[T](feature: ArrayFeature[T]): Option[Array[T]]

Attributes: protected
Definition Classes: HasFeatures

final def get[T](param: Param[T]): Option[T]

Definition Classes: Params

final def getClass(): Class[_]

Definition Classes: AnyRef → Any
Annotations: @native()

def getCleanKeywords: Array[String]

Retrieves the list of keywords to be filtered out.

returns: an array of strings representing the keywords.

def getCleanStopWords: Boolean

Retrieves the current state of the cleanStopWords parameter.

returns: true if the cleanStopWords option is enabled, false otherwise.

final def getDefault[T](param: Param[T]): Option[T]

Definition Classes: Params

def getEnableLemmatizer: Boolean

Gets the current state of the lemmatizer enablement setting.

returns: true if the lemmatizer is enabled, false otherwise.

def getEnableStemmer: Boolean

Retrieves the current value of the enableStemmer parameter.

returns: true if stemming is enabled, false otherwise

def getExcludeRegexPattern: Array[String]

Retrieves the list of regex patterns used to exclude specific text matches during processing.

returns: an array of strings representing the regex patterns to be excluded.

final def getOrDefault[T](param: Param[T]): T

Definition Classes: Params

def getParam(paramName: String): Param[Any]

Definition Classes: Params

def getReturnChunks: String

Retrieves the current value of the returnChunks parameter.

returns: A string representing the configured value for the returnChunks setting.

def getSafeKeywords: Array[String]

Retrieves the list of safe keywords that are preserved when stop words are removed.

returns: an array of strings representing the safe keywords.

def getSkipMatcherAugmentation: Boolean

Gets whether augmentation for matcher patterns is skipped.

returns: true if augmentation for matcher patterns is skipped, false otherwise.

def getSkipSourceTextAugmentation: Boolean

Gets whether augmentation for source text is skipped.

returns: true if augmentation for source text is skipped, false otherwise.

def getStopWords: Array[String]

Retrieves the list of stop words used within the text matching process.

returns: an array of strings representing the stop words.

def getTokenVariants(token: Annotation, lemmaDictionary: Map[String, String]): Seq[String]

Attributes: protected

final def hasDefault[T](param: Param[T]): Boolean

Definition Classes: Params

def hasParam(paramName: String): Boolean

Definition Classes: Params

def hashCode(): Int

Definition Classes: AnyRef → Any
Annotations: @native()

final def isDefined(param: Param[_]): Boolean

Definition Classes: Params

final def isInstanceOf[T0]: Boolean

Definition Classes: Any

final def isSet(param: Param[_]): Boolean

Definition Classes: Params

val lemmaDict: MapFeature[String, String]

The dictionary used for lemmatization, mapping words to their corresponding lemmas.

final def ne(arg0: AnyRef): Boolean

Definition Classes: AnyRef

final def notify(): Unit

Definition Classes: AnyRef
Annotations: @native()

final def notifyAll(): Unit

Definition Classes: AnyRef
Annotations: @native()

lazy val params: Array[Param[_]]

Definition Classes: Params

val returnChunks: Param[String]

A string parameter that defines which version of the matched chunks should be returned: "original" or "matched".

- If set to "original" (default): the returned chunks reflect the exact text spans as they appeared in the original input. This ensures that the begin and end character indices accurately map to the source text.

- If set to "matched": the returned chunks are based on the processed form that triggered the match, such as a stemmed or lemmatized version of the phrase. This can be useful to see which normalized entity was matched, but the character indices (begin, end) may not align correctly with the original input text.

Use "original" if accurate text positioning is important (e.g., for highlighting), and "matched" if you want to inspect the normalized form used for the match.
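
The difference between the two modes can be sketched as follows. The `Hit` case class and `chunkText` helper are hypothetical, and treating `end` as an inclusive index is an assumption of this sketch.

```scala
object ReturnChunksSketch {
  // Hypothetical record of one match: span in the source text plus the normalized form that matched.
  // `end` is treated as inclusive in this sketch.
  final case class Hit(begin: Int, end: Int, matchedForm: String)

  // Chooses the chunk text according to the returnChunks setting.
  def chunkText(source: String, hit: Hit, returnChunks: String = "original"): String =
    returnChunks match {
      case "original" => source.substring(hit.begin, hit.end + 1) // exact span from the input
      case "matched"  => hit.matchedForm                          // normalized form that triggered the match
      case other      => throw new IllegalArgumentException(s"Unsupported returnChunks value: $other")
    }
}
```

In the "matched" case the returned text ("run") is shorter than the span it came from ("running"), which is why the begin and end indices may no longer line up with the returned chunk.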

val safeKeywords: StringArrayParam

A parameter representing an array of keywords that should be preserved during text cleaning, when stopword removal (cleanStopWords) is enabled.

When cleanStopWords is set to true, common stopwords are typically removed from the text. However, keywords specified in safeKeywords will be exempt from removal and retained in the processed text.

By default, this parameter is an empty array, meaning no exceptions are made unless explicitly provided.
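
A minimal sketch of the exemption, using a hypothetical `removeStopWords` helper and exact string comparison (an assumption; the library may normalize case).

```scala
object SafeKeywordsSketch {
  // Removes stop words, except tokens that are listed as safe keywords.
  def removeStopWords(
      tokens: Seq[String],
      stopWords: Set[String],
      safeKeywords: Set[String]): Seq[String] =
    tokens.filter(t => safeKeywords.contains(t) || !stopWords.contains(t))
}
```

For example, marking "a" as safe keeps it in a phrase such as "vitamin a" even though "a" is a common stop word.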

lazy val safeLemmaDict: Map[String, String]

def set[T](feature: StructFeature[T], value: T): TextMatcherInternalParams.this.type

Attributes: protected
Definition Classes: HasFeatures

def set[K, V](feature: MapFeature[K, V], value: Map[K, V]): TextMatcherInternalParams.this.type

Attributes: protected
Definition Classes: HasFeatures

def set[T](feature: SetFeature[T], value: Set[T]): TextMatcherInternalParams.this.type

Attributes: protected
Definition Classes: HasFeatures

def set[T](feature: ArrayFeature[T], value: Array[T]): TextMatcherInternalParams.this.type

Attributes: protected
Definition Classes: HasFeatures

final def set(paramPair: ParamPair[_]): TextMatcherInternalParams.this.type

Attributes: protected
Definition Classes: Params

final def set(param: String, value: Any): TextMatcherInternalParams.this.type

Attributes: protected
Definition Classes: Params

final def set[T](param: Param[T], value: T): TextMatcherInternalParams.this.type

Definition Classes: Params

def setCleanKeywords(value: ArrayList[String]): TextMatcherInternalParams.this.type

def setCleanKeywords(values: Array[String]): TextMatcherInternalParams.this.type

Sets the list of keywords to be cleaned during text processing.

returns: This instance with the updated configuration for cleaning keywords.

def setCleanStopWords(v: Boolean): TextMatcherInternalParams.this.type

Sets whether to clean stop words during text processing.

v: Boolean value indicating whether to enable (true) or disable (false) the cleaning of stop words.
returns: This instance with the updated configuration for cleaning stop words.

def setDefault[T](feature: StructFeature[T], value: () ⇒ T): TextMatcherInternalParams.this.type

Attributes: protected
Definition Classes: HasFeatures

def setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): TextMatcherInternalParams.this.type

Attributes: protected
Definition Classes: HasFeatures

def setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): TextMatcherInternalParams.this.type

Attributes: protected
Definition Classes: HasFeatures

def setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): TextMatcherInternalParams.this.type

Attributes: protected
Definition Classes: HasFeatures

final def setDefault(paramPairs: ParamPair[_]*): TextMatcherInternalParams.this.type

Attributes: protected
Definition Classes: Params

final def setDefault[T](param: Param[T], value: T): TextMatcherInternalParams.this.type

Attributes: protected[org.apache.spark.ml]
Definition Classes: Params

def setEnableLemmatizer(value: Boolean): TextMatcherInternalParams.this.type

Enables or disables the lemmatizer for text matching.

value: If true, the lemmatizer will be enabled; if false, it will be disabled.
returns: This TextMatcherInternal instance with the updated lemmatizer setting.

def setEnableStemmer(value: Boolean): TextMatcherInternalParams.this.type

Enables or disables the use of a stemmer for text processing.

value: Boolean value indicating whether to enable (true) or disable (false) the stemmer.
returns: Instance of this class with updated configuration.

def setExcludePunctuation(v: Boolean): TextMatcherInternalParams.this.type

Sets the value of the excludePunctuation parameter used for text processing.

v: A boolean value indicating whether to exclude punctuation.
returns: This instance with the updated excludePunctuation configuration.

def setExcludeRegexPatterns(v: Array[String]): TextMatcherInternalParams.this.type

Sets the regular expression patterns for excluding specific elements during text processing.

v: Array of strings where each string represents a regular expression pattern to be used for excluding matching text elements.
returns: This instance with the updated configuration for exclude regex patterns.

def setLemmaDict(value: Map[String, String]): TextMatcherInternalParams.this.type

Sets the internal dictionary used for lemmatization.

value: a map where keys are words and values are their corresponding lemmas.
returns: this

def setReturnChunks(v: String): TextMatcherInternalParams.this.type

Sets the value of the returnChunks parameter used for text processing.

v: A string value specifying which version of the matched chunks to return: "original" or "matched".
returns: This instance with the updated returnChunks configuration.

def setSafeKeywords(value: ArrayList[String]): TextMatcherInternalParams.this.type

def setSafeKeywords(v: Array[String]): TextMatcherInternalParams.this.type

Sets the list of safe keywords to be used in text processing.

v: Array of strings representing the safe keywords.
returns: This instance with the updated configuration for safe keywords.

def setSkipMatcherAugmentation(value: Boolean): TextMatcherInternalParams.this.type

Sets whether to skip augmentation for matcher patterns.

value: If true, matcher patterns won't be augmented with lemmatization, stemming, etc. If false, matcher patterns will be augmented if the corresponding features are enabled.
returns: This instance with the updated configuration.

def setSkipSourceTextAugmentation(value: Boolean): TextMatcherInternalParams.this.type

Sets whether to skip augmentation for source text.

value: If true, source text won't be augmented with lemmatization, stemming, etc. If false, source text will be augmented if the corresponding features are enabled.
returns: This instance with the updated configuration.

def setStopWords(value: ArrayList[String]): TextMatcherInternalParams.this.type

def setStopWords(v: Array[String]): TextMatcherInternalParams.this.type

Sets the list of stop words to be used in text processing.

v: Array of strings representing the stop words.
returns: This instance with the updated stop words setting.

val skipMatcherAugmentation: BooleanParam

A Boolean parameter that controls whether to skip augmentation (lemmatization, stemming, etc.) for matcher patterns.

When set to true, the matcher patterns won't be augmented with lemmatization, stemming, stopword removal, etc., even if those features are enabled. This applies only to entities/patterns being matched, not the source text.

Default value is false, meaning matcher patterns will be augmented if the corresponding features are enabled.

val skipSourceTextAugmentation: BooleanParam

A Boolean parameter that controls whether to skip augmentation (lemmatization, stemming, etc.) for the source text.

When set to true, the source text won't be augmented with lemmatization, stemming, stopword removal, etc., even if those features are enabled. This applies only to the source text being analyzed, not the matcher patterns.

Default value is false, meaning source text will be augmented if the corresponding features are enabled.
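
The effect of the two skip flags can be sketched by gating augmentation separately on each side of the comparison. Everything here is hypothetical: the augmentation step is reduced to a single lemma lookup, and the names `forms` and `tokenMatches` are invented for illustration.

```scala
object AugmentationSketch {
  // Hypothetical augmentation data: a one-entry lemma dictionary.
  val lemmaDict: Map[String, String] = Map("tumors" -> "tumor")

  // Returns the forms a token can match under: only the original when augmentation is skipped,
  // otherwise the original plus its lemma.
  def forms(token: String, skipAugmentation: Boolean): Set[String] =
    if (skipAugmentation) Set(token) else Set(token, lemmaDict.getOrElse(token, token))

  // Source text and matcher pattern are augmented independently, each controlled by its own flag.
  def tokenMatches(
      sourceToken: String,
      patternToken: String,
      skipSourceTextAugmentation: Boolean = false,
      skipMatcherAugmentation: Boolean = false): Boolean =
    forms(sourceToken, skipSourceTextAugmentation)
      .intersect(forms(patternToken, skipMatcherAugmentation))
      .nonEmpty
}
```

In this sketch, "tumors" in the source text matches the pattern "tumor" by default, but not once skipSourceTextAugmentation is set to true, because the source token is then compared only in its original form.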

val stopWords: StringArrayParam

A parameter representing the list of stop words to be filtered out during text processing.

By default, it is set to the English stop words provided by Spark ML.

final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes: AnyRef

def toString(): String

Definition Classes: Identifiable → AnyRef → Any

final def wait(): Unit

Definition Classes: AnyRef
Annotations: @throws( ... )

final def wait(arg0: Long, arg1: Int): Unit

Definition Classes: AnyRef
Annotations: @throws( ... )

final def wait(arg0: Long): Unit

Definition Classes: AnyRef
Annotations: @throws( ... ) @native()

TextMatcherInternalParams

trait TextMatcherInternalParams extends Params with HasFeatures

