sparknlp_jsl.annotator.matcher.text_matcher_params

Module Contents

Classes

TextMatcherParams

Base class containing the parameters shared by the TextMatcherInternal and TextMatcherInternalModel annotators.

class TextMatcherParams

Base class that contains all the parameters shared by the TextMatcherInternal and TextMatcherInternalModel annotators.

Parameters:
  • enableLemmatizer – Whether to enable lemmatizer, by default False.

  • enableStemmer – Whether to enable stemmer, by default False.

  • stopWords – List of stop words to be removed, by default None.

  • cleanStopWords – Whether to clean stop words, by default False.

  • safeKeywords – Keywords to preserve during stop word removal when cleanStopWords is enabled. Defaults to empty.

  • excludePunctuation – If True, punctuation is removed from the text. Defaults to True.

  • cleanKeywords – Additional keywords to be removed alongside the default stop words. Defaults to empty.

  • excludeRegexPatterns – Regex patterns used to drop matched chunks. Defaults to empty.

  • returnChunks – Controls whether to return the original text chunks from the input or the matched (e.g., stemmed/lemmatized) phrases. Can be ‘original’ or ‘matched’. Defaults to ‘original’.

  • skipMatcherAugmentation – Whether to skip matcher augmentation. Defaults to False.

  • skipSourceTextAugmentation – Whether to skip source text augmentation. Defaults to False.

cleanKeywords
cleanStopWords
enableLemmatizer
enableStemmer
excludePunctuation
excludeRegexPatterns
returnChunks
safeKeywords
skipMatcherAugmentation
skipSourceTextAugmentation
stopWords

getCleanKeywords()

Gets the additional keywords to be removed alongside default stopwords.

getExcludeRegexPatterns()

Gets the regex patterns used to drop matched chunks.

getReturnChunks()

Gets the chunk return mode: ‘original’ returns the original text chunks from the input, and ‘matched’ returns the matched (e.g., stemmed/lemmatized) phrases.

getSafeKeywords()

Gets the keywords to preserve during stopword removal when cleanStopWords is enabled.

getStopWords()

Gets the stop words to be removed.

setCleanKeywords(b)

Sets the additional keywords to be removed alongside default stopwords. Defaults to empty.

Parameters:

b (list) – List of additional keywords to be removed

setCleanStopWords(b)

Sets whether to clean stop words, by default False.

Parameters:

b (bool) – Whether to clean stop words

setEnableLemmatizer(b)

Sets whether to enable lemmatizer, by default False.

Parameters:

b (bool) – Whether to enable lemmatizer

setEnableStemmer(b)

Sets whether to enable stemmer, by default False.

Parameters:

b (bool) – Whether to enable stemmer

setExcludePunctuation(b)

Sets whether to exclude punctuation, by default True.

Parameters:

b (bool) – Whether to exclude punctuation
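To make the effect of excludePunctuation concrete, the pure-Python sketch below removes punctuation-only tokens before matching. It does not call sparknlp_jsl; the helper `strip_punctuation` is hypothetical, and handling punctuation at the token level (rather than character by character) is an assumption.

```python
import string

# Hypothetical helper: a stand-in for what excludePunctuation=True might do.
# Assumption: punctuation is dropped at the token level before matching.
PUNCTUATION = set(string.punctuation)


def strip_punctuation(tokens, exclude_punctuation=True):
    """Return tokens with punctuation-only tokens removed when enabled."""
    if not exclude_punctuation:
        return list(tokens)
    return [t for t in tokens if not all(ch in PUNCTUATION for ch in t)]


tokens = ["chest", "pain", ",", "shortness", "of", "breath", "."]
print(strip_punctuation(tokens))
# ['chest', 'pain', 'shortness', 'of', 'breath']
```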

setExcludeRegexPatterns(b)

Sets the regex patterns used to drop matched chunks. Defaults to empty.

Parameters:

b (list) – List of regex patterns
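The pure-Python sketch below shows one plausible reading of excludeRegexPatterns: matched chunks are filtered against a list of patterns, and any chunk that matches is dropped. The helper `drop_chunks` is hypothetical, and the use of search semantics (rather than a full match) is an assumption that this reference does not confirm.

```python
import re

# Hypothetical helper illustrating excludeRegexPatterns.
# Assumption: a chunk is dropped if any pattern matches anywhere in it
# (re.search); the annotator may use different matching semantics.


def drop_chunks(chunks, patterns):
    """Return the chunks that match none of the given regex patterns."""
    compiled = [re.compile(p) for p in patterns]
    return [c for c in chunks if not any(rx.search(c) for rx in compiled)]


chunks = ["aspirin", "500", "ibuprofen 200"]
print(drop_chunks(chunks, [r"^\d+$"]))
# ['aspirin', 'ibuprofen 200']
```

Anchoring the pattern with `^` and `$` restricts it to purely numeric chunks, so "ibuprofen 200" survives even though it contains digits.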

setReturnChunks(b)

Sets whether to return the original text chunks from the input or the matched (e.g., stemmed/lemmatized) phrases. Can be ‘original’ or ‘matched’. Defaults to ‘original’.

Parameters:

b (str) – ‘original’ or ‘matched’
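The difference between ‘original’ and ‘matched’ can be illustrated with a toy matcher. In this sketch, `normalize` is a deliberately crude stand-in for the stemmer or lemmatizer (it lowercases and strips a trailing "s"), and `find_phrase` is hypothetical; neither reflects the library's actual normalization.

```python
# Toy illustration of returnChunks ('original' vs 'matched').
# Assumption: normalize() is a crude stand-in for the stemmer/lemmatizer;
# it only lowercases and strips a trailing "s" from longer tokens.


def normalize(token):
    token = token.lower()
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def find_phrase(tokens, phrase, return_chunks="original"):
    """Find the phrase by comparing normalized tokens; hypothetical helper."""
    if return_chunks not in ("original", "matched"):
        raise ValueError("return_chunks must be 'original' or 'matched'")
    target = [normalize(t) for t in phrase.split()]
    norm = [normalize(t) for t in tokens]
    n = len(target)
    results = []
    for i in range(len(tokens) - n + 1):
        if norm[i:i + n] == target:
            if return_chunks == "original":
                # Report the span exactly as it appeared in the input.
                results.append(" ".join(tokens[i:i + n]))
            else:
                # Report the normalized phrase that was actually matched.
                results.append(" ".join(target))
    return results


tokens = "Patient has Kidney Stones again".split()
print(find_phrase(tokens, "kidney stone"))             # ['Kidney Stones']
print(find_phrase(tokens, "kidney stone", "matched"))  # ['kidney stone']
```

With a real stemmer or lemmatizer, the ‘matched’ output would be whatever normalized form that component produces, which is not necessarily a dictionary word.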

setSafeKeywords(b)

Sets the keywords to preserve during stop word removal when cleanStopWords is enabled. These keywords are filtered out of the stop word list, so they are not removed from the text.

Parameters:

b (list) – List of safe keywords
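A minimal sketch of how the stop word parameters could combine, assuming that cleanKeywords extends the stop word list, safeKeywords is subtracted from it, and nothing is removed unless cleanStopWords is enabled. The function names are hypothetical, and case-insensitive comparison is an assumption.

```python
# Hypothetical sketch: how stopWords, cleanKeywords, safeKeywords and
# cleanStopWords could combine. Not the annotator's actual implementation;
# case-insensitive comparison is an assumption.


def effective_stop_words(stop_words, clean_keywords=(), safe_keywords=()):
    """Stop words plus extra clean keywords, minus the safe keywords."""
    removable = {w.lower() for w in stop_words} | {w.lower() for w in clean_keywords}
    return removable - {w.lower() for w in safe_keywords}


def clean_tokens(tokens, stop_words, clean_keywords=(), safe_keywords=(),
                 clean_stop_words=True):
    """Drop removable words from tokens only when clean_stop_words is True."""
    if not clean_stop_words:
        return list(tokens)
    removable = effective_stop_words(stop_words, clean_keywords, safe_keywords)
    return [t for t in tokens if t.lower() not in removable]


tokens = ["No", "history", "of", "diabetes"]
print(clean_tokens(tokens,
                   stop_words=["in", "the", "of", "no"],
                   clean_keywords=["history"],
                   safe_keywords=["no"]))
# ['No', 'diabetes']
```

Keeping a negation such as "no" as a safe keyword is one example of why a stop word may need to survive cleaning.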

setSkipMatcherAugmentation(b)

Sets whether to skip matcher augmentation, by default False.

Parameters:

b (bool) – Whether to skip matcher augmentation

setSkipSourceTextAugmentation(b)

Sets whether to skip source text augmentation, by default False.

Parameters:

b (bool) – Whether to skip source text augmentation

setStopWords(b)

Sets the stop words to be removed.

Parameters:

b (list) – List of stop words to be removed