`sparknlp_jsl.annotator.matcher.text_matcher_params`#

Module Contents#

Classes#

TextMatcherParams

Base class containing the params shared by the TextMatcherInternal and TextMatcherInternalModel annotators.

class TextMatcherParams#

Base class containing the params shared by the TextMatcherInternal and TextMatcherInternalModel annotators.

Parameters:

enableLemmatizer – Whether to enable the lemmatizer. Defaults to False.
enableStemmer – Whether to enable the stemmer. Defaults to False.
stopWords – List of stop words to be removed. Defaults to None.
cleanStopWords – Whether to clean stop words. Defaults to False.
safeKeywords – Keywords to preserve during stop word removal when cleanStopWords is enabled. Defaults to empty.
excludePunctuation – Whether to remove punctuation from the text. Defaults to True.
cleanKeywords – Additional keywords to be removed alongside the default stop words. Defaults to empty.
excludeRegexPatterns – Regex patterns used to drop matched chunks. Defaults to empty.
returnChunks – Whether to return the original text chunks from the input or the matched (e.g., stemmed/lemmatized) phrases. Can be ‘original’ or ‘matched’. Defaults to ‘original’.
skipMatcherAugmentation – Whether to skip matcher augmentation. Defaults to False.
skipSourceTextAugmentation – Whether to skip source text augmentation. Defaults to False.

cleanKeywords#

cleanStopWords#

enableLemmatizer#

enableStemmer#

excludePunctuation#

excludeRegexPatterns#

returnChunks#

safeKeywords#

skipMatcherAugmentation#

skipSourceTextAugmentation#

stopWords#

getCleanKeywords()#: Gets the additional keywords to be removed alongside default stopwords.

getExcludeRegexPatterns()#: Gets the regex patterns used to drop matched chunks.

getReturnChunks()#: Gets whether to return the original text chunks from input or the matched (e.g., stemmed/lemmatized) phrases.

getSafeKeywords()#: Gets the keywords to preserve during stopword removal when cleanStopWords is enabled.

getStopWords()#: Gets the stop words to be removed.

setCleanKeywords(b)#

Sets the additional keywords to be removed alongside default stopwords. Defaults to empty.

Parameters: b (list) – List of additional keywords to be removed

setCleanStopWords(b)#

Sets whether to clean stop words, by default False.

Parameters: b (bool) – Whether to clean stop words

setEnableLemmatizer(b)#

Sets whether to enable lemmatizer, by default False.

Parameters: b (bool) – Whether to enable lemmatizer

setEnableStemmer(b)#

Sets whether to enable stemmer, by default False.

Parameters: b (bool) – Whether to enable stemmer

setExcludePunctuation(b)#

Sets whether to exclude punctuation, by default True.

Parameters: b (bool) – Whether to exclude punctuation
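
To illustrate what excluding punctuation can mean for matching, here is a minimal pure-Python sketch. It does not use or reproduce the library's implementation: the `normalize` helper is hypothetical, and it assumes that "punctuation" means the ASCII characters in `string.punctuation` and that they are deleted rather than replaced with spaces.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text, exclude_punctuation=True):
    """Hypothetical helper: optionally strip punctuation, then collapse whitespace."""
    if exclude_punctuation:
        text = text.translate(_PUNCT_TABLE)
    return " ".join(text.split())


with_removal = normalize("chest pain, (acute).")
without_removal = normalize("chest pain, (acute).", exclude_punctuation=False)
print(with_removal)
print(without_removal)
```

Under these assumptions a hyphenated term such as "non-smoker" would collapse to "nonsmoker", so it is worth checking how the real annotator treats such terms before relying on the default.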

setExcludeRegexPatterns(b)#

Sets the regex patterns used to drop matched chunks. Defaults to empty.

Parameters: b (list) – List of regex patterns
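
The effect of dropping matched chunks by regex can be sketched in plain Python. This is an illustration only, not the annotator's code: `drop_excluded_chunks` is a hypothetical helper, it assumes a chunk is dropped when any pattern matches anywhere in it, and it uses Python's `re` syntax, which may differ from the regex dialect the library actually applies.

```python
import re


def drop_excluded_chunks(chunks, exclude_patterns):
    """Hypothetical helper: keep only chunks that match none of the patterns."""
    compiled = [re.compile(p) for p in exclude_patterns]
    # Assumption: a pattern that matches anywhere in the chunk excludes it.
    return [c for c in chunks if not any(rx.search(c) for rx in compiled)]


chunks = ["aspirin", "aspirin 81 mg", "no aspirin", "ibuprofen"]
kept = drop_excluded_chunks(chunks, [r"^no\b", r"\d+\s*mg"])
print(kept)
```

Here the dosage chunk and the negated chunk are filtered out, while the two bare drug names survive.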

setReturnChunks(b)#

Sets whether to return the original text chunks from input or the matched (e.g., stemmed/lemmatized) phrases. Can be ‘original’ or ‘matched’. Defaults to ‘original’.

Parameters: b (str) – ‘original’ or ‘matched’
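
The difference between the two returnChunks values is easiest to see on a toy example. The sketch below is a stand-in, not the library's matcher: `toy_stem` is a deliberately crude suffix stripper, `match_phrase` is a hypothetical single-token matcher, and both exist only to show that ‘original’ yields the text as it appears in the input while ‘matched’ yields the normalized form that was compared.

```python
def toy_stem(word):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def match_phrase(text, phrase, return_chunks="original"):
    """Hypothetical matcher: compare stemmed tokens against a stemmed phrase."""
    if return_chunks not in ("original", "matched"):
        raise ValueError("return_chunks must be 'original' or 'matched'")
    target = toy_stem(phrase.lower())
    results = []
    for token in text.split():
        if toy_stem(token.lower()) == target:
            # 'original' keeps the surface form; 'matched' keeps the stem.
            results.append(token if return_chunks == "original" else target)
    return results


text = "Patient reports coughing and headaches"
original_chunks = match_phrase(text, "cough", "original")
matched_chunks = match_phrase(text, "cough", "matched")
print(original_chunks)
print(matched_chunks)
```

With ‘original’ the result carries "coughing" as written in the text; with ‘matched’ it carries the stem "cough".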

setSafeKeywords(b)#

Sets the keywords to preserve during stopword removal when cleanStopWords is enabled. This will filter out the safe keywords from the stopwords list.

Parameters: b (list) – List of safe keywords
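
One way to picture how stopWords, cleanKeywords, and safeKeywords interact is as set arithmetic: the words to remove are the stop words plus the clean keywords, minus the safe keywords. The following sketch assumes exactly that, plus case-insensitive comparison; the helper names are hypothetical and the real annotator may combine the lists differently.

```python
def effective_stop_words(stop_words, clean_keywords=(), safe_keywords=()):
    """Hypothetical helper: the set of words that would actually be removed."""
    # Assumption: comparison is case-insensitive, so everything is lower-cased.
    removable = {w.lower() for w in stop_words} | {w.lower() for w in clean_keywords}
    protected = {w.lower() for w in safe_keywords}
    return removable - protected


def clean_tokens(tokens, stop_words, clean_keywords=(), safe_keywords=()):
    """Drop every token found in the effective stop word set."""
    removable = effective_stop_words(stop_words, clean_keywords, safe_keywords)
    return [t for t in tokens if t.lower() not in removable]


tokens = ["history", "of", "no", "chest", "pain"]
cleaned = clean_tokens(
    tokens,
    stop_words=["of", "no", "the"],
    clean_keywords=["history"],
    safe_keywords=["no"],
)
print(cleaned)
```

In this example "no" is listed as a stop word but survives because it is also a safe keyword, which matters when negation cues must stay in the matched text.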

setSkipMatcherAugmentation(b)#

Sets whether to skip matcher augmentation, by default False.

Parameters: b (bool) – Whether to skip matcher augmentation

setSkipSourceTextAugmentation(b)#

Sets whether to skip source text augmentation, by default False.

Parameters: b (bool) – Whether to skip source text augmentation

setStopWords(b)#

Sets the stop words to be removed.

Parameters: b (list) – List of stop words to be removed
