sparknlp_jsl.annotator.matcher.text_matcher_params
Module Contents
Classes
- class TextMatcherParams
Base class that holds the params shared by the TextMatcherInternal and TextMatcherInternalModel annotators.
- Parameters:
enableLemmatizer – Whether to enable the lemmatizer. Defaults to False.
enableStemmer – Whether to enable the stemmer. Defaults to False.
stopWords – List of stop words to be removed. Defaults to None.
cleanStopWords – Whether to remove stop words. Defaults to False.
safeKeywords – Keywords to preserve during stop word removal when cleanStopWords is enabled. Defaults to empty.
excludePunctuation – Whether to remove punctuation from the text. Defaults to True.
cleanKeywords – Additional keywords to be removed alongside the default stop words. Defaults to empty.
excludeRegexPatterns – Regex patterns used to drop matched chunks. Defaults to empty.
returnChunks – Whether to return the original text chunks from the input or the matched (e.g., stemmed/lemmatized) phrases. Can be ‘original’ or ‘matched’. Defaults to ‘original’.
skipMatcherAugmentation – Whether to skip matcher augmentation. Defaults to False.
skipSourceTextAugmentation – Whether to skip source text augmentation. Defaults to False.
- cleanKeywords
- cleanStopWords
- enableLemmatizer
- enableStemmer
- excludePunctuation
- excludeRegexPatterns
- returnChunks
- safeKeywords
- skipMatcherAugmentation
- skipSourceTextAugmentation
- stopWords
- getCleanKeywords()
Gets the additional keywords to be removed alongside the default stop words.
- getExcludeRegexPatterns()
Gets the regex patterns used to drop matched chunks.
- getReturnChunks()
Gets whether to return the original text chunks from the input or the matched (e.g., stemmed/lemmatized) phrases.
- getSafeKeywords()
Gets the keywords to preserve during stop word removal when cleanStopWords is enabled.
- getStopWords()
Gets the stop words to be removed.
- setCleanKeywords(b)
Sets the additional keywords to be removed alongside the default stop words. Defaults to empty.
- Parameters:
b (list) – List of additional keywords to be removed
- setCleanStopWords(b)
Sets whether to remove stop words. Defaults to False.
- Parameters:
b (bool) – Whether to remove stop words
- setEnableLemmatizer(b)
Sets whether to enable the lemmatizer. Defaults to False.
- Parameters:
b (bool) – Whether to enable the lemmatizer
- setEnableStemmer(b)
Sets whether to enable the stemmer. Defaults to False.
- Parameters:
b (bool) – Whether to enable the stemmer
- setExcludePunctuation(b)
Sets whether to remove punctuation from the text. Defaults to True.
- Parameters:
b (bool) – Whether to exclude punctuation
- setExcludeRegexPatterns(b)
Sets the regex patterns used to drop matched chunks. Defaults to empty.
- Parameters:
b (list) – List of regex patterns
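The effect of excludeRegexPatterns can be pictured with a small standalone sketch. It is not the library's implementation: the helper name drop_excluded_chunks is hypothetical, and it assumes a chunk is dropped only when a pattern matches the whole chunk (the actual matching mode is not stated here).

```python
import re


def drop_excluded_chunks(chunks, exclude_patterns):
    """Keep only the chunks that no exclusion pattern fully matches.

    Assumption: full-string matching. The real annotator may match differently.
    """
    compiled = [re.compile(pattern) for pattern in exclude_patterns]
    return [
        chunk
        for chunk in chunks
        if not any(rx.fullmatch(chunk) for rx in compiled)
    ]


# Hypothetical matched chunks and exclusion patterns.
chunks = ["aspirin", "2023", "chest pain", "ID-4471"]
kept = drop_excluded_chunks(chunks, [r"\d+", r"ID-\d+"])
print(kept)  # ['aspirin', 'chest pain']
```

With the annotator itself, the same list of patterns would be passed to setExcludeRegexPatterns.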
- setReturnChunks(b)
Sets whether to return the original text chunks from the input or the matched (e.g., stemmed/lemmatized) phrases. Can be ‘original’ or ‘matched’. Defaults to ‘original’.
- Parameters:
b (str) – ‘original’ or ‘matched’
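The difference between the two returnChunks values is easiest to see on a toy example. The sketch below is a plain-Python illustration, not the annotator's code: toy_stem is a deliberately crude, hypothetical stemmer, and match_phrases is a hypothetical helper that assumes matching happens on stemmed tokens.

```python
def toy_stem(word):
    """Very crude suffix stripper, used only for illustration."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def match_phrases(text, phrases, return_chunks="original"):
    """Match phrases on stemmed tokens and return either form of each hit."""
    if return_chunks not in ("original", "matched"):
        raise ValueError("return_chunks must be 'original' or 'matched'")
    tokens = text.split()
    stems = [toy_stem(token.lower()) for token in tokens]
    results = []
    for phrase in phrases:
        target = [toy_stem(word.lower()) for word in phrase.split()]
        n = len(target)
        for i in range(len(stems) - n + 1):
            if stems[i : i + n] == target:
                if return_chunks == "original":
                    # The text exactly as it appears in the input.
                    results.append(" ".join(tokens[i : i + n]))
                else:
                    # The normalized (here: stemmed) phrase that matched.
                    results.append(" ".join(target))
    return results


text = "Patient reports recurring headaches"
original = match_phrases(text, ["recurring headache"], "original")
matched = match_phrases(text, ["recurring headache"], "matched")
print(original)  # ['recurring headaches']
print(matched)   # ['recurr headache']
```

In this reading, ‘original’ preserves the wording of the source document, while ‘matched’ exposes the normalized form that actually produced the hit.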
- setSafeKeywords(b)
Sets the keywords to preserve during stop word removal when cleanStopWords is enabled. Safe keywords are filtered out of the stop words list, so they are not removed from the text.
- Parameters:
b (list) – List of safe keywords
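One way to read the interaction between stopWords, cleanKeywords, and safeKeywords is as simple set arithmetic. The following standalone sketch shows that reading. The function names are hypothetical and the case-insensitive comparison is an assumption, so treat it as an illustration rather than the library's behavior.

```python
def effective_stop_words(stop_words, clean_keywords=(), safe_keywords=()):
    """Stop words plus extra clean keywords, minus the protected safe keywords."""
    removable = {w.lower() for w in stop_words} | {w.lower() for w in clean_keywords}
    protected = {w.lower() for w in safe_keywords}
    return removable - protected


def clean_tokens(tokens, stop_words, clean_keywords=(), safe_keywords=()):
    """Drop every token found in the effective stop word set (assumed case-insensitive)."""
    removable = effective_stop_words(stop_words, clean_keywords, safe_keywords)
    return [token for token in tokens if token.lower() not in removable]


tokens = "history of no chest pain in the patient".split()
cleaned = clean_tokens(
    tokens,
    stop_words=["of", "in", "the", "no"],
    clean_keywords=["history"],  # removed in addition to the stop words
    safe_keywords=["no"],        # kept even though it is listed as a stop word
)
print(cleaned)  # ['no', 'chest', 'pain', 'patient']
```

Keeping a negation such as "no" in the safe keywords is a typical reason to use this param, since dropping it would change the meaning of the phrase.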
- setSkipMatcherAugmentation(b)
Sets whether to skip matcher augmentation. Defaults to False.
- Parameters:
b (bool) – Whether to skip matcher augmentation
- setSkipSourceTextAugmentation(b)
Sets whether to skip source text augmentation. Defaults to False.
- Parameters:
b (bool) – Whether to skip source text augmentation
- setStopWords(b)
Sets the stop words to be removed.
- Parameters:
b (list) – List of stop words to be removed
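Because every param above has a setter named set followed by the capitalized param name, a configuration can be kept in a dictionary and applied in one pass. The sketch below shows this pattern under stated assumptions: apply_params and setter_name are hypothetical helpers, and RecordingStub is a stand-in object used only so the example runs without Spark or a license. In practice the target would be an annotator that mixes in TextMatcherParams.

```python
def setter_name(param):
    """Map a param name such as 'returnChunks' to 'setReturnChunks'."""
    return "set" + param[0].upper() + param[1:]


def apply_params(annotator, params):
    """Call the matching setter on the annotator for each (name, value) pair."""
    for name, value in params.items():
        getattr(annotator, setter_name(name))(value)
    return annotator


class RecordingStub:
    """Hypothetical stand-in that records setter calls instead of configuring Spark."""

    def __init__(self):
        self.values = {}

    def __getattr__(self, attr):
        # Only reached for attributes that are not defined normally.
        if attr.startswith("set"):
            def setter(value):
                self.values[attr] = value
                return self
            return setter
        raise AttributeError(attr)


params = {
    "enableStemmer": True,
    "cleanStopWords": True,
    "stopWords": ["of", "the"],
    "safeKeywords": ["no"],
    "returnChunks": "matched",
}
stub = apply_params(RecordingStub(), params)
print(sorted(stub.values))
```

Keeping the settings in one dictionary makes it straightforward to reuse the same configuration across pipelines or to log it alongside results.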