sparknlp_jsl.annotator.CommonResolverParams
- class sparknlp_jsl.annotator.CommonResolverParams
Bases:
HasCaseSensitiveProperties
Class that provides a common interface for the Entity Resolver family.
- Parameters:
- distanceFunction
What distance function to use for WMD: ‘EUCLIDEAN’ or ‘COSINE’.
- neighbours
Number of neighbours to consider in the KNN query to calculate WMD.
- alternatives
Number of results to return in the metadata after sorting by last distance calculated.
- extramassPenalty
Penalty for extra words in the knowledge base match.
- threshold
Threshold value for the last distance calculated.
- enableWmd
Whether or not to use WMD token distance.
- enableTfidf
Whether or not to use TFIDF token distance.
- enableJaccard
Whether or not to use Jaccard token distance.
- enableSorensenDice
Whether or not to use Sorensen-Dice token distance.
- enableJaroWinkler
Whether or not to use Jaro-Winkler character distance.
- enableLevenshtein
Whether or not to use Levenshtein character distance.
- distanceWeights
Distance weights to apply before pooling: [WMD, TFIDF, Jaccard, SorensenDice, JaroWinkler, Levenshtein].
- poolingStrategy
Pooling strategy to aggregate distances: AVERAGE, MIN or MAX.
- confidenceFunction
What function to use to calculate confidence: INVERSE or SOFTMAX.
- allDistancesMetadata
Whether or not to return all distance values in the metadata. Default: False.
- missAsEmpty
Whether or not to return an empty annotation on unmatched chunks.
Methods
__init__
(*args, **kwargs)
getCaseSensitive
() Gets whether to ignore case in tokens for embeddings matching.
setAllDistancesMetadata
(s) Sets whether or not to return all distance values in the metadata.
setAlternatives
(a) Sets number of results to return in the metadata after sorting by last distance calculated.
setCaseSensitive
(value) Sets whether to ignore case in tokens for embeddings matching.
setConfidenceFunction
(s) Sets what function to use to calculate confidence: INVERSE or SOFTMAX.
setDistanceFunction
(dist) Sets distance function to use for WMD: 'EUCLIDEAN' or 'COSINE'.
setDistanceWeights
(l) Sets distance weights to apply before pooling: [WMD, TFIDF, Jaccard, SorensenDice, JaroWinkler, Levenshtein].
setEnableJaccard
(e) Sets whether or not to use Jaccard token distance.
setEnableJaroWinkler
(e) Sets whether or not to use Jaro-Winkler character distance.
setEnableLevenshtein
(e) Sets whether or not to use Levenshtein character distance.
setEnableSorensenDice
(e) Sets whether or not to use Sorensen-Dice token distance.
setEnableTfidf
(e) Sets whether or not to use TFIDF token distance.
setEnableWmd
(e) Sets whether or not to use WMD token distance.
setExtramassPenalty
(emp) Sets penalty for extra words in the knowledge base match.
setMissAsEmpty
(value) Sets whether or not to return an empty annotation on unmatched chunks.
setNeighbours
(k) Sets number of neighbours to consider in the KNN query to calculate WMD.
setPoolingStrategy
(s) Sets pooling strategy to aggregate distances: AVERAGE, MIN or MAX.
setThreshold
(thres) Sets threshold value for the last distance calculated.
Attributes
allDistancesMetadata
alternatives
caseSensitive
confidenceFunction
distanceFunction
distanceWeights
enableJaccard
enableJaroWinkler
enableLevenshtein
enableSorensenDice
enableTfidf
enableWmd
extramassPenalty
missAsEmpty
neighbours
poolingStrategy
threshold
- getCaseSensitive()
Gets whether to ignore case in tokens for embeddings matching.
- Returns:
- bool
Whether to ignore case in tokens for embeddings matching.
- setAllDistancesMetadata(s)
Sets whether or not to return all distance values in the metadata. Default: False.
- Parameters:
- s : bool
Whether or not to return all distance values in the metadata. Default: False.
- setAlternatives(a)
Sets number of results to return in the metadata after sorting by last distance calculated.
- Parameters:
- a : int
Number of results to return in the metadata after sorting by last distance calculated.
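The interaction between alternatives and threshold can be pictured with a small standalone sketch. It is not the library's implementation: the helper name `top_alternatives`, the candidate labels, and the assumption that candidates with a distance above the threshold are discarded are all illustrative.

```python
from typing import List, Tuple

def top_alternatives(
    candidates: List[Tuple[str, float]],
    alternatives: int,
    threshold: float,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: keep candidates within `threshold`, closest first."""
    # Assumption: a candidate is dropped when its final distance exceeds the threshold.
    kept = [c for c in candidates if c[1] <= threshold]
    # Sort by the last distance calculated, smallest (best match) first.
    kept.sort(key=lambda c: c[1])
    # Return at most `alternatives` results.
    return kept[:alternatives]

# Made-up (label, distance) pairs, for illustration only.
candidates = [("A", 0.42), ("B", 0.17), ("C", 0.88), ("D", 0.35)]
print(top_alternatives(candidates, alternatives=2, threshold=0.5))
```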
- setCaseSensitive(value)
Sets whether to ignore case in tokens for embeddings matching.
- Parameters:
- value : bool
Whether to ignore case in tokens for embeddings matching.
- setConfidenceFunction(s)
Sets what function to use to calculate confidence: INVERSE or SOFTMAX.
- Parameters:
- s : str
What function to use to calculate confidence: INVERSE or SOFTMAX.
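The page does not give the formulas behind INVERSE and SOFTMAX. The sketch below shows one plausible reading of each, purely as an illustration: the function names and the exact forms (1 / (1 + d), and a softmax over negated distances) are assumptions, not the library's documented behaviour.

```python
import math
from typing import List

def inverse_confidence(distances: List[float]) -> List[float]:
    """Assumed INVERSE form: confidence = 1 / (1 + distance)."""
    return [1.0 / (1.0 + d) for d in distances]

def softmax_confidence(distances: List[float]) -> List[float]:
    """Assumed SOFTMAX form: softmax over negated distances."""
    if not distances:
        return []
    # Shift by the minimum distance for numerical stability.
    smallest = min(distances)
    exps = [math.exp(-(d - smallest)) for d in distances]
    total = sum(exps)
    return [e / total for e in exps]

distances = [0.1, 0.5, 2.0]
print(inverse_confidence(distances))
print(softmax_confidence(distances))
```

In both readings a smaller distance gives a higher confidence. The softmax variant also normalises the scores so they sum to 1 across the candidates.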
- setDistanceFunction(dist)
Sets distance function to use for WMD: ‘EUCLIDEAN’ or ‘COSINE’.
- Parameters:
- dist : str
Value that selects what distance function to use for WMD: ‘EUCLIDEAN’ or ‘COSINE’.
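For reference, the two distance functions named here have standard definitions over embedding vectors. The sketch below uses plain Python lists. The function names are hypothetical, and the zero-vector convention in `cosine_distance` is an assumption made for the example.

```python
import math
from typing import Sequence

def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """One minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumed convention: a zero vector is maximally distant from any other.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

Euclidean distance is sensitive to vector magnitude, while cosine distance only compares direction.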
- setDistanceWeights(l)
Sets distance weights to apply before pooling: [WMD, TFIDF, Jaccard, SorensenDice, JaroWinkler, Levenshtein].
- Parameters:
- l : list
Distance weights to apply before pooling, in the order [WMD, TFIDF, Jaccard, SorensenDice, JaroWinkler, Levenshtein].
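As a rough illustration of "weights applied before pooling", the sketch below multiplies each distance by its weight and then aggregates with AVERAGE, MIN or MAX. The helper `pool_distances` is hypothetical, and this page does not say how the library treats disabled distances, so the sketch only takes the distances that are in use.

```python
from typing import Sequence

def pool_distances(
    distances: Sequence[float],
    weights: Sequence[float],
    strategy: str = "AVERAGE",
) -> float:
    """Hypothetical helper: weight each distance, then pool them into one value."""
    if len(distances) != len(weights):
        raise ValueError("distances and weights must have the same length")
    if not distances:
        raise ValueError("at least one distance is required")
    # Assumption: a weight scales its distance multiplicatively.
    weighted = [d * w for d, w in zip(distances, weights)]
    name = strategy.upper()
    if name == "AVERAGE":
        return sum(weighted) / len(weighted)
    if name == "MIN":
        return min(weighted)
    if name == "MAX":
        return max(weighted)
    raise ValueError(f"unknown pooling strategy: {strategy}")

# Made-up distances for three enabled measures.
print(pool_distances([0.2, 0.4, 0.6], [1.0, 0.5, 1.0], "MAX"))
```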
- setEnableJaccard(e)
Sets whether or not to use Jaccard token distance.
- Parameters:
- e : bool
Whether or not to use Jaccard token distance.
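A minimal sketch of Jaccard distance over token sets, for readers unfamiliar with the measure. The whitespace tokenisation and lowercasing are assumptions for this example, and the library may tokenise differently.

```python
def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over lowercased whitespace tokens."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        # Two empty strings are treated as identical.
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

# Shares 2 of 3 distinct tokens, so the distance is 1 - 2/3.
print(jaccard_distance("breast cancer", "cancer of breast"))
```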
- setEnableJaroWinkler(e)
Sets whether or not to use Jaro-Winkler character distance.
- Parameters:
- e : bool
Whether or not to use Jaro-Winkler character distance.
- setEnableLevenshtein(e)
Sets whether or not to use Levenshtein character distance.
- Parameters:
- e : bool
Whether or not to use Levenshtein character distance.
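Levenshtein distance counts the single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below is the textbook dynamic-programming version and returns the raw edit count. Whether the library normalises this value is not stated here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn `a` into `b`."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))
```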
- setEnableSorensenDice(e)
Sets whether or not to use Sorensen-Dice token distance.
- Parameters:
- e : bool
Whether or not to use Sorensen-Dice token distance.
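Sorensen-Dice distance is closely related to Jaccard but divides twice the overlap by the sum of the two set sizes. A minimal sketch follows, again assuming lowercased whitespace tokens, which may not match the library's tokenisation.

```python
def sorensen_dice_distance(a: str, b: str) -> float:
    """1 - 2|A ∩ B| / (|A| + |B|) over lowercased whitespace tokens."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        # Two empty strings are treated as identical.
        return 0.0
    overlap = len(tokens_a & tokens_b)
    return 1.0 - (2.0 * overlap) / (len(tokens_a) + len(tokens_b))

# Overlap of 2 tokens, set sizes 2 and 3, so the distance is 1 - 4/5.
print(sorensen_dice_distance("breast cancer", "cancer of breast"))
```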
- setEnableTfidf(e)
Sets whether or not to use TFIDF token distance.
- Parameters:
- e : bool
Whether or not to use TFIDF token distance.
- setEnableWmd(e)
Sets whether or not to use WMD token distance.
- Parameters:
- e : bool
Whether or not to use WMD token distance.
- setExtramassPenalty(emp)
Sets penalty for extra words in the knowledge base match.
- Parameters:
- emp : float
Penalty for extra words in the knowledge base match.
- setMissAsEmpty(value)
Sets whether or not to return an empty annotation on unmatched chunks.
- Parameters:
- value : bool
Whether or not to return an empty annotation on unmatched chunks.
- setNeighbours(k)
Sets number of neighbours to consider in the KNN query to calculate WMD.
- Parameters:
- k : int
Number of neighbours to consider in the KNN query to calculate WMD.