sparknlp_jsl.annotator.CommonResolverParams

class sparknlp_jsl.annotator.CommonResolverParams

Bases: HasCaseSensitiveProperties

Class that provides a common interface for the Entity Resolver family.

Parameters:
distanceFunction

What distance function to use for WMD: ‘EUCLIDEAN’ or ‘COSINE’.

neighbours

Number of neighbours to consider in the KNN query to calculate WMD

alternatives

Number of results to return in the metadata after sorting by last distance calculated

extramassPenalty

Penalty for extra words in the knowledge base match

threshold

Threshold value for the last distance calculated.

enableWmd

Whether or not to use WMD token distance.

enableTfidf

Whether or not to use TFIDF token distance.

enableJaccard

Whether or not to use Jaccard token distance.

enableSorensenDice

Whether or not to use Sorensen-Dice token distance.

enableJaroWinkler

Whether or not to use Jaro-Winkler character distance.

enableLevenshtein

Whether or not to use Levenshtein character distance.

distanceWeights

Distance weights to apply before pooling: [WMD, TFIDF, Jaccard, SorensenDice, JaroWinkler, Levenshtein].

poolingStrategy

Pooling strategy to aggregate distances: AVERAGE, MIN or MAX.

confidenceFunction

What function to use to calculate confidence: INVERSE or SOFTMAX.

allDistancesMetadata

Whether or not to return all distance values in the metadata. Default: False.

missAsEmpty

Whether or not to return an empty annotation on unmatched chunks.

Methods

__init__(*args, **kwargs)

getCaseSensitive()

Gets whether to ignore case in tokens for embeddings matching.

setAllDistancesMetadata(s)

Sets whether or not to return all distance values in the metadata.

setAlternatives(a)

Sets number of results to return in the metadata after sorting by last distance calculated.

setCaseSensitive(value)

Sets whether to ignore case in tokens for embeddings matching.

setConfidenceFunction(s)

Sets what function to use to calculate confidence: INVERSE or SOFTMAX.

setDistanceFunction(dist)

Sets distance function to use for WMD: 'EUCLIDEAN' or 'COSINE'.

setDistanceWeights(l)

Sets distance weights to apply before pooling: [WMD, TFIDF, Jaccard, SorensenDice, JaroWinkler, Levenshtein].

setEnableJaccard(e)

Sets whether or not to use Jaccard token distance.

setEnableJaroWinkler(e)

Sets whether or not to use Jaro-Winkler character distance.

setEnableLevenshtein(e)

Sets whether or not to use Levenshtein character distance.

setEnableSorensenDice(e)

Sets whether or not to use Sorensen-Dice token distance.

setEnableTfidf(e)

Sets whether or not to use TFIDF token distance.

setEnableWmd(e)

Sets whether or not to use WMD token distance.

setExtramassPenalty(emp)

Sets penalty for extra words in the knowledge base match.

setMissAsEmpty(value)

Sets whether or not to return an empty annotation on unmatched chunks.

setNeighbours(k)

Sets number of neighbours to consider in the KNN query to calculate WMD.

setPoolingStrategy(s)

Sets pooling strategy to aggregate distances: AVERAGE, MIN or MAX.

setThreshold(thres)

Sets threshold value for the last distance calculated.

Attributes

allDistancesMetadata

alternatives

caseSensitive

confidenceFunction

distanceFunction

distanceWeights

enableJaccard

enableJaroWinkler

enableLevenshtein

enableSorensenDice

enableTfidf

enableWmd

extramassPenalty

missAsEmpty

neighbours

poolingStrategy

threshold

getCaseSensitive()

Gets whether to ignore case in tokens for embeddings matching.

Returns:
bool

Whether to ignore case in tokens for embeddings matching

setAllDistancesMetadata(s)

Sets whether or not to return all distance values in the metadata. Default: False.

Parameters:
s : bool

Whether or not to return all distance values in the metadata. Default: False.

setAlternatives(a)

Sets number of results to return in the metadata after sorting by last distance calculated.

Parameters:
a : int

Number of results to return in the metadata after sorting by last distance calculated.

setCaseSensitive(value)

Sets whether to ignore case in tokens for embeddings matching.

Parameters:
value : bool

Whether to ignore case in tokens for embeddings matching.

setConfidenceFunction(s)

Sets what function to use to calculate confidence: INVERSE or SOFTMAX.

Parameters:
s : str

What function to use to calculate confidence: INVERSE or SOFTMAX.
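This page does not give the formulas behind INVERSE and SOFTMAX. As a rough illustration only, the sketch below assumes that INVERSE normalizes 1 / (1 + d) over the candidates and that SOFTMAX is a softmax over negated distances. Both function names and both formulas are assumptions made for this example, not the library's implementation.

```python
import math


def inverse_confidence(distances):
    """Assumed INVERSE variant: normalize 1 / (1 + d) so the scores sum to 1."""
    if not distances:
        return []
    inverted = [1.0 / (1.0 + d) for d in distances]
    total = sum(inverted)
    return [v / total for v in inverted]


def softmax_confidence(distances):
    """Assumed SOFTMAX variant: softmax over negated distances."""
    if not distances:
        return []
    smallest = min(distances)
    # Shifting by the smallest distance keeps the exponentials numerically stable.
    exps = [math.exp(-(d - smallest)) for d in distances]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidate distances, smallest (best match) first.
example_distances = [0.1, 0.5, 0.9]
print(inverse_confidence(example_distances))
print(softmax_confidence(example_distances))
```

Under either assumption, a smaller distance yields a higher confidence and the scores of all candidates sum to 1.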

setDistanceFunction(dist)

Sets distance function to use for WMD: ‘EUCLIDEAN’ or ‘COSINE’.

Parameters:
dist : str

Value that selects what distance function to use for WMD: ‘EUCLIDEAN’ or ‘COSINE’.

setDistanceWeights(l)

Sets distance weights to apply before pooling: [WMD, TFIDF, Jaccard, SorensenDice, JaroWinkler, Levenshtein].

Parameters:
l : list of float

Distance weights to apply before pooling, in the order [WMD, TFIDF, Jaccard, SorensenDice, JaroWinkler, Levenshtein].

setEnableJaccard(e)

Sets whether or not to use Jaccard token distance.

Parameters:
e : bool

Whether or not to use Jaccard token distance.

setEnableJaroWinkler(e)

Sets whether or not to use Jaro-Winkler character distance.

Parameters:
e : bool

Whether or not to use Jaro-Winkler character distance.

setEnableLevenshtein(e)

Sets whether or not to use Levenshtein character distance.

Parameters:
e : bool

Whether or not to use Levenshtein character distance.
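For readers unfamiliar with the metric, here is a minimal sketch of the textbook Levenshtein edit distance on characters, together with a length-normalized variant. The function names are hypothetical, and whether the library normalizes the value this way is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance divided by the longer length, giving a value in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


print(levenshtein("kitten", "sitting"))  # 3
```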

setEnableSorensenDice(e)

Sets whether or not to use Sorensen-Dice token distance.

Parameters:
e : bool

Whether or not to use Sorensen-Dice token distance.
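The Jaccard and Sorensen-Dice token distances are both set-overlap measures. The sketch below shows their standard definitions on whitespace-split tokens. The function names and the tokenization are assumptions for illustration, and the library's own computation may differ.

```python
def jaccard_distance(tokens_a, tokens_b):
    """1 - |A ∩ B| / |A ∪ B| over the two token sets."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def sorensen_dice_distance(tokens_a, tokens_b):
    """1 - 2|A ∩ B| / (|A| + |B|) over the two token sets."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 0.0
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))


chunk = "acute kidney failure".split()
candidate = "acute renal failure".split()
print(jaccard_distance(chunk, candidate))        # 0.5
print(sorensen_dice_distance(chunk, candidate))  # about 0.333
```

The two token lists share two of four distinct tokens, so the Jaccard distance is 0.5, while Sorensen-Dice weights the overlap twice and gives a smaller distance.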

setEnableTfidf(e)

Sets whether or not to use TFIDF token distance.

Parameters:
e : bool

Whether or not to use TFIDF token distance.

setEnableWmd(e)

Sets whether or not to use WMD token distance.

Parameters:
e : bool

Whether or not to use WMD token distance.

setExtramassPenalty(emp)

Sets penalty for extra words in the knowledge base match.

Parameters:
emp : float

Penalty for extra words in the knowledge base match.

setMissAsEmpty(value)

Sets whether or not to return an empty annotation on unmatched chunks.

Parameters:
value : bool

Whether or not to return an empty annotation on unmatched chunks.

setNeighbours(k)

Sets number of neighbours to consider in the KNN query to calculate WMD.

Parameters:
k : int

Number of neighbours to consider in the KNN query to calculate WMD.

setPoolingStrategy(s)

Sets pooling strategy to aggregate distances: AVERAGE, MIN or MAX.

Parameters:
s : str

Pooling strategy to aggregate distances: AVERAGE, MIN or MAX.
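To make the interaction between distance weights and pooling concrete, the following sketch multiplies each distance by its weight and then aggregates with AVERAGE, MIN or MAX. The helper name `pool_distances` is hypothetical, and treating the weights as simple multipliers is an assumption; the page does not say how the library applies them.

```python
def pool_distances(distances, weights, strategy="AVERAGE"):
    """Apply one weight per distance, then aggregate with the chosen strategy."""
    if len(distances) != len(weights):
        raise ValueError("distances and weights must have the same length")
    if not distances:
        raise ValueError("at least one distance is required")
    weighted = [d * w for d, w in zip(distances, weights)]
    if strategy == "AVERAGE":
        return sum(weighted) / len(weighted)
    if strategy == "MIN":
        return min(weighted)
    if strategy == "MAX":
        return max(weighted)
    raise ValueError(f"unknown pooling strategy: {strategy}")


# Hypothetical values for three enabled distances.
example_distances = [0.2, 0.4, 0.9]
example_weights = [1.0, 1.0, 0.5]
for name in ("AVERAGE", "MIN", "MAX"):
    print(name, pool_distances(example_distances, example_weights, name))
```

With these values the weighted distances are 0.2, 0.4 and 0.45, so AVERAGE gives about 0.35, MIN gives 0.2 and MAX gives 0.45.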

setThreshold(thres)

Sets threshold value for the last distance calculated.

Parameters:
thres : float

Threshold value for the last distance calculated.
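The threshold and alternatives parameters can be pictured as a filter followed by a cut-off. The sketch below assumes the threshold is an upper bound on the final distance and that results are sorted in ascending order of distance. The function name, the candidate labels and the distances are all made up for illustration.

```python
def select_candidates(scored, threshold, alternatives):
    """Keep candidates within the threshold, sort by distance, return the top few."""
    kept = [(label, dist) for label, dist in scored if dist <= threshold]
    kept.sort(key=lambda pair: pair[1])
    return kept[:alternatives]


# Hypothetical (label, final distance) pairs for one chunk.
scored = [("A", 0.12), ("B", 0.48), ("C", 0.91), ("D", 0.30)]
print(select_candidates(scored, threshold=0.5, alternatives=2))
# [('A', 0.12), ('D', 0.3)]
```

If no candidate falls within the threshold, the function returns an empty list, which corresponds to an unmatched chunk.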