com.johnsnowlabs.nlp.annotators.resolution
whether or not to return all distance values in the metadata. Default: False
number of results to return in the metadata after sorting by last distance calculated
Resolves the ResolverLabel for the given array of TOKEN and WORD_EMBEDDINGS annotations
an array of TOKEN and WORD_EMBEDDINGS Annotation objects coming from ChunkTokenizer and ChunkEmbeddings respectively
an array of Annotation objects, with the result of the entity resolution for each chunk and the following metadata
all_k_results -> Sorted ResolverLabels in the top alternatives that match the distance threshold
all_k_resolutions -> Respective ResolverNormalized strings
all_k_distances -> Respective distance values after aggregation
all_k_wmd_distances -> Respective WMD distance values
all_k_tfidf_distances -> Respective TFIDF Cosine distance values
all_k_jaccard_distances -> Respective Jaccard distance values
all_k_sorensen_distances -> Respective SorensenDice distance values
all_k_jaro_distances -> Respective JaroWinkler distance values
all_k_levenshtein_distances -> Respective Levenshtein distance values
all_k_confidences -> Respective normalized probabilities based on inverse distance values
target_text -> The actual searched string
resolved_text -> The top ResolverNormalized string
confidence -> Top probability
distance -> Top distance value
sentence -> Sentence index
chunk -> Chunk Index
token -> Token index
Optional column with one extra label per document. This extra label will be outputted later on in an additional column
validates the dataset before applying it further down the pipeline
what function to use to calculate confidence: INVERSE or SOFTMAX
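The difference between the two options can be illustrated with a short sketch. This is not the library's code: the function names and the exact formulas (normalized inverse distances for INVERSE, a softmax over negated distances for SOFTMAX) are assumptions chosen to match the option names.

```python
import math

def inverse_confidences(distances, eps=1e-9):
    """Assumed INVERSE behavior: normalize 1/distance so the values sum to 1."""
    inv = [1.0 / (d + eps) for d in distances]  # eps guards against division by zero
    total = sum(inv)
    return [v / total for v in inv]

def softmax_confidences(distances):
    """Assumed SOFTMAX behavior: softmax over negated distances."""
    smallest = min(distances)
    # Shift by the smallest distance for numerical stability.
    exps = [math.exp(-(d - smallest)) for d in distances]
    total = sum(exps)
    return [e / total for e in exps]
```

In both variants a smaller distance yields a higher confidence, and the confidences over all candidates sum to 1.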
creates WordEmbeddingsReader, based on the DB name and connection
Name of the desired database
Connection to the RocksDB
The instance of the class WordEmbeddingsReader
This cannot hold EMBEDDINGS, since otherwise the entity resolver (ER) will try to re-save and re-read the embeddings
what distance function to use for KNN: 'EUCLIDEAN' or 'COSINE'
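For reference, the two distance functions named here can be written as follows. This is a generic sketch on plain Python lists, not the annotator's implementation; returning 1.0 for a zero vector is an assumption made only to keep the example total.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """One minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat a zero vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)
```

Euclidean distance is sensitive to vector magnitude, while cosine distance depends only on direction, so the two can rank the same neighbours differently.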
distance weights to apply before pooling: [WMD, TFIDF, Jaccard, SorensenDice, JaroWinkler, Levenshtein]
whether or not to use Jaccard token distance. Default: True
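As a reminder of what a Jaccard token distance measures, here is a minimal sketch over token sets. The function name and the handling of two empty inputs are assumptions; the annotator's own tokenization and edge cases may differ.

```python
def jaccard_distance(tokens_a, tokens_b):
    """1 - |A intersect B| / |A union B| over the two token sets."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 0.0  # assumption: two empty token lists are identical
    return 1.0 - len(a & b) / len(a | b)
```

For example, the token lists `["heart", "attack"]` and `["heart", "failure"]` share one of three distinct tokens, giving a distance of 2/3.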
whether or not to use Jaro-Winkler character distance. Default: False
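The textbook Jaro-Winkler measure can be sketched as below. The prefix scale of 0.1 and the prefix cap of 4 characters are the conventional defaults and are assumptions here; nothing in this reference states which values the annotator uses.

```python
def jaro_similarity(s1, s2):
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i in range(len1):
        # Look for an unmatched equal character within the match window.
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s1[i] == s2[j]:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order, k = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3.0

def jaro_winkler_distance(s1, s2, prefix_scale=0.1):
    """1 - Jaro-Winkler similarity; rewards a shared prefix of up to 4 chars."""
    jaro = jaro_similarity(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:4], s2[:4]):
        if c1 != c2:
            break
        prefix += 1
    return 1.0 - (jaro + prefix * prefix_scale * (1.0 - jaro))
```

Because of the prefix bonus, strings that agree on their first characters end up closer than under plain Jaro similarity.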
whether or not to use Levenshtein character distance. Default: False
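Levenshtein distance counts the single-character insertions, deletions and substitutions needed to turn one string into another. A minimal dynamic-programming sketch follows; it returns the raw edit count, and whether the annotator normalizes this value is not stated here.

```python
def levenshtein_distance(a, b):
    """Edit distance computed row by row, keeping only the previous row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]
```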
whether or not to use Sorensen-Dice token distance. Default: False
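The Sorensen-Dice coefficient compares two token sets by doubling the overlap and dividing by the combined set sizes. The sketch below shows the corresponding distance; the function name and the empty-input convention are assumptions.

```python
def sorensen_dice_distance(tokens_a, tokens_b):
    """1 - 2|A intersect B| / (|A| + |B|) over the two token sets."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 0.0  # assumption: two empty token lists are identical
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))
```

Compared with Jaccard on the same inputs, this measure gives more credit to partial overlap: `["heart", "attack"]` against `["heart", "failure"]` yields 0.5 rather than 2/3.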
whether or not to use TFIDF token distance. Default: True
whether or not to use WMD token distance. Default: True
penalty for extra words in the knowledge base match during WMD calculation
Annotator reference id. Used to identify elements in metadata or to refer to this annotator type
whether or not to return an empty annotation on unmatched chunks
number of neighbours to consider in the KNN query to calculate WMD
pooling strategy to aggregate distances: AVERAGE or SUM
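One plausible reading of how the distance weights and the pooling strategy combine is sketched below. It is an illustration only: the function name is hypothetical, and the assumption that AVERAGE divides by the number of distances (rather than by the sum of the weights, or by the number of enabled distances) is not confirmed by this reference.

```python
def pool_distances(distances, weights, strategy="AVERAGE"):
    """Weight each distance, then aggregate with SUM or AVERAGE.

    Both lists follow the documented order:
    [WMD, TFIDF, Jaccard, SorensenDice, JaroWinkler, Levenshtein].
    """
    if len(distances) != len(weights):
        raise ValueError("distances and weights must have the same length")
    weighted = [d * w for d, w in zip(distances, weights)]
    if strategy == "SUM":
        return sum(weighted)
    if strategy == "AVERAGE":
        # Assumption: plain mean over all weighted values.
        return sum(weighted) / len(weighted)
    raise ValueError("strategy must be 'AVERAGE' or 'SUM'")
```

Under this reading, a weight of 0 removes a distance from the aggregate without changing the length of the list.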
Whether cosine distances should be calculated between a chunk and the k_candidates result embeddings
Search Tree. Under the hood encapsulates SerializableKDTree. Used to perform the search
Optional column with one extra label per document. This extra label will be outputted later on in an additional column
Inverse Document Frequency of the term. Used in the TF-IDF method
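To show where a per-term document frequency weight enters a TF-IDF cosine distance, here is a self-contained sketch. The smoothed formula `log((1 + N) / (1 + df)) + 1` is one common variant and is an assumption, as are all function names; the annotator may compute its weights differently.

```python
import math
from collections import Counter

def idf_table(documents):
    """Map each term to a smoothed inverse document frequency."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()}

def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms missing from the table get weight 0."""
    counts = Counter(tokens)
    return {term: count * idf.get(term, 0.0) for term, count in counts.items()}

def tfidf_cosine_distance(tokens_a, tokens_b, idf):
    """One minus the cosine similarity of the two TF-IDF vectors."""
    vec_a, vec_b = tfidf_vector(tokens_a, idf), tfidf_vector(tokens_b, idf)
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: no usable terms means maximally distant
    return 1.0 - dot / (norm_a * norm_b)
```

Rare terms receive a larger weight than frequent ones, so a match on an uncommon token reduces the distance more than a match on a common one.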
threshold value for the aggregated distance
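A threshold on the aggregated distance can be applied as in the following sketch. The function name, the inclusive comparison and the `(label, distance)` tuple layout are assumptions for illustration, and the labels in the usage note are placeholders rather than real codes.

```python
def filter_candidates(candidates, threshold, top_k):
    """Keep candidates within the threshold, closest first, at most top_k.

    candidates is a list of (label, aggregated_distance) tuples.
    """
    kept = [c for c in candidates if c[1] <= threshold]  # assumption: inclusive
    kept.sort(key=lambda c: c[1])
    return kept[:top_k]
```

For instance, `filter_candidates([("C1", 0.9), ("C2", 0.2), ("C3", 0.5)], 0.6, 5)` keeps `C2` and `C3` in that order, and a threshold below every distance leaves the list empty.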
Contains all the parameters to transform a dataset with two input annotations of types TOKEN and WORD_EMBEDDINGS, coming from the ChunkTokenizer and ChunkEmbeddings annotators, and return the Normalized Entity for a particular trained ontology or curated dataset.