com.johnsnowlabs.nlp.annotators.resolution
SentenceEntityResolverModel
Companion object SentenceEntityResolverModel
class SentenceEntityResolverModel extends AnnotatorModel[SentenceEntityResolverModel] with SentenceResolverParams with HasStorageModel with HasEmbeddingsProperties with HasCaseSensitiveProperties with HasSimpleAnnotate[SentenceEntityResolverModel] with HandleExceptionParams with HasSafeAnnotate[SentenceEntityResolverModel] with CheckLicense
This model takes a dataset whose input annotation type is SENTENCE_EMBEDDINGS (produced, for example, by BertSentenceEmbeddings) and returns the normalized entity for a particular trained ontology or curated dataset (e.g. ICD-10, RxNorm, SNOMED).
For available pretrained models, see the Models Hub.
Example
Resolving CPT
First define pipeline stages to extract entities
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDetector = SentenceDetectorDLModel.pretrained()
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols("sentence", "token")
  .setOutputCol("embeddings")

val clinical_ner = MedicalNerModel.pretrained("jsl_ner_wip_clinical", "en", "clinical/models")
  .setInputCols("sentence", "token", "embeddings")
  .setOutputCol("ner")

val ner_converter = new NerConverter()
  .setInputCols("sentence", "token", "ner")
  .setOutputCol("ner_chunk")
  .setWhiteList("Test", "Procedure")

val c2doc = new Chunk2Doc()
  .setInputCols("ner_chunk")
  .setOutputCol("ner_chunk_doc")

val sbert_embedder = BertSentenceEmbeddings
  .pretrained("sbiobert_base_cased_mli", "en", "clinical/models")
  .setInputCols("ner_chunk_doc")
  .setOutputCol("sbert_embeddings")
Then the resolver is defined on the extracted entities and sentence embeddings
val cpt_resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_cpt_procedures_augmented", "en", "clinical/models")
  .setInputCols("sbert_embeddings")
  .setOutputCol("cpt_code")
  .setDistanceFunction("EUCLIDEAN")

val sbert_pipeline_cpt = new Pipeline().setStages(Array(
  documentAssembler,
  sentenceDetector,
  tokenizer,
  word_embeddings,
  clinical_ner,
  ner_converter,
  c2doc,
  sbert_embedder,
  cpt_resolver))
Show results
// Assumption: `data` is a DataFrame with a "text" column; it is not defined in this example.
val sbert_outputs = sbert_pipeline_cpt.fit(data).transform(data)

sbert_outputs
  // selectExpr (not select) is needed because the argument is a SQL expression
  .selectExpr("explode(arrays_zip(ner_chunk.result, ner_chunk.metadata, cpt_code.result, cpt_code.metadata, ner_chunk.begin, ner_chunk.end)) as cpt_code")
  .selectExpr(
    "cpt_code['0'] as chunk",
    "cpt_code['1'].entity as entity",
    "cpt_code['2'] as code",
    "cpt_code['3'].confidence as confidence",
    "cpt_code['3'].all_k_resolutions as all_k_resolutions",
    "cpt_code['3'].all_k_results as all_k_codes"
  ).show(5)

+--------------------+---------+-----+----------+--------------------+--------------------+
|               chunk|   entity| code|confidence|   all_k_resolutions|         all_k_codes|
+--------------------+---------+-----+----------+--------------------+--------------------+
|          heart cath|Procedure|93566|    0.1180|CCA - Cardiac cat...|93566:::62319:::9...|
|selective coronar...|     Test|93460|    0.1000|Coronary angiogra...|93460:::93458:::9...|
|common femoral an...|     Test|35884|    0.1808|Femoral artery by...|35884:::35883:::3...|
|   StarClose closure|Procedure|33305|    0.1197|Heart closure:::H...|33305:::33300:::3...|
|         stress test|     Test|93351|    0.2795|Cardiovascular st...|93351:::94621:::9...|
+--------------------+---------+-----+----------+--------------------+--------------------+
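The all_k_resolutions and all_k_results metadata values in the output above pack every candidate into a single string separated by `:::`. The following is a minimal sketch, in plain Scala with no Spark dependency, of how such a string could be split back into individual candidates. The object and method names are hypothetical, and the resolution labels in the usage note are placeholders, not real model output.

```scala
object KCandidates {
  // Delimiter seen in the example output above.
  private val Sep = ":::"

  /** Split a delimited metadata string into its trimmed, non-empty parts. */
  def split(packed: String): Seq[String] =
    packed.split(Sep).map(_.trim).filter(_.nonEmpty).toSeq

  /** Pair each candidate code with its resolution text, position by position. */
  def pair(codes: String, resolutions: String): Seq[(String, String)] =
    split(codes).zip(split(resolutions))
}
```

For instance, `KCandidates.pair("93566:::62319", "first label:::second label")` yields one (code, label) tuple per candidate, in the order the resolver returned them.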
- See also
SentenceEntityResolverApproach for training a custom model
Instance Constructors
Type Members
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
$[T](param: Param[T]): T
- Attributes
- protected
- Definition Classes
- Params
-
def
$$[T](feature: StructFeature[T]): T
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
$$[K, V](feature: MapFeature[K, V]): Map[K, V]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
$$[T](feature: SetFeature[T]): Set[T]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
$$[T](feature: ArrayFeature[T]): Array[T]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
_transform(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): DataFrame
- Attributes
- protected
- Definition Classes
- AnnotatorModel
-
def
afterAnnotate(dataset: DataFrame): DataFrame
- Attributes
- protected
- Definition Classes
- AnnotatorModel
-
def
annotate(annotations: Seq[Annotation]): Seq[Annotation]
Resolves the ResolverLabel for the given array of TOKEN and WORD_EMBEDDINGS annotations
- annotations
an array of TOKEN and WORD_EMBEDDINGS Annotation objects coming from ChunkTokenizer and ChunkEmbeddings respectively
- returns
an array of Annotation objects, with the result of the entity resolution for each chunk and the following metadata:
all_k_results -> Sorted ResolverLabels in the top alternatives that match the distance threshold
all_k_resolutions -> Respective ResolverNormalized strings
all_k_distances -> Respective distance values after aggregation
all_k_wmd_distances -> Respective WMD distance values
all_k_tfidf_distances -> Respective TFIDF Cosine distance values
all_k_jaccard_distances -> Respective Jaccard distance values
all_k_sorensen_distances -> Respective SorensenDice distance values
all_k_jaro_distances -> Respective JaroWinkler distance values
all_k_levenshtein_distances -> Respective Levenshtein distance values
all_k_confidences -> Respective normalized probabilities based on inverse distance values
target_text -> The actual searched string
resolved_text -> The top ResolverNormalized string
confidence -> Top probability
distance -> Top distance value
sentence -> Sentence index
chunk -> Chunk index
token -> Token index
- Definition Classes
- SentenceEntityResolverModel → HasSimpleAnnotate
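The all_k_confidences metadata is described as normalized probabilities based on inverse distance values. One plausible reading of that description is sketched below in plain Scala. The exact formula used by the library is not documented on this page (and confidenceFunction suggests it is configurable), so the object name, the `eps` guard, and the normalization itself are assumptions made for illustration only.

```scala
object InverseDistanceConfidence {
  /** Normalize inverse distances so that the resulting weights sum to 1.
    * Assumption: this is an illustrative formula, not the library's own.
    */
  def confidences(distances: Seq[Double], eps: Double = 1e-9): Seq[Double] = {
    // eps guards against division by zero when a distance is exactly 0.
    val inverse = distances.map(d => 1.0 / (d + eps))
    val total = inverse.sum
    if (total == 0.0) Seq.empty else inverse.map(_ / total)
  }
}
```

Under this reading, a candidate at a smaller distance always receives a larger share of the probability mass than a more distant one.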
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
val
auxLabelMap: StructFeature[Map[String, String]]
Collection of Parameters, which are used by method annotate()
-
def
beforeAnnotate(dataset: Dataset[_]): Dataset[_]
validates the dataset before applying it further down the pipeline
- Attributes
- protected
- Definition Classes
- SentenceEntityResolverModel → AnnotatorModel
-
val
caseSensitive: BooleanParam
- Definition Classes
- HasCaseSensitiveProperties
-
final
def
checkSchema(schema: StructType, inputAnnotatorType: String): Boolean
- Attributes
- protected
- Definition Classes
- HasInputAnnotationCols
-
def
checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String]): Unit
- Definition Classes
- CheckLicense
-
def
checkValidScope(scope: String): Unit
- Definition Classes
- CheckLicense
-
def
checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean): Unit
- Definition Classes
- CheckLicense
-
def
checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean): Unit
- Definition Classes
- CheckLicense
-
final
def
clear(param: Param[_]): SentenceEntityResolverModel.this.type
- Definition Classes
- Params
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
val
confidenceFunction: Param[String]
- Definition Classes
- SentenceResolverParams
-
def
continueTraining(newKeys: Array[Array[Float]], newData: Array[TreeData], blackList: Set[String], overwrite: Boolean = true): SerializableKDTree[TreeData]
-
def
copy(extra: ParamMap): SentenceEntityResolverModel
- Definition Classes
- RawAnnotator → Model → Transformer → PipelineStage → Params
-
def
copyValues[T <: Params](to: T, extra: ParamMap): T
- Attributes
- protected
- Definition Classes
- Params
-
def
createDatabaseConnection(database: Name): RocksDBConnection
- Definition Classes
- HasStorageRef
-
def
createReader(database: Name, connection: RocksDBConnection): WordEmbeddingsReader
creates WordEmbeddingsReader, based on the DB name and connection
- database
Name of the desired database
- connection
Connection to the RocksDB
- returns
The instance of the class WordEmbeddingsReader
- Attributes
- protected
- Definition Classes
- SentenceEntityResolverModel → HasStorageReader
-
val
databases: Array[Name]
This cannot hold EMBEDDINGS since otherwise ER will try to re-save and read embeddings again
- Attributes
- protected
- Definition Classes
- SentenceEntityResolverModel → HasStorageModel
-
val
datasetInfo: Param[String]
Descriptive information about the dataset being used.
- Definition Classes
- SentenceResolverParams
-
final
def
defaultCopy[T <: Params](extra: ParamMap): T
- Attributes
- protected
- Definition Classes
- Params
-
def
deserializeStorage(path: String, spark: SparkSession): Unit
- Definition Classes
- HasStorageModel
-
def
dfAnnotate: UserDefinedFunction
- Definition Classes
- HasSimpleAnnotate
-
val
dimension: ProtectedParam[Int]
- Definition Classes
- HasEmbeddingsProperties
-
val
distanceFunction: Param[String]
what distance function to use for KNN: 'EUCLIDEAN' or 'COSINE'
- Definition Classes
- SentenceResolverParams
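To make the two distanceFunction options concrete, here is a minimal plain-Scala sketch of Euclidean and cosine distance between two embedding vectors. It is not the library's implementation: the object name is hypothetical, and cosine distance is assumed to mean 1 minus cosine similarity, which is the common convention but is not confirmed on this page.

```scala
object Distances {
  /** Euclidean (L2) distance between two vectors of equal length. */
  def euclidean(a: Array[Float], b: Array[Float]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    math.sqrt(a.zip(b).map { case (x, y) =>
      val d = x.toDouble - y.toDouble
      d * d
    }.sum)
  }

  /** Cosine distance, assumed here to be 1 - cosine similarity. */
  def cosine(a: Array[Float], b: Array[Float]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot   = a.zip(b).map { case (x, y) => x.toDouble * y.toDouble }.sum
    val normA = math.sqrt(a.map(x => x.toDouble * x.toDouble).sum)
    val normB = math.sqrt(b.map(y => y.toDouble * y.toDouble).sum)
    // A zero vector has no direction; treat it as orthogonal (distance 1.0).
    if (normA == 0.0 || normB == 0.0) 1.0 else 1.0 - dot / (normA * normB)
  }
}
```

Euclidean distance is sensitive to vector magnitude, while cosine distance depends only on direction, so the two can rank the same candidates differently.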
-
val
doExceptionHandling: BooleanParam
If true, exceptions are handled. If exception-causing data is passed to the model, an error annotation containing the exception message is emitted, and processing continues with the next item. This comes with a performance penalty.
- Definition Classes
- HandleExceptionParams
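The behaviour described for doExceptionHandling (emit an error annotation carrying the exception message, then continue with the next item) follows a common pattern that can be sketched in plain Scala. The `Note` case class and `safeMap` helper below are hypothetical stand-ins for illustration; they are not the library's Annotation type or its safeAnnotate implementation.

```scala
import scala.util.{Failure, Success, Try}

object SafeAnnotateSketch {
  // Hypothetical stand-in for an annotation: a type tag plus a result string.
  final case class Note(annotatorType: String, result: String)

  /** Apply `f` to each input. On failure, emit an "error" note holding the
    * exception message and carry on with the next input.
    */
  def safeMap(inputs: Seq[String])(f: String => String): Seq[Note] =
    inputs.map { in =>
      Try(f(in)) match {
        case Success(out) => Note("entity", out)
        case Failure(e)   => Note("error", String.valueOf(e.getMessage))
      }
    }
}
```

Wrapping every item in `Try` adds per-item overhead, which is consistent with the performance penalty mentioned above.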
-
val
enableInMemoryStorage: BooleanParam
- Definition Classes
- HasStorageOptions
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
explainParam(param: Param[_]): String
- Definition Classes
- Params
-
def
explainParams(): String
- Definition Classes
- Params
-
def
extraValidate(structType: StructType): Boolean
- Attributes
- protected
- Definition Classes
- RawAnnotator
-
def
extraValidateMsg: String
- Attributes
- protected
- Definition Classes
- RawAnnotator
-
final
def
extractParamMap(): ParamMap
- Definition Classes
- Params
-
final
def
extractParamMap(extra: ParamMap): ParamMap
- Definition Classes
- Params
-
val
features: ArrayBuffer[Feature[_, _, _]]
- Definition Classes
- HasFeatures
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
get[T](feature: StructFeature[T]): Option[T]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
get[T](feature: SetFeature[T]): Option[Set[T]]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
get[T](feature: ArrayFeature[T]): Option[Array[T]]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
final
def
get[T](param: Param[T]): Option[T]
- Definition Classes
- Params
-
def
getAuxLabelMap(): Map[String, String]
Map[String,String] where key=label and value=auxLabel from a dataset.
-
def
getCaseSensitive: Boolean
- Definition Classes
- HasCaseSensitiveProperties
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
getConfidenceFunction: String
- Definition Classes
- SentenceResolverParams
-
def
getDatasetInfo: String
get descriptive information about the dataset being used
- Definition Classes
- SentenceResolverParams
-
final
def
getDefault[T](param: Param[T]): Option[T]
- Definition Classes
- Params
-
def
getDimension: Int
- Definition Classes
- HasEmbeddingsProperties
-
def
getDistanceFunction: String
- Definition Classes
- SentenceResolverParams
-
def
getEnableInMemoryStorage: Boolean
- Definition Classes
- HasStorageOptions
-
def
getIncludeStorage: Boolean
- Definition Classes
- HasStorageOptions
-
def
getInputCols: Array[String]
- Definition Classes
- HasInputAnnotationCols
-
def
getLazyAnnotator: Boolean
- Definition Classes
- CanBeLazy
-
def
getMissAsEmpty: Boolean
- Definition Classes
- SentenceResolverParams
-
def
getNeighbours: Int
- Definition Classes
- SentenceResolverParams
-
final
def
getOrDefault[T](param: Param[T]): T
- Definition Classes
- Params
-
final
def
getOutputCol: String
- Definition Classes
- HasOutputAnnotationCol
-
def
getParam(paramName: String): Param[Any]
- Definition Classes
- Params
-
def
getReader[A](database: Name): StorageReader[A]
- Attributes
- protected
- Definition Classes
- HasStorageReader
-
def
getReturnAllKEmbeddings(): Boolean
Whether to return the embeddings of all K candidates of the resolution. The embeddings are placed in the metadata; expect increased RAM usage.
-
def
getReturnCosineDistances: Boolean
Whether to calculate and return cosine distances between a chunk/token and the k closest candidates. Can improve accuracy but increases computation.
-
def
getSearchTree: SerializableKDTree[TreeData]
-
def
getStorageRef: String
- Definition Classes
- HasStorageRef
-
def
getThreshold: Double
- Definition Classes
- SentenceResolverParams
-
def
getUseAuxLabel(): Boolean
Whether to use Aux Label or not
-
final
def
hasDefault[T](param: Param[T]): Boolean
- Definition Classes
- Params
-
def
hasParam(paramName: String): Boolean
- Definition Classes
- Params
-
def
hasParent: Boolean
- Definition Classes
- Model
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
val
inExceptionMode: Boolean
- Attributes
- protected
- Definition Classes
- HasSafeAnnotate
-
val
includeStorage: BooleanParam
- Definition Classes
- HasStorageOptions
-
def
initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
val
inputAnnotatorTypes: Array[String]
Input annotator types: SENTENCE_EMBEDDINGS
- Definition Classes
- SentenceEntityResolverModel → HasInputAnnotationCols
-
final
val
inputCols: StringArrayParam
- Attributes
- protected
- Definition Classes
- HasInputAnnotationCols
-
final
def
isDefined(param: Param[_]): Boolean
- Definition Classes
- Params
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
final
def
isSet(param: Param[_]): Boolean
- Definition Classes
- Params
-
def
isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
val
lazyAnnotator: BooleanParam
- Definition Classes
- CanBeLazy
-
def
log: Logger
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logName: String
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
val
missAsEmpty: BooleanParam
whether or not to return an empty annotation on unmatched chunks
- Definition Classes
- SentenceResolverParams
-
def
msgHelper(schema: StructType): String
- Attributes
- protected
- Definition Classes
- HasInputAnnotationCols
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
val
neighbours: IntParam
number of neighbours to consider in the KNN query to calculate WMD
- Definition Classes
- SentenceResolverParams
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
onWrite(path: String, spark: SparkSession): Unit
- Attributes
- protected
- Definition Classes
- HasStorageModel → ParamsAndFeaturesWritable
-
val
optionalInputAnnotatorTypes: Array[String]
- Definition Classes
- HasInputAnnotationCols
-
val
outputAnnotatorType: AnnotatorType
Output annotator types: ENTITY
- Definition Classes
- SentenceEntityResolverModel → HasOutputAnnotatorType
-
final
val
outputCol: Param[String]
- Attributes
- protected
- Definition Classes
- HasOutputAnnotationCol
-
lazy val
params: Array[Param[_]]
- Definition Classes
- Params
-
var
parent: Estimator[SentenceEntityResolverModel]
- Definition Classes
- Model
-
val
readers: Map[Name, StorageReader[_]]
- Attributes
- protected
- Definition Classes
- HasStorageReader
- Annotations
- @transient()
-
val
returnResolvedTextEmbeddings: BooleanParam
Whether to include embeddings for the resolved text (Default: false)
-
def
safeAnnotate(annotations: Seq[Annotation]): Seq[Annotation]
A protected method designed to safely annotate a sequence of Annotation objects by handling exceptions.
- annotations
A sequence of Annotation.
- returns
A sequence of Annotation objects after processing, potentially containing error annotations.
- Attributes
- protected
- Definition Classes
- HasSafeAnnotate
-
def
save(path: String): Unit
- Definition Classes
- MLWritable
- Annotations
- @Since( "1.6.0" ) @throws( ... )
-
def
saveStorage(path: String, spark: SparkSession, withinStorage: Boolean): Unit
- Definition Classes
- HasStorageModel
-
val
searchTree: StructFeature_HadoopFix[SerializableKDTree[TreeData]]
Search tree. Encapsulates a SerializableKDTree under the hood and is used to perform the search.
-
def
serializeStorage(path: String, spark: SparkSession): Unit
- Definition Classes
- HasStorageModel
-
def
set[T](param: ProtectedParam[T], value: T): SentenceEntityResolverModel.this.type
- Definition Classes
- HasProtectedParams
-
def
set[T](feature: StructFeature[T], value: T): SentenceEntityResolverModel.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
set[K, V](feature: MapFeature[K, V], value: Map[K, V]): SentenceEntityResolverModel.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
set[T](feature: SetFeature[T], value: Set[T]): SentenceEntityResolverModel.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
set[T](feature: ArrayFeature[T], value: Array[T]): SentenceEntityResolverModel.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
final
def
set(paramPair: ParamPair[_]): SentenceEntityResolverModel.this.type
- Attributes
- protected
- Definition Classes
- Params
-
final
def
set(param: String, value: Any): SentenceEntityResolverModel.this.type
- Attributes
- protected
- Definition Classes
- Params
-
final
def
set[T](param: Param[T], value: T): SentenceEntityResolverModel.this.type
- Definition Classes
- Params
-
def
setAuxLabelMap(m: Map[String, String]): SentenceEntityResolverModel.this.type
Map[String,String] where key=label and value=auxLabel from a dataset.
-
def
setCaseSensitive(value: Boolean): SentenceEntityResolverModel.this.type
- Definition Classes
- HasCaseSensitiveProperties
-
def
setConfidenceFunction(v: String): SentenceEntityResolverModel.this.type
- Definition Classes
- SentenceResolverParams
-
def
setDatasetInfo(value: String): SentenceEntityResolverModel.this.type
set descriptive information about the dataset being used
- Definition Classes
- SentenceResolverParams
-
def
setDefault[T](feature: StructFeature[T], value: () ⇒ T): SentenceEntityResolverModel.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): SentenceEntityResolverModel.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): SentenceEntityResolverModel.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): SentenceEntityResolverModel.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
final
def
setDefault(paramPairs: ParamPair[_]*): SentenceEntityResolverModel.this.type
- Attributes
- protected
- Definition Classes
- Params
-
final
def
setDefault[T](param: Param[T], value: T): SentenceEntityResolverModel.this.type
- Attributes
- protected[org.apache.spark.ml]
- Definition Classes
- Params
-
def
setDimension(value: Int): SentenceEntityResolverModel.this.type
- Definition Classes
- HasEmbeddingsProperties
-
def
setDistanceFunction(value: String): SentenceEntityResolverModel.this.type
- Definition Classes
- SentenceResolverParams
-
def
setDoExceptionHandling(value: Boolean): SentenceEntityResolverModel.this.type
If true, exceptions are handled. If exception-causing data is passed to the model, an error annotation containing the exception message is emitted, and processing continues with the next item. This comes with a performance penalty.
- Definition Classes
- HandleExceptionParams
-
def
setEnableInMemoryStorage(value: Boolean): SentenceEntityResolverModel.this.type
- Definition Classes
- HasStorageOptions
-
def
setIncludeStorage(value: Boolean): SentenceEntityResolverModel.this.type
- Definition Classes
- HasStorageOptions
-
final
def
setInputCols(value: String*): SentenceEntityResolverModel.this.type
- Definition Classes
- HasInputAnnotationCols
-
def
setInputCols(value: Array[String]): SentenceEntityResolverModel.this.type
- Definition Classes
- HasInputAnnotationCols
-
def
setLazyAnnotator(value: Boolean): SentenceEntityResolverModel.this.type
- Definition Classes
- CanBeLazy
-
def
setMissAsEmpty(v: Boolean): SentenceEntityResolverModel.this.type
- Definition Classes
- SentenceResolverParams
-
def
setNeighbours(k: Int): SentenceEntityResolverModel.this.type
- Definition Classes
- SentenceResolverParams
-
final
def
setOutputCol(value: String): SentenceEntityResolverModel.this.type
- Definition Classes
- HasOutputAnnotationCol
-
def
setParent(parent: Estimator[SentenceEntityResolverModel]): SentenceEntityResolverModel
- Definition Classes
- Model
-
def
setReturnResolvedTextEmbeddings(value: Boolean): SentenceEntityResolverModel.this.type
Whether to include embeddings for the resolved text (Default: false)
-
def
setSearchTree(tree: SerializableKDTree[TreeData]): SentenceEntityResolverModel.this.type
-
def
setStorageRef(value: String): SentenceEntityResolverModel.this.type
- Definition Classes
- HasStorageRef
-
def
setThreshold(dist: Double): SentenceEntityResolverModel.this.type
- Definition Classes
- SentenceResolverParams
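The setters in this listing return the model itself, so they can be chained on a pretrained instance. The following is a minimal configuration sketch, assuming a licensed Spark NLP for Healthcare session is already running. The model and column names are taken from the example at the top of this page, while the numeric values are illustrative placeholders, not recommended settings.

```scala
import com.johnsnowlabs.nlp.annotators.resolution.SentenceEntityResolverModel

// Illustrative values only; tune them for your own data.
val tunedResolver = SentenceEntityResolverModel
  .pretrained("sbiobertresolve_cpt_procedures_augmented", "en", "clinical/models")
  .setInputCols("sbert_embeddings")
  .setOutputCol("cpt_code")
  .setDistanceFunction("COSINE") // 'EUCLIDEAN' or 'COSINE'
  .setNeighbours(10)             // neighbours considered in the KNN query
  .setThreshold(5.0)             // threshold on the aggregated distance
  .setMissAsEmpty(true)          // return an empty annotation on unmatched chunks
```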
-
def
setUseAuxLabel(b: Boolean): SentenceEntityResolverModel.this.type
Whether to use Aux Label or not
-
val
storageRef: Param[String]
- Definition Classes
- HasStorageRef
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
val
threshold: DoubleParam
threshold value for the aggregated distance
- Definition Classes
- SentenceResolverParams
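The interplay between neighbours (how many candidates to consider) and threshold (the maximum acceptable distance) can be illustrated with a brute-force nearest-neighbour search in plain Scala. This sketch deliberately swaps the library's KD-tree (see searchTree) for a linear scan with Euclidean distance, and the object and method names are hypothetical.

```scala
object ThresholdedKnn {
  /** Return up to `k` (index, distance) pairs whose distance to `query`
    * is at most `threshold`, nearest first. Brute-force linear scan.
    */
  def nearest(
      query: Array[Double],
      candidates: Seq[Array[Double]],
      k: Int,
      threshold: Double): Seq[(Int, Double)] = {
    def dist(a: Array[Double], b: Array[Double]): Double =
      math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

    candidates.zipWithIndex
      .map { case (c, i) => (i, dist(query, c)) }
      .filter(_._2 <= threshold) // drop candidates beyond the threshold
      .sortBy(_._2)              // nearest first
      .take(k)                   // keep at most k neighbours
  }
}
```

If no candidate falls within the threshold, the result is empty, which is the situation that missAsEmpty addresses for unmatched chunks.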
-
def
toString(): String
- Definition Classes
- Identifiable → AnyRef → Any
-
final
def
transform(dataset: Dataset[_]): DataFrame
- Definition Classes
- AnnotatorModel → Transformer
-
def
transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame
- Definition Classes
- Transformer
- Annotations
- @Since( "2.0.0" )
-
def
transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame
- Definition Classes
- Transformer
- Annotations
- @Since( "2.0.0" ) @varargs()
-
final
def
transformSchema(schema: StructType): StructType
- Definition Classes
- RawAnnotator → PipelineStage
-
def
transformSchema(schema: StructType, logging: Boolean): StructType
- Attributes
- protected
- Definition Classes
- PipelineStage
- Annotations
- @DeveloperApi()
-
val
uid: String
- Definition Classes
- SentenceEntityResolverModel → Identifiable
-
val
useAuxLabel: BooleanParam
Whether to use Aux Label or not (Default: false)
-
def
validate(schema: StructType): Boolean
- Attributes
- protected
- Definition Classes
- RawAnnotator
-
def
validateStorageRef(dataset: Dataset[_], inputCols: Array[String], annotatorType: String): Unit
- Definition Classes
- HasStorageRef
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
wrapColumnMetadata(col: Column): Column
- Attributes
- protected
- Definition Classes
- RawAnnotator
-
def
wrapEmbeddingsMetadata(col: Column, embeddingsDim: Int, embeddingsRef: Option[String]): Column
- Attributes
- protected
- Definition Classes
- HasEmbeddingsProperties
-
def
wrapSentenceEmbeddingsMetadata(col: Column, embeddingsDim: Int, embeddingsRef: Option[String]): Column
- Attributes
- protected
- Definition Classes
- HasEmbeddingsProperties
-
def
write: MLWriter
- Definition Classes
- ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable
Deprecated Value Members
-
val
auxLabelCol: Param[String]
Optional column with one extra label per document. This extra label is later output in an additional column (Default: aux_label)
- Annotations
- @deprecated
- Deprecated
-
def
getAuxLabelCol(): String
Optional column with one extra label per document. This extra label is later output in an additional column.
- Annotations
- @deprecated
- Deprecated
-
def
getReturnEuclideanDistances: Boolean
Whether to return Euclidean distances of the k closest candidates for a chunk/token.
- Annotations
- @deprecated
- Deprecated
-
val
returnAllKEmbeddings: BooleanParam
Whether to return the embeddings of all K candidates of the resolution. The embeddings are placed in the metadata; expect increased RAM usage (Default: false)
- Annotations
- @deprecated
- Deprecated
-
val
returnCosineDistances: BooleanParam
Whether to calculate and return cosine distances between a chunk/token and the k closest candidates. Can improve accuracy but increases computation (Default: true)
- Annotations
- @deprecated
- Deprecated
-
val
returnEuclideanDistances: BooleanParam
Whether to return Euclidean distances of the k closest candidates for a chunk/token (Default: true)
- Annotations
- @deprecated
- Deprecated
-
def
setAuxLabelCol(c: String): SentenceEntityResolverModel.this.type
Optional column with one extra label per document. This extra label is later output in an additional column.
- Annotations
- @deprecated
- Deprecated
-
def
setReturnAllKEmbeddings(b: Boolean): SentenceEntityResolverModel.this.type
Whether to return the embeddings of all K candidates of the resolution. The embeddings are placed in the metadata; expect increased RAM usage.
- Annotations
- @deprecated
- Deprecated
-
def
setReturnCosineDistances(value: Boolean): SentenceEntityResolverModel.this.type
Whether to calculate and return cosine distances between a chunk/token and the k closest candidates. Can improve accuracy but increases computation.
- Annotations
- @deprecated
- Deprecated
-
def
setReturnEuclideanDistances(value: Boolean): SentenceEntityResolverModel.this.type
Whether to return Euclidean distances of the k closest candidates for a chunk/token.
- Annotations
- @deprecated
- Deprecated