com.johnsnowlabs.nlp.annotators.disambiguation

NerDisambiguator

Companion object NerDisambiguator

class NerDisambiguator extends AnnotatorApproach[NerDisambiguatorModel] with DisambiguatorModelParams

Links words of interest, such as names of persons, locations and companies, from an input text document to a corresponding unique entity in a target Knowledge Base (KB). Words of interest are called Named Entities (NEs), mentions, or surface forms. The annotator requires CHUNK and SENTENCE_EMBEDDINGS type input, e.g. from NerConverter and SentenceEmbeddings.

Example

Extracting Person identities

First define the pipeline stages that extract entities and embeddings. The NerConverter whitelist keeps only PER type entities.

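// Assumes a SparkSession named `spark` is in scope (as in spark-shell); toDF needs its implicits.
import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector
import com.johnsnowlabs.nlp.embeddings.{SentenceEmbeddings, WordEmbeddingsModel}
import com.johnsnowlabs.nlp.annotators.ner.NerConverter
import com.johnsnowlabs.nlp.annotators.ner.dl.NerDLModel
import com.johnsnowlabs.nlp.annotators.disambiguation.NerDisambiguator
import org.apache.spark.ml.Pipeline

val data = Seq("The show also had a contestant named Donald Trump who later defeated Christina Aguilera ...")
  .toDF("text")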
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
val sentenceDetector = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")
val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")
val word_embeddings = WordEmbeddingsModel.pretrained()
  .setInputCols("sentence", "token")
  .setOutputCol("embeddings")
val sentence_embeddings = new SentenceEmbeddings()
  .setInputCols("sentence","embeddings")
  .setOutputCol("sentence_embeddings")
val ner_model = NerDLModel.pretrained()
  .setInputCols("sentence", "token", "embeddings")
  .setOutputCol("ner")
val ner_converter = new NerConverter()
  .setInputCols("sentence", "token", "ner")
  .setOutputCol("ner_chunk")
  .setWhiteList("PER")

Then the extracted entities can be disambiguated.

val disambiguator = new NerDisambiguator()
  .setS3KnowledgeBaseName("i-per")
  .setInputCols("ner_chunk", "sentence_embeddings")
  .setOutputCol("disambiguation")
  .setNumFirstChars(5)

val nlpPipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentenceDetector,
  tokenizer,
  word_embeddings,
  sentence_embeddings,
  ner_model,
  ner_converter,
  disambiguator))

val model = nlpPipeline.fit(data)
val result = model.transform(data)

Show results

result.selectExpr("explode(disambiguation)")
  .selectExpr("col.metadata.chunk as chunk", "col.result as result").show(5, false)
+------------------+------------------------------------------------------------------------------------------------------------------------+
|chunk             |result                                                                                                                  |
+------------------+------------------------------------------------------------------------------------------------------------------------+
|Donald Trump      |http://en.wikipedia.org/?curid=4848272, http://en.wikipedia.org/?curid=31698421, http://en.wikipedia.org/?curid=55907961|
|Christina Aguilera|http://en.wikipedia.org/?curid=144171, http://en.wikipedia.org/?curid=6636454                                           |
+------------------+------------------------------------------------------------------------------------------------------------------------+
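
Tuning the candidate search

The parameter setters listed under Value Members can be chained on the same instance. The fragment below is a minimal sketch that spells out the documented defaults explicitly. The value name tunedDisambiguator is illustrative, and the knowledge base name "i-per" and numFirstChars value of 5 are simply reused from the example above, since no default for numFirstChars is stated on this page.

```scala
import com.johnsnowlabs.nlp.annotators.disambiguation.NerDisambiguator

// Illustrative configuration: every value except numFirstChars repeats a default documented on this page.
val tunedDisambiguator = new NerDisambiguator()
  .setS3KnowledgeBaseName("i-per")              // knowledge base name in S3, as in the example above
  .setInputCols("ner_chunk", "sentence_embeddings")
  .setOutputCol("disambiguation")
  .setNumFirstChars(5)                          // characters used for the initial prefix search
  .setEmbeddingType("sentence")                 // 'bow' or 'sentence'
  .setTokenSearch(true)                         // search by token rather than by chunk
  .setNarrowWithApproximateMatching(true)       // narrow prefix hits by Levenshtein distance
  .setLevenshteinDistanceThresholdParam(0.1)    // threshold for that narrowing
  .setNearMatchingGapParam(4)                   // length gap beyond which candidates are trimmed
  .setPredictionLimit(100)                      // N for top-N predictions
```

This instance can take the place of disambiguator in the pipeline stages above.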

Linear Supertypes

DisambiguatorModelParams, HasFeatures, AnnotatorApproach[NerDisambiguatorModel], CanBeLazy, DefaultParamsWritable, MLWritable, HasOutputAnnotatorType, HasOutputAnnotationCol, HasInputAnnotationCols, Estimator[NerDisambiguatorModel], PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

new NerDisambiguator()
new NerDisambiguator(uid: String)
uid: a unique identifier for this annotator instance

Type Members

type AnnotatorType = String

Definition Classes
HasOutputAnnotatorType

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def $[T](param: Param[T]): T

Attributes
protected
Definition Classes
Params
def $$[T](feature: StructFeature[T]): T

Attributes
protected
Definition Classes
HasFeatures
def $$[K, V](feature: MapFeature[K, V]): Map[K, V]

Attributes
protected
Definition Classes
HasFeatures
def $$[T](feature: SetFeature[T]): Set[T]

Attributes
protected
Definition Classes
HasFeatures
def $$[T](feature: ArrayFeature[T]): Array[T]

Attributes
protected
Definition Classes
HasFeatures
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def _fit(dataset: Dataset[_], recursiveStages: Option[PipelineModel]): NerDisambiguatorModel

Attributes
protected
Definition Classes
AnnotatorApproach
final def asInstanceOf[T0]: T0

Definition Classes
Any
def beforeTraining(spark: SparkSession): Unit

Definition Classes
NerDisambiguator → AnnotatorApproach
final def checkSchema(schema: StructType, inputAnnotatorType: String): Boolean

Attributes
protected
Definition Classes
HasInputAnnotationCols
final def clear(param: Param[_]): NerDisambiguator.this.type

Definition Classes
Params
def clone(): AnyRef

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
final def copy(extra: ParamMap): Estimator[NerDisambiguatorModel]

Definition Classes
AnnotatorApproach → Estimator → PipelineStage → Params
def copyValues[T <: Params](to: T, extra: ParamMap): T

Attributes
protected
Definition Classes
Params
final def defaultCopy[T <: Params](extra: ParamMap): T

Attributes
protected
Definition Classes
Params
val description: String

Definition Classes
NerDisambiguator → AnnotatorApproach
val embeddingTypeParam: Param[String]
Can be 'bow' for word embeddings or 'sentence' for sentences (Default: sentence)

Definition Classes
DisambiguatorModelParams
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def explainParam(param: Param[_]): String

Definition Classes
Params
def explainParams(): String

Definition Classes
Params
final def extractParamMap(): ParamMap

Definition Classes
Params
final def extractParamMap(extra: ParamMap): ParamMap

Definition Classes
Params
val features: ArrayBuffer[Feature[_, _, _]]

Definition Classes
HasFeatures
def finalize(): Unit

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
final def fit(dataset: Dataset[_]): NerDisambiguatorModel

Definition Classes
AnnotatorApproach → Estimator
def fit(dataset: Dataset[_], paramMaps: Seq[ParamMap]): Seq[NerDisambiguatorModel]

Definition Classes
Estimator
Annotations
@Since( "2.0.0" )
def fit(dataset: Dataset[_], paramMap: ParamMap): NerDisambiguatorModel

Definition Classes
Estimator
Annotations
@Since( "2.0.0" )
def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): NerDisambiguatorModel

Definition Classes
Estimator
Annotations
@Since( "2.0.0" ) @varargs()
def get[T](feature: StructFeature[T]): Option[T]

Attributes
protected
Definition Classes
HasFeatures
def get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]

Attributes
protected
Definition Classes
HasFeatures
def get[T](feature: SetFeature[T]): Option[Set[T]]

Attributes
protected
Definition Classes
HasFeatures
def get[T](feature: ArrayFeature[T]): Option[Array[T]]

Attributes
protected
Definition Classes
HasFeatures
final def get[T](param: Param[T]): Option[T]

Definition Classes
Params
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
Annotations
@native()
final def getDefault[T](param: Param[T]): Option[T]

Definition Classes
Params
def getEmbeddingType: String
Can be 'bow' for word embeddings or 'sentence' for sentences (Default: sentence)

Definition Classes
DisambiguatorModelParams
def getInputCols: Array[String]

Definition Classes
HasInputAnnotationCols
def getLazyAnnotator: Boolean

Definition Classes
CanBeLazy
def getLevenshteinDistanceThresholdParam: Double
Levenshtein distance threshold to narrow results from prefix search (Default: 0.1)

Definition Classes
DisambiguatorModelParams
def getNarrowWithApproximateMatching: Boolean
Whether to narrow prefix search results with Levenshtein distance based matching (Default: true)

Definition Classes
DisambiguatorModelParams
def getNearMatchingGapParam: Int
Limits candidate string length during Levenshtein-distance based narrowing: candidate chunks are trimmed when len(candidate) - len(entity chunk) > nearMatchingGap (Default: 4).

Definition Classes
DisambiguatorModelParams
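
The interplay of numFirstChars, levenshteinDistanceThresholdParam and nearMatchingGapParam can be pictured with a small, dependency-free sketch. This is not the library's implementation: the object NarrowingSketch, the helper narrow, the case-insensitive comparison, the trimming to len(entity chunk) + nearMatchingGap characters, and the normalization of the distance by the longer string length are all assumptions made for illustration.

```scala
object NarrowingSketch {
  // Classic dynamic-programming Levenshtein distance (insert, delete, substitute each cost 1).
  def levenshtein(a: String, b: String): Int = {
    var prev: Array[Int] = Array.range(0, b.length + 1)
    for (i <- 1 to a.length) {
      val curr = new Array[Int](b.length + 1)
      curr(0) = i
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        curr(j) = math.min(math.min(curr(j - 1) + 1, prev(j) + 1), prev(j - 1) + cost)
      }
      prev = curr
    }
    prev(b.length)
  }

  // Hypothetical helper: prefix search followed by approximate-match narrowing.
  def narrow(chunk: String,
             knowledgeBase: Seq[String],
             numFirstChars: Int = 5,
             threshold: Double = 0.1,
             nearMatchingGap: Int = 4): Seq[String] = {
    val prefix = chunk.take(numFirstChars).toLowerCase
    knowledgeBase
      .filter(_.toLowerCase.startsWith(prefix)) // initial prefix search
      .filter { candidate =>
        // Assumed trimming rule: cut candidates that exceed the chunk by more than the gap.
        val trimmed =
          if (candidate.length - chunk.length > nearMatchingGap)
            candidate.take(chunk.length + nearMatchingGap)
          else candidate
        val distance = levenshtein(chunk.toLowerCase, trimmed.toLowerCase)
        // Assumed normalization: distance relative to the longer of the two strings.
        distance.toDouble / math.max(chunk.length, trimmed.length) <= threshold
      }
  }
}
```

Under these assumptions, a smaller nearMatchingGap lets long knowledge base entries that merely start with the chunk survive narrowing, while a larger one compares more of the candidate and filters them out.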
def getNumFirstChars: Int
How many characters should be considered for initial prefix search in knowledge base

Definition Classes
DisambiguatorModelParams
final def getOrDefault[T](param: Param[T]): T

Definition Classes
Params
final def getOutputCol: String

Definition Classes
HasOutputAnnotationCol
def getParam(paramName: String): Param[Any]

Definition Classes
Params
def getPredictionLimit: Int
Limit on amount of predictions N for topN predictions (Default: 100)

Definition Classes
DisambiguatorModelParams
def getTokenSearch: Boolean
Whether to search by token or by chunk in knowledge base (Default: true)

Definition Classes
DisambiguatorModelParams
final def hasDefault[T](param: Param[T]): Boolean

Definition Classes
Params
def hasParam(paramName: String): Boolean

Definition Classes
Params
def hashCode(): Int

Definition Classes
AnyRef → Any
Annotations
@native()
def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean

Attributes
protected
Definition Classes
Logging
def initializeLogIfNecessary(isInterpreter: Boolean): Unit

Attributes
protected
Definition Classes
Logging
val inputAnnotatorTypes: Array[String]
Input annotator types: CHUNK, SENTENCE_EMBEDDINGS

Definition Classes
NerDisambiguator → HasInputAnnotationCols
final val inputCols: StringArrayParam

Attributes
protected
Definition Classes
HasInputAnnotationCols
final def isDefined(param: Param[_]): Boolean

Definition Classes
Params
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
final def isSet(param: Param[_]): Boolean

Definition Classes
Params
def isTraceEnabled(): Boolean

Attributes
protected
Definition Classes
Logging
val knowledgeBase: Param[String]
Knowledge base path
val lazyAnnotator: BooleanParam

Definition Classes
CanBeLazy
val levenshteinDistanceThresholdParam: DoubleParam
Levenshtein distance threshold to narrow results from prefix search (Default: 0.1)

Definition Classes
DisambiguatorModelParams
def log: Logger

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logName: String

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def msgHelper(schema: StructType): String

Attributes
protected
Definition Classes
HasInputAnnotationCols
val narrowWithApproximateMatching: BooleanParam
Whether to narrow prefix search results with Levenshtein distance based matching (Default: true)

Definition Classes
DisambiguatorModelParams
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
val nearMatchingGapParam: IntParam
Limits candidate string length during Levenshtein-distance based narrowing: candidate chunks are trimmed when len(candidate) - len(entity chunk) > nearMatchingGap (Default: 4).

Definition Classes
DisambiguatorModelParams
final def notify(): Unit

Definition Classes
AnyRef
Annotations
@native()
final def notifyAll(): Unit

Definition Classes
AnyRef
Annotations
@native()
val numFirstChars: IntParam
How many characters should be considered for initial prefix search in knowledge base

Definition Classes
DisambiguatorModelParams
def onTrained(model: NerDisambiguatorModel, spark: SparkSession): Unit

Definition Classes
AnnotatorApproach
val optionalInputAnnotatorTypes: Array[String]

Definition Classes
HasInputAnnotationCols
val outputAnnotatorType: AnnotatorType
Output annotator types: DISAMBIGUATION

Definition Classes
NerDisambiguator → HasOutputAnnotatorType
final val outputCol: Param[String]

Attributes
protected
Definition Classes
HasOutputAnnotationCol
lazy val params: Array[Param[_]]

Definition Classes
Params
val predictionsLimit: IntParam
Limit on amount of predictions N for topN predictions (Default: 100)

Definition Classes
DisambiguatorModelParams
def resolveStorageName(database: String): String
val s3KnowledgeBaseName: Param[String]
Knowledge base name in S3
def save(path: String): Unit

Definition Classes
MLWritable
Annotations
@Since( "1.6.0" ) @throws( ... )
def set[T](feature: StructFeature[T], value: T): NerDisambiguator.this.type

Attributes
protected
Definition Classes
HasFeatures
def set[K, V](feature: MapFeature[K, V], value: Map[K, V]): NerDisambiguator.this.type

Attributes
protected
Definition Classes
HasFeatures
def set[T](feature: SetFeature[T], value: Set[T]): NerDisambiguator.this.type

Attributes
protected
Definition Classes
HasFeatures
def set[T](feature: ArrayFeature[T], value: Array[T]): NerDisambiguator.this.type

Attributes
protected
Definition Classes
HasFeatures
final def set(paramPair: ParamPair[_]): NerDisambiguator.this.type

Attributes
protected
Definition Classes
Params
final def set(param: String, value: Any): NerDisambiguator.this.type

Attributes
protected
Definition Classes
Params
final def set[T](param: Param[T], value: T): NerDisambiguator.this.type

Definition Classes
Params
def setDefault[T](feature: StructFeature[T], value: () ⇒ T): NerDisambiguator.this.type

Attributes
protected
Definition Classes
HasFeatures
def setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): NerDisambiguator.this.type

Attributes
protected
Definition Classes
HasFeatures
def setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): NerDisambiguator.this.type

Attributes
protected
Definition Classes
HasFeatures
def setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): NerDisambiguator.this.type

Attributes
protected
Definition Classes
HasFeatures
final def setDefault(paramPairs: ParamPair[_]*): NerDisambiguator.this.type

Attributes
protected
Definition Classes
Params
final def setDefault[T](param: Param[T], value: T): NerDisambiguator.this.type

Attributes
protected[org.apache.spark.ml]
Definition Classes
Params
def setEmbeddingType(v: String): NerDisambiguator.this.type
Can be 'bow' for word embeddings or 'sentence' for sentences (Default: sentence)

Definition Classes
DisambiguatorModelParams
final def setInputCols(value: String*): NerDisambiguator.this.type

Definition Classes
HasInputAnnotationCols
def setInputCols(value: Array[String]): NerDisambiguator.this.type

Definition Classes
HasInputAnnotationCols
def setKnowledgeBase(path: String): NerDisambiguator.this.type
Knowledge base path
def setLazyAnnotator(value: Boolean): NerDisambiguator.this.type

Definition Classes
CanBeLazy
def setLevenshteinDistanceThresholdParam(v: Double): NerDisambiguator.this.type
Levenshtein distance threshold to narrow results from prefix search (Default: 0.1)

Definition Classes
DisambiguatorModelParams
def setNarrowWithApproximateMatching(v: Boolean): NerDisambiguator.this.type
Whether to narrow prefix search results with Levenshtein distance based matching (Default: true)

Definition Classes
DisambiguatorModelParams
def setNearMatchingGapParam(v: Int): NerDisambiguator.this.type
Limits candidate string length during Levenshtein-distance based narrowing: candidate chunks are trimmed when len(candidate) - len(entity chunk) > nearMatchingGap (Default: 4).

Definition Classes
DisambiguatorModelParams
def setNumFirstChars(v: Int): NerDisambiguator.this.type
How many characters should be considered for initial prefix search in knowledge base

Definition Classes
DisambiguatorModelParams
final def setOutputCol(value: String): NerDisambiguator.this.type

Definition Classes
HasOutputAnnotationCol
def setPredictionLimit(v: Int): NerDisambiguator.this.type
Limit on amount of predictions N for topN predictions (Default: 100)

Definition Classes
DisambiguatorModelParams
def setS3KnowledgeBaseName(path: String): NerDisambiguator.this.type
Knowledge base name in S3
def setTokenSearch(v: Boolean): NerDisambiguator.this.type
Whether to search by token or by chunk in knowledge base (Default: true)

Definition Classes
DisambiguatorModelParams
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
Identifiable → AnyRef → Any
val tokenSearch: BooleanParam
Whether to search by token or by chunk in knowledge base (Default: true)

Definition Classes
DisambiguatorModelParams
def train(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): NerDisambiguatorModel

Definition Classes
NerDisambiguator → AnnotatorApproach
final def transformSchema(schema: StructType): StructType

Definition Classes
AnnotatorApproach → PipelineStage
def transformSchema(schema: StructType, logging: Boolean): StructType

Attributes
protected
Definition Classes
PipelineStage
Annotations
@DeveloperApi()
val uid: String

Definition Classes
NerDisambiguator → Identifiable
def validate(schema: StructType): Boolean

Attributes
protected
Definition Classes
AnnotatorApproach
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
def write: MLWriter

Definition Classes
DefaultParamsWritable → MLWritable
