ChunkMergeApproach

class ChunkMergeApproach extends AnnotatorApproach[ChunkMergeModel] with CheckLicense with HasMultipleInputAnnotationCols with MergeResourceParams with MergeCommonParams with MergePrioritizationParams with HasFeatures with FilteringParams with HandleExceptionParams with ResetSentenceIndicesParam

Merges two chunk columns produced by two annotators (NER, ContextualParser, or any other annotator that produces chunks). The merge selects one chunk from one of the columns according to certain criteria. The decision on which chunk to select is made according to the chunk indices in the source document. Chunks with longer lengths and the highest information are kept from each source. Labels can be changed with setReplaceDictResource.

Example

Define a pipeline with two different NER models and a ChunkMergeApproach at the end:

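// Imports assume the usual Spark NLP and Spark NLP for Healthcare package layout.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.annotators.ner.MedicalNerModel
import com.johnsnowlabs.nlp.annotators.merge.ChunkMergeApproach
import org.apache.spark.ml.Pipeline
import spark.implicits._ // enables toDF; assumes an active SparkSession named `spark`

val data = Seq(("A 63-year-old man presents to the hospital ...")).toDF("text")
val pipeline = new Pipeline().setStages(Array(
  new DocumentAssembler().setInputCol("text").setOutputCol("document"),
  new SentenceDetector().setInputCols("document").setOutputCol("sentence"),
  new Tokenizer().setInputCols("sentence").setOutputCol("token"),
  // The embeddings stage needs the sentence and token columns as input
  WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
    .setInputCols("sentence", "token").setOutputCol("embs"),
  // First NER model and its chunk converter
  MedicalNerModel.pretrained("ner_jsl", "en", "clinical/models")
    .setInputCols("sentence", "token", "embs").setOutputCol("jsl_ner"),
  new NerConverter().setInputCols("sentence", "token", "jsl_ner").setOutputCol("jsl_ner_chunk"),
  // Second NER model and its chunk converter
  MedicalNerModel.pretrained("ner_bionlp", "en", "clinical/models")
    .setInputCols("sentence", "token", "embs").setOutputCol("bionlp_ner"),
  new NerConverter().setInputCols("sentence", "token", "bionlp_ner")
    .setOutputCol("bionlp_ner_chunk"),
  // Merge the two chunk columns into one
  new ChunkMergeApproach().setInputCols("jsl_ner_chunk", "bionlp_ner_chunk").setOutputCol("merged_chunk")
))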

Show results

val result = pipeline.fit(data).transform(data).cache()
result.selectExpr("explode(merged_chunk) as a")
  .selectExpr("a.begin","a.end","a.result as chunk","a.metadata.entity as entity")
  .show(5, false)
+-----+---+-----------+---------+
|begin|end|chunk      |entity   |
+-----+---+-----------+---------+
|5    |15 |63-year-old|Age      |
|17   |19 |man        |Gender   |
|64   |72 |recurrent  |Modifier |
|98   |107|cellulitis |Diagnosis|
|110  |119|pneumonias |Diagnosis|
+-----+---+-----------+---------+
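
The selection among overlapping chunks described above can be pictured with a small stand-alone model. This is only a sketch in plain Scala and not the library's implementation: the Chunk case class, the resolveOverlaps helper, the "Duration" label, and the longest-first rule with earliest-begin tie-breaking are all assumptions made for illustration.

```scala
object OverlapSketch {
  // Hypothetical stand-in for a chunk annotation (inclusive character offsets).
  final case class Chunk(begin: Int, end: Int, text: String, entity: String) {
    def length: Int = end - begin + 1
    def overlaps(other: Chunk): Boolean = begin <= other.end && other.begin <= end
  }

  // Greedy resolution: visit chunks longest-first (earliest begin breaks ties)
  // and keep a chunk only if it does not overlap one that was already kept.
  def resolveOverlaps(left: Seq[Chunk], right: Seq[Chunk]): Seq[Chunk] = {
    val candidates = (left ++ right).sortBy(c => (-c.length, c.begin))
    val kept = candidates.foldLeft(Vector.empty[Chunk]) { (acc, c) =>
      if (acc.exists(_.overlaps(c))) acc else acc :+ c
    }
    kept.sortBy(_.begin)
  }

  // Two hypothetical chunk columns; "year-old" overlaps the longer "63-year-old".
  val first  = Seq(Chunk(5, 15, "63-year-old", "Age"), Chunk(17, 19, "man", "Gender"))
  val second = Seq(Chunk(8, 15, "year-old", "Duration"))
  val merged: Seq[Chunk] = resolveOverlaps(first, second)
}
```

In this model the shorter overlapping chunk is dropped, while chunks that do not overlap anything are carried over from either column.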

Linear Supertypes

ResetSentenceIndicesParam, HandleExceptionParams, FilteringParams, HasFeatures, MergePrioritizationParams, MergeCommonParams, MergeResourceParams, HasMultipleInputAnnotationCols, CheckLicense, AnnotatorApproach[ChunkMergeModel], CanBeLazy, DefaultParamsWritable, MLWritable, HasOutputAnnotatorType, HasOutputAnnotationCol, HasInputAnnotationCols, Estimator[ChunkMergeModel], PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

new ChunkMergeApproach()
new ChunkMergeApproach(uid: String)
uid
a unique identifier for the instantiated AnnotatorModel

Type Members

type AnnotatorType = String

Definition Classes
HasOutputAnnotatorType

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def $[T](param: Param[T]): T

Attributes
protected
Definition Classes
Params
def $$[T](feature: StructFeature[T]): T

Attributes
protected
Definition Classes
HasFeatures
def $$[K, V](feature: MapFeature[K, V]): Map[K, V]

Attributes
protected
Definition Classes
HasFeatures
def $$[T](feature: SetFeature[T]): Set[T]

Attributes
protected
Definition Classes
HasFeatures
def $$[T](feature: ArrayFeature[T]): Array[T]

Attributes
protected
Definition Classes
HasFeatures
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def _fit(dataset: Dataset[_], recursiveStages: Option[PipelineModel]): ChunkMergeModel

Attributes
protected
Definition Classes
AnnotatorApproach
final def asInstanceOf[T0]: T0

Definition Classes
Any
def beforeTraining(spark: SparkSession): Unit

Definition Classes
AnnotatorApproach
val blackList: StringArrayParam
If defined, list of entities to ignore. The rest will be processed.

Definition Classes
FilteringParams
val caseSensitive: BooleanParam
Determines whether the definitions of the white listed and black listed entities are case sensitive or not. If the filterValue is 'entity', 'caseSensitive' is always false. The default value is true, except in com.johnsnowlabs.nlp.annotators.chunker.AssertionFilterer.

Definition Classes
FilteringParams
final def checkSchema(schema: StructType, inputAnnotatorType: String): Boolean

Attributes
protected
Definition Classes
HasInputAnnotationCols
def checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String]): Unit

Definition Classes
CheckLicense
def checkValidScope(scope: String): Unit

Definition Classes
CheckLicense
def checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean): Unit

Definition Classes
CheckLicense
def checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean): Unit

Definition Classes
CheckLicense
val chunkPrecedence: Param[String]
When the ChunkPrecedence ordering feature is used, this param contains the comma-separated metadata fields that drive prioritization of overlapping annotations. When used by itself (empty chunkPrecedenceValuePrioritization), annotations are prioritized based on the number of metadata fields present. When used together with the chunkPrecedenceValuePrioritization param, annotations are prioritized based on the order of its values.

Definition Classes
MergePrioritizationParams
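
The "number of metadata fields present" rule can be illustrated with a small plain-Scala model. It is a sketch under assumptions, not the library's code: the Ann case class, the presentFields and prioritize helpers, and the "field" metadata key are hypothetical names introduced only for this example.

```scala
object PrecedenceSketch {
  // Hypothetical stand-in for an annotation: its text and its metadata map.
  final case class Ann(result: String, metadata: Map[String, String])

  // Counts how many of the comma-separated fields are present in the metadata.
  def presentFields(ann: Ann, chunkPrecedence: String): Int =
    chunkPrecedence.split(",").map(_.trim).count(f => ann.metadata.contains(f))

  // Orders annotations so that those carrying more of the listed fields come first.
  def prioritize(anns: Seq[Ann], chunkPrecedence: String): Seq[Ann] =
    anns.sortBy(a => -presentFields(a, chunkPrecedence))

  val a = Ann("aspirin", Map("entity" -> "DRUG"))
  val b = Ann("aspirin 81 mg", Map("entity" -> "DRUG", "field" -> "dose"))
  val ordered: Seq[Ann] = prioritize(Seq(a, b), "entity,field")
}
```

Here the second annotation carries both listed fields, so it is ranked ahead of the first.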
val chunkPrecedenceValuePrioritization: StringArrayParam
When ChunkPrecedence ordering feature is used this param contains an Array of comma separated strings representing the desired order of prioritization for the values in the metadata fields included in chunkPrecedence.

Definition Classes
MergePrioritizationParams
final def clear(param: Param[_]): ChunkMergeApproach.this.type

Definition Classes
Params
def clone(): AnyRef

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
final def copy(extra: ParamMap): Estimator[ChunkMergeModel]

Definition Classes
AnnotatorApproach → Estimator → PipelineStage → Params
def copyValues[T <: Params](to: T, extra: ParamMap): T

Attributes
protected
Definition Classes
Params
val criteria: Param[String]
Determines how black and white listed values are compared with the result of the Annotation. Possible values are 'isin' and 'regex'. Default: isin
- isin : Filter by the chunk
- regex : Filter by using a regex
Definition Classes
FilteringParams
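
The interplay of whiteList, blackList, criteria, and caseSensitive can be sketched as follows. This is a simplified model in plain Scala rather than the library's filtering code: the matches and applyLists helpers are hypothetical, 'isin' is modelled as exact membership, and 'regex' as a full-string match, all of which are assumptions.

```scala
object FilterSketch {
  // True when `value` matches one of `items` under the given criteria.
  def matches(value: String, items: Seq[String], criteria: String, caseSensitive: Boolean): Boolean =
    criteria match {
      case "isin" =>
        if (caseSensitive) items.contains(value)
        else items.exists(_.equalsIgnoreCase(value))
      case "regex" =>
        val flag = if (caseSensitive) "" else "(?i)"
        items.exists(p => value.matches(flag + p))
      case other =>
        throw new IllegalArgumentException(s"Unsupported criteria: $other")
    }

  // Keeps values allowed by the white list (when it is non-empty) and absent from the black list.
  def applyLists(values: Seq[String], whiteList: Seq[String], blackList: Seq[String],
                 criteria: String = "isin", caseSensitive: Boolean = true): Seq[String] =
    values.filter { v =>
      (whiteList.isEmpty || matches(v, whiteList, criteria, caseSensitive)) &&
        !matches(v, blackList, criteria, caseSensitive)
    }

  val chunks = Seq("cellulitis", "pneumonias", "man")
  // Case-sensitive membership: "Man" does not match "man".
  val byList: Seq[String] = applyLists(chunks, whiteList = Seq("Man", "cellulitis"), blackList = Nil)
  // Regex criteria: only values ending in "itis" pass.
  val byRegex: Seq[String] = applyLists(chunks, whiteList = Seq(".*itis"), blackList = Nil, criteria = "regex")
}
```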
val defaultConfidence: FloatParam
When ChunkConfidence ordering feature is included and a given annotation does not have any confidence the value of this param will be used.

Definition Classes
MergePrioritizationParams
final def defaultCopy[T <: Params](extra: ParamMap): T

Attributes
protected
Definition Classes
Params
val description: String

Definition Classes
ChunkMergeApproach → AnnotatorApproach
val doExceptionHandling: BooleanParam
If true, exceptions are handled. If data that causes an exception is passed to the model, an error annotation containing the exception message is emitted. Processing continues with the next one. This comes with a performance penalty.

Definition Classes
HandleExceptionParams
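
The behaviour described for doExceptionHandling can be pictured with a generic pattern. The sketch below is plain Scala and does not reproduce the library's internals: the Ann case class, the annotateAll helper, the "error" annotator type, and the "exception" metadata key are assumptions chosen for illustration.

```scala
import scala.util.{Failure, Success, Try}

object ExceptionHandlingSketch {
  // Hypothetical stand-in for an annotation.
  final case class Ann(annotatorType: String, result: String, metadata: Map[String, String])

  // Applies `f` to each row; a failing row yields one "error" annotation carrying
  // the exception message, and processing continues with the next row.
  def annotateAll(rows: Seq[String])(f: String => Seq[Ann]): Seq[Seq[Ann]] =
    rows.map { row =>
      Try(f(row)) match {
        case Success(anns) => anns
        case Failure(e) =>
          Seq(Ann("error", "", Map("exception" -> Option(e.getMessage).getOrElse(""))))
      }
    }

  // The second row is empty and makes the annotating function throw.
  val out: Seq[Seq[Ann]] = annotateAll(Seq("ok", "")) { text =>
    require(text.nonEmpty, "empty input")
    Seq(Ann("chunk", text, Map.empty))
  }
}
```

Wrapping every row in a Try is one plausible source of the performance penalty mentioned above.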
val entitiesConfidence: MapFeature[String, Float]
Pairs (entity, confidenceThreshold). Filters out the chunks whose entities have a confidence lower than the confidence threshold.

Definition Classes
FilteringParams
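
A per-entity confidence threshold of this kind can be modelled in a few lines. This is a sketch under assumptions and not the library's implementation: the Chunk case class and filterByConfidence helper are hypothetical, and keeping entities that have no threshold is an assumed behaviour.

```scala
object ConfidenceFilterSketch {
  // Hypothetical stand-in for a chunk with its entity label and confidence score.
  final case class Chunk(text: String, entity: String, confidence: Float)

  // Drops chunks whose confidence is below the threshold registered for their entity.
  // Entities without a registered threshold are kept unchanged (assumption).
  def filterByConfidence(chunks: Seq[Chunk], thresholds: Map[String, Float]): Seq[Chunk] =
    chunks.filter(c => thresholds.get(c.entity).forall(t => c.confidence >= t))

  val thresholds = Map("Diagnosis" -> 0.8f)
  val chunks = Seq(
    Chunk("cellulitis", "Diagnosis", 0.95f),
    Chunk("pneumonias", "Diagnosis", 0.5f),
    Chunk("man", "Gender", 0.3f)
  )
  val kept: Seq[Chunk] = filterByConfidence(chunks, thresholds)
}
```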
lazy val entitiesConfidenceMap: Map[String, Float]

Definition Classes
FilteringParams
val entitiesConfidenceResource: ExternalResourceParam
Path to a CSV file with entity pairs, used to remove chunks based on the confidence level

Definition Classes
MergeResourceParams
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def explainParam(param: Param[_]): String

Definition Classes
Params
def explainParams(): String

Definition Classes
Params
final def extractParamMap(): ParamMap

Definition Classes
Params
final def extractParamMap(extra: ParamMap): ParamMap

Definition Classes
Params
val falsePositivesResource: ExternalResourceParam
Path to csv with false positive text, entity pairs to remove

Definition Classes
MergeResourceParams
val features: ArrayBuffer[Feature[_, _, _]]

Definition Classes
HasFeatures
val filterValue: Param[String]
Possible values are 'result' and 'entity'. If the value is 'entity', the NER chunks are filtered by their NER label. If the value is 'result', chunks are filtered by the result of the Annotation.

Definition Classes
FilteringParams
def finalize(): Unit

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
final def fit(dataset: Dataset[_]): ChunkMergeModel

Definition Classes
AnnotatorApproach → Estimator
def fit(dataset: Dataset[_], paramMaps: Seq[ParamMap]): Seq[ChunkMergeModel]

Definition Classes
Estimator
Annotations
@Since( "2.0.0" )
def fit(dataset: Dataset[_], paramMap: ParamMap): ChunkMergeModel

Definition Classes
Estimator
Annotations
@Since( "2.0.0" )
def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): ChunkMergeModel

Definition Classes
Estimator
Annotations
@Since( "2.0.0" ) @varargs()
def get[T](feature: StructFeature[T]): Option[T]

Attributes
protected
Definition Classes
HasFeatures
def get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]

Attributes
protected
Definition Classes
HasFeatures
def get[T](feature: SetFeature[T]): Option[Set[T]]

Attributes
protected
Definition Classes
HasFeatures
def get[T](feature: ArrayFeature[T]): Option[Array[T]]

Attributes
protected
Definition Classes
HasFeatures
final def get[T](param: Param[T]): Option[T]

Definition Classes
Params
def getBlackList: Array[String]
Gets blackList parameter

Definition Classes
FilteringParams
def getCaseSensitive: Boolean
Gets caseSensitive parameter

Definition Classes
FilteringParams
def getChunkPrecedence: String

Definition Classes
MergePrioritizationParams
def getChunkPrecedenceValuePrioritization: Array[String]

Definition Classes
MergePrioritizationParams
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
Annotations
@native()
final def getDefault[T](param: Param[T]): Option[T]

Definition Classes
Params
def getDefaultConfidence: Float

Definition Classes
MergePrioritizationParams
def getInputCols: Array[String]

Definition Classes
HasInputAnnotationCols
def getLazyAnnotator: Boolean

Definition Classes
CanBeLazy
def getMergeOverlapping: Boolean

Definition Classes
MergeCommonParams
final def getOrDefault[T](param: Param[T]): T

Definition Classes
Params
def getOrderingFeatures: Array[String]

Definition Classes
MergePrioritizationParams
final def getOutputCol: String

Definition Classes
HasOutputAnnotationCol
def getParam(paramName: String): Param[Any]

Definition Classes
Params
def getResetSentenceIndices: Boolean
Gets resetSentenceIndices parameter

Definition Classes
ResetSentenceIndicesParam
def getSelectionStrategy: String

Definition Classes
MergePrioritizationParams
def getWhiteList: Array[String]
Gets whiteList parameter

Definition Classes
FilteringParams
final def hasDefault[T](param: Param[T]): Boolean

Definition Classes
Params
def hasParam(paramName: String): Boolean

Definition Classes
Params
def hashCode(): Int

Definition Classes
AnyRef → Any
Annotations
@native()
def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean

Attributes
protected
Definition Classes
Logging
def initializeLogIfNecessary(isInterpreter: Boolean): Unit

Attributes
protected
Definition Classes
Logging
val inputAnnotatorType: String
Input annotator types: CHUNK, CHUNK

Definition Classes
ChunkMergeApproach → HasMultipleInputAnnotationCols
lazy val inputAnnotatorTypes: Array[String]

Definition Classes
HasMultipleInputAnnotationCols → HasInputAnnotationCols
final val inputCols: StringArrayParam

Attributes
protected
Definition Classes
HasInputAnnotationCols
final def isDefined(param: Param[_]): Boolean

Definition Classes
Params
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
final def isSet(param: Param[_]): Boolean

Definition Classes
Params
def isTraceEnabled(): Boolean

Attributes
protected
Definition Classes
Logging
val lazyAnnotator: BooleanParam

Definition Classes
CanBeLazy
def log: Logger

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logName: String

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
val mergeOverlapping: BooleanParam
Whether to merge overlapping matched chunks. Defaults to true.

Definition Classes
MergeCommonParams
def msgHelper(schema: StructType): String

Attributes
protected
Definition Classes
HasInputAnnotationCols
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
Annotations
@native()
final def notifyAll(): Unit

Definition Classes
AnyRef
Annotations
@native()
def onTrained(model: ChunkMergeModel, spark: SparkSession): Unit

Definition Classes
AnnotatorApproach
val optionalInputAnnotatorTypes: Array[String]

Definition Classes
HasInputAnnotationCols
val orderingFeatures: StringArrayParam
Array of strings specifying the ordering features to use for overlapping entities. Possible values are ChunkBegin, ChunkLength, ChunkPrecedence, ChunkConfidence.

Definition Classes
MergePrioritizationParams
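
A configuration that combines the prioritization setters listed on this page might look like the fragment below. It is a hedged sketch: the import path is assumed from the usual Spark NLP for Healthcare package layout, the column names are taken from the class-level example, and the chosen feature combination and the 0.5f default confidence are illustrative values, not recommendations. Whether the order of the array affects the result is not stated on this page.

```scala
// Assumed package path for the licensed Spark NLP for Healthcare library.
import com.johnsnowlabs.nlp.annotators.merge.ChunkMergeApproach

val merger = new ChunkMergeApproach()
  .setInputCols("jsl_ner_chunk", "bionlp_ner_chunk")   // the two chunk columns to merge
  .setOutputCol("merged_chunk")
  .setMergeOverlapping(true)                           // documented default is true
  .setOrderingFeatures(Array("ChunkLength", "ChunkConfidence"))
  .setSelectionStrategy("DiverseLonger")               // the other documented option is "Sequential"
  .setDefaultConfidence(0.5f)                          // used when an annotation has no confidence
```

The fragment needs the licensed library on the classpath and replaces the last stage of the pipeline in the class-level example.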
val outputAnnotatorType: AnnotatorType
Output annotator type: CHUNK

Definition Classes
ChunkMergeApproach → HasOutputAnnotatorType
final val outputCol: Param[String]

Attributes
protected
Definition Classes
HasOutputAnnotationCol
lazy val params: Array[Param[_]]

Definition Classes
Params
def prioritize(annotations: Seq[Annotation]): Seq[Annotation]

Attributes
protected
Definition Classes
MergePrioritizationParams
val regex: StringArrayParam
If defined, list of regex to process the chunks (Default: Array())

Definition Classes
FilteringParams
val replaceDictResource: ExternalResourceParam
Dictionary with regular expression patterns that match some protected entity.

Definition Classes
MergeResourceParams
def resetSentenceIndices(metadata: Map[String, String]): Map[String, String]
Reset sentence index in metadata by adding "sentence" -> "0"

Attributes
protected
Definition Classes
ResetSentenceIndicesParam
val resetSentenceIndices: BooleanParam
Whether to reset sentence indices to treat the entire output as if it originates from a single document.
When set to true, the metadata of each entity will be updated by assigning the sentence key a value of 0, effectively treating the entire output as if it comes from a single document, regardless of the original sentence boundaries. Default: false.

Definition Classes
ResetSentenceIndicesParam
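
The metadata update described for resetSentenceIndices amounts to a single map operation. The sketch below models it in plain Scala; the object name and the sample metadata are hypothetical, and the function only mirrors the documented "sentence" -> "0" behaviour rather than the library's code.

```scala
object ResetSentenceSketch {
  // Adds "sentence" -> "0", overwriting any existing sentence index.
  def resetSentenceIndices(metadata: Map[String, String]): Map[String, String] =
    metadata + ("sentence" -> "0")

  val before = Map("entity" -> "Age", "sentence" -> "3")
  val after: Map[String, String] = resetSentenceIndices(before)
}
```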
def resolveFilter(chunkerAnnotations: Seq[Annotation]): Seq[Annotation]

Attributes
protected
Definition Classes
FilteringParams
def resolveMergeFilter(a: Annotation, entityValue: String, falsePositivesArray: Array[(String, String, String)], replaceDictMap: Map[String, String] = Map.empty): Option[Annotation]

Attributes
protected
Definition Classes
FilteringParams
def resolveWhiteListBlackListFilter(annotations: Seq[Annotation]): Seq[Annotation]

Attributes
protected
Definition Classes
FilteringParams
def save(path: String): Unit

Definition Classes
MLWritable
Annotations
@Since( "1.6.0" ) @throws( ... )
val selectionStrategy: Param[String]
Whether to select annotations sequentially based on annotation order (Sequential) or using another available strategy. Currently only Sequential and DiverseLonger are available.

Definition Classes
MergePrioritizationParams
def set[T](feature: StructFeature[T], value: T): ChunkMergeApproach.this.type

Attributes
protected
Definition Classes
HasFeatures
def set[K, V](feature: MapFeature[K, V], value: Map[K, V]): ChunkMergeApproach.this.type

Attributes
protected
Definition Classes
HasFeatures
def set[T](feature: SetFeature[T], value: Set[T]): ChunkMergeApproach.this.type

Attributes
protected
Definition Classes
HasFeatures
def set[T](feature: ArrayFeature[T], value: Array[T]): ChunkMergeApproach.this.type

Attributes
protected
Definition Classes
HasFeatures
final def set(paramPair: ParamPair[_]): ChunkMergeApproach.this.type

Attributes
protected
Definition Classes
Params
final def set(param: String, value: Any): ChunkMergeApproach.this.type

Attributes
protected
Definition Classes
Params
final def set[T](param: Param[T], value: T): ChunkMergeApproach.this.type

Definition Classes
Params
def setAllowList(list: String*): ChunkMergeApproach.this.type

Definition Classes
FilteringParams
def setAllowList(list: Array[String]): ChunkMergeApproach.this.type

Definition Classes
FilteringParams
def setBlackList(list: String*): ChunkMergeApproach.this.type

Definition Classes
FilteringParams
def setBlackList(list: Array[String]): ChunkMergeApproach.this.type
If defined, list of entities to ignore. The rest will be processed.

Definition Classes
FilteringParams
def setCaseSensitive(value: Boolean): ChunkMergeApproach.this.type
Determines whether the definitions of the white listed and black listed entities are case sensitive or not. If the filterValue is 'entity', 'caseSensitive' is always false. The default value is true, except in com.johnsnowlabs.nlp.annotators.chunker.AssertionFilterer.

Definition Classes
FilteringParams
def setChunkPrecedence(m: String): ChunkMergeApproach.this.type

Definition Classes
MergePrioritizationParams
def setChunkPrecedenceValuePrioritization(m: Array[String]): ChunkMergeApproach.this.type

Definition Classes
MergePrioritizationParams
def setCriteria(s: String): ChunkMergeApproach.this.type
Sets the criteria for how black and white listed values are compared with the result of the Annotation. Possible values are 'isin' and 'regex'. Default: isin.
- 'isin' : Filter by the chunk.
- 'regex' : Filter by using a regex.
- 'assertion' : Available in com.johnsnowlabs.nlp.annotators.chunker.AssertionFilterer, where it is the default value.
Definition Classes
FilteringParams
def setDefault[T](feature: StructFeature[T], value: () ⇒ T): ChunkMergeApproach.this.type

Attributes
protected
Definition Classes
HasFeatures
def setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): ChunkMergeApproach.this.type

Attributes
protected
Definition Classes
HasFeatures
def setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): ChunkMergeApproach.this.type

Attributes
protected
Definition Classes
HasFeatures
def setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): ChunkMergeApproach.this.type

Attributes
protected
Definition Classes
HasFeatures
final def setDefault(paramPairs: ParamPair[_]*): ChunkMergeApproach.this.type

Attributes
protected
Definition Classes
Params
final def setDefault[T](param: Param[T], value: T): ChunkMergeApproach.this.type

Attributes
protected[org.apache.spark.ml]
Definition Classes
Params
def setDefaultConfidence(m: Float): ChunkMergeApproach.this.type

Definition Classes
MergePrioritizationParams
def setDenyList(list: String*): ChunkMergeApproach.this.type

Definition Classes
FilteringParams
def setDenyList(list: Array[String]): ChunkMergeApproach.this.type

Definition Classes
FilteringParams
def setDoExceptionHandling(value: Boolean): ChunkMergeApproach.this.type
If true, exceptions are handled. If data that causes an exception is passed to the model, an error annotation containing the exception message is emitted. Processing continues with the next one. This comes with a performance penalty.

Definition Classes
HandleExceptionParams
def setEntitiesConfidence(value: HashMap[String, Double]): ChunkMergeApproach.this.type
Sets pairs (entity, confidenceThreshold) used to filter out the chunks whose entities have a confidence lower than the confidence threshold.
def setEntitiesConfidence(value: Map[String, Float]): ChunkMergeApproach.this.type

Definition Classes
FilteringParams
def setEntitiesConfidenceResource(path: String, readAs: Format = ReadAs.TEXT, options: Map[String, String] = Map("delimiter" -> ",")): ChunkMergeApproach.this.type

Definition Classes
MergeResourceParams
def setFalsePositivesResource(path: String, readAs: Format = ReadAs.TEXT, options: Map[String, String] = Map("delimiter" -> ",")): ChunkMergeApproach.this.type
Path to csv with false positive text, entity pairs to remove

Definition Classes
MergeResourceParams
def setFilterEntity(v: String): ChunkMergeApproach.this.type
Possible values are 'result' and 'entity'. If the value is 'entity', the NER chunks are filtered by their NER label. If the value is 'result', chunks are filtered by the result of the Annotation.

Definition Classes
FilteringParams
def setInputCols(value: Array[String]): ChunkMergeApproach.this.type

Definition Classes
HasMultipleInputAnnotationCols → HasInputAnnotationCols
final def setInputCols(value: String*): ChunkMergeApproach.this.type

Definition Classes
HasInputAnnotationCols
def setLazyAnnotator(value: Boolean): ChunkMergeApproach.this.type

Definition Classes
CanBeLazy
def setMergeOverlapping(v: Boolean): ChunkMergeApproach.this.type
Whether to merge overlapping matched chunks.

Definition Classes
MergeCommonParams
def setOrderingFeatures(m: Array[String]): ChunkMergeApproach.this.type

Definition Classes
MergePrioritizationParams
final def setOutputCol(value: String): ChunkMergeApproach.this.type

Definition Classes
HasOutputAnnotationCol
def setRegex(list: String*): ChunkMergeApproach.this.type
Sets the list of regexes to process the chunks.

Definition Classes
FilteringParams
def setReplaceDictResource(path: String, readAs: Format = ReadAs.TEXT, options: Map[String, String] = Map("delimiter" -> ",")): ChunkMergeApproach.this.type
Dictionary with regular expression patterns that match some protected entity.

Definition Classes
MergeResourceParams
def setReplaceDictResource(path: ExternalResource): ChunkMergeApproach.this.type
Dictionary with regular expression patterns that match some protected entity.

Definition Classes
MergeResourceParams
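
The class description states that labels can be changed through setReplaceDictResource, and the setter's default options use "," as the delimiter. How such a dictionary might be applied is sketched below in plain Scala. The two-column "old,new" file layout, the exact-match lookup, and the parseReplaceDict and relabel helpers are assumptions for illustration and are not taken from the library.

```scala
object RelabelSketch {
  // Parses "old,new" lines into a lookup map; malformed lines are skipped.
  def parseReplaceDict(lines: Seq[String], delimiter: String = ","): Map[String, String] =
    lines.map(_.split(delimiter, 2)).collect {
      case Array(from, to) => from.trim -> to.trim
    }.toMap

  // Replaces a label when the dictionary has an entry for it; otherwise keeps it.
  def relabel(entity: String, dict: Map[String, String]): String =
    dict.getOrElse(entity, entity)

  val dict: Map[String, String] = parseReplaceDict(Seq("Diagnosis,PROBLEM", "Gender,SEX"))
  val labels: Seq[String] = Seq("Diagnosis", "Age", "Gender").map(relabel(_, dict))
}
```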
def setResetSentenceIndices(value: Boolean): ChunkMergeApproach.this.type
Sets whether to reset sentence indices to treat the entire output as if it originates from a single document.
When set to true, the metadata of each entity will be updated by assigning the sentence key a value of 0, effectively treating the entire output as if it comes from a single document, regardless of the original sentence boundaries. Default: false.

Definition Classes
ResetSentenceIndicesParam
def setSelectionStrategy(m: String): ChunkMergeApproach.this.type

Definition Classes
MergePrioritizationParams
def setWhiteList(list: String*): ChunkMergeApproach.this.type

Definition Classes
FilteringParams
def setWhiteList(list: Array[String]): ChunkMergeApproach.this.type
Sets the list of entities to process. The rest will be ignored. Do not include IOB prefix on labels.

Definition Classes
FilteringParams
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
Identifiable → AnyRef → Any
def train(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): ChunkMergeModel
Trains a model from the provided dataset. Input columns should be set to output columns from e.g. a NerDLModel and a RegexMatcher.

Definition Classes
ChunkMergeApproach → AnnotatorApproach
def transformEntitiesConfidenceResource(): Map[String, Float]

Attributes
protected
Definition Classes
MergeResourceParams
def transformFalsePositivesResource(): Array[(String, String, String)]

Attributes
protected
Definition Classes
MergeResourceParams
def transformReplaceDict(replaceDict: Array[(String, String)]): Map[String, String]
def transformReplaceDictResource(): Array[(String, String)]

Attributes
protected
Definition Classes
MergeResourceParams
final def transformSchema(schema: StructType): StructType

Definition Classes
AnnotatorApproach → PipelineStage
def transformSchema(schema: StructType, logging: Boolean): StructType

Attributes
protected
Definition Classes
PipelineStage
Annotations
@DeveloperApi()
val uid: String

Definition Classes
ChunkMergeApproach → Identifiable
def validate(schema: StructType): Boolean

Attributes
protected
Definition Classes
AnnotatorApproach
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
val whiteList: StringArrayParam
If defined, list of entities to process. The rest will be ignored. Do not include the IOB prefix on labels (Default: Array())

Definition Classes
FilteringParams
def write: MLWriter

Definition Classes
DefaultParamsWritable → MLWritable
