class DeIdentification extends AnnotatorApproach[DeIdentificationModel] with DeIdentificationParams with CheckLicense

Contains all the methods for training a DeIdentificationModel. This module can obfuscate or mask entities that contain personal information. Entities can also be defined with a file of regex patterns, set with setRegexPatternsDictionary, where each line maps an entity to a regex, for example:

DATE \d{4}
AID \d{6,7}
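
Each line in the pattern file pairs an entity label with a regex. The plain-Scala sketch below reads lines in that shape and applies them to a string. It assumes the label and the pattern are separated by the first run of whitespace, as in the sample above; `RegexDictionarySketch` and its methods are hypothetical and are not the library's loader.

```scala
import scala.util.matching.Regex

// Hypothetical helper for illustration only; not the library's regex loader.
object RegexDictionarySketch {
  // Parses "ENTITY <regex>" lines into entity -> compiled patterns.
  def parse(lines: Seq[String]): Map[String, Seq[Regex]] =
    lines
      .map(_.trim)
      .filter(_.nonEmpty)
      .flatMap { line =>
        // Split on the first whitespace run only, so a pattern may contain spaces
        line.split("\\s+", 2) match {
          case Array(entity, pattern) => Some(entity -> pattern.r)
          case _                      => None // skip lines without a pattern
        }
      }
      .groupBy(_._1)
      .map { case (entity, pairs) => entity -> pairs.map(_._2) }

  // Returns (entity, matched text) for every non-overlapping match in `text`.
  def findAll(dict: Map[String, Seq[Regex]], text: String): Seq[(String, String)] =
    dict.toSeq.flatMap { case (entity, regexes) =>
      regexes.flatMap(r => r.findAllIn(text).map(m => entity -> m))
    }
}
```

Context-free matching like this accepts more than intended: with the sample patterns, `\d{4}` also matches the first four digits of `7194334`.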

Additionally, obfuscation strings can be defined with setObfuscateRefFile, where each line maps a string to an entity. The format and separator can be specified with setRefFileFormat and setRefSep, for example:

Dr. Gregory House#DOCTOR
01010101#MEDICALRECORD
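
The reference file maps each obfuscation term to an entity label, using the separator set with setRefSep ("#" in the sample above). As an illustration only, the hypothetical `ObfuscateRefSketch.parse` below groups such lines by entity. It splits on the last separator so that a term may itself contain the separator character; that is a design choice of this sketch, not documented library behavior.

```scala
// Hypothetical helper for illustration only; not the library's reference-file reader.
object ObfuscateRefSketch {
  // Reads "term<sep>ENTITY" lines into entity -> candidate fake terms.
  def parse(lines: Seq[String], sep: String = "#"): Map[String, Seq[String]] =
    lines
      .map(_.trim)
      .filter(_.nonEmpty)
      .flatMap { line =>
        // Split on the last separator so the term itself may contain `sep`
        val idx = line.lastIndexOf(sep)
        if (idx <= 0) None // no separator, or an empty term: skip the line
        else Some(line.substring(idx + sep.length).trim -> line.substring(0, idx).trim)
      }
      .groupBy(_._1)
      .map { case (entity, pairs) => entity -> pairs.map(_._2) }
}
```

For the two sample lines, `ObfuscateRefSketch.parse(lines)("DOCTOR")` yields `Seq("Dr. Gregory House")`.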

The configuration params for this module are in the trait DeIdentificationParams.

See also

DeIdentificationModel

DeIdentificationParams

Ideally, this annotator works in conjunction with demographic named entity recognizers, which can be trained using TextMatcher, RegexMatcher, DateMatcher, NerCRF, or NerDL. An example de-identification pipeline follows.

Example

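// Assumes a SparkSession named `spark` with Spark NLP and Spark NLP for Healthcare on the classpath
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.annotators.ner.MedicalNerModel
import com.johnsnowlabs.nlp.annotators.deid.DeIdentification
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for .toDF below

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")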

val sentenceDetector = new SentenceDetector()
    .setInputCols(Array("document"))
    .setOutputCol("sentence")
    .setUseAbbreviations(true)

val tokenizer = new Tokenizer()
    .setInputCols(Array("sentence"))
    .setOutputCol("token")

val embeddings = WordEmbeddingsModel
    .pretrained("embeddings_clinical", "en", "clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")

// NER entities

val clinical_sensitive_entities = MedicalNerModel.pretrained("ner_deid_enriched", "en", "clinical/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")

val nerConverter = new NerConverter()
    .setInputCols(Array("sentence", "token", "ner"))
    // Output name must match the chunk column consumed by DeIdentification below
    .setOutputCol("ner_chunk")

// De-identification

val deIdentification = new DeIdentification()
    .setInputCols(Array("ner_chunk", "token", "sentence"))
    .setOutputCol("dei")
    // file with custom regex patterns for custom entities
    .setRegexPatternsDictionary("path/to/dic_regex_patterns_main_categories.txt")
    // file with custom obfuscator names for the entities
    .setObfuscateRefFile("path/to/obfuscate_fixed_entities.txt")
    .setRefFileFormat("csv")
    .setRefSep("#")
    .setMode("obfuscate")
    .setDateFormats(Array("MM/dd/yy","yyyy-MM-dd"))
    .setObfuscateDate(true)
    .setDateTag("DATE")
    .setDays(5)
    .setObfuscateRefSource("file")

// Pipeline

val data = Seq(
  "# 7194334 Date : 01/13/93 PCP : Oliveira , 25 years-old , Record date : 2079-11-09."
).toDF("text")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentenceDetector,
  tokenizer,
  embeddings,
  clinical_sensitive_entities,
  nerConverter,
  deIdentification
))
val result = pipeline.fit(data).transform(data)

result.select("dei.result").show(truncate = false)
+--------------------------------------------------------------------------------------------------+
|result                                                                                            |
+--------------------------------------------------------------------------------------------------+
|[# 01010101 Date : 01/18/93 PCP : Dr. Gregory House , <AGE> years-old , Record date : 2079-11-14.]|
+--------------------------------------------------------------------------------------------------+
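
The output above was produced in obfuscate mode. To make the difference between the two `mode` values concrete, here is a plain-Scala sketch of the string substitution each mode implies, using the sentence from the `mode` description. `ModeSketch`, `Chunk`, `deidentify`, and the `fakes` map are hypothetical; the real annotator takes fake terms from the reference file or Faker and handles many more cases. Falling back to the mask tag when no fake term exists is an assumption of this sketch.

```scala
object ModeSketch {
  // Hypothetical chunk type: begin/end are character offsets with an inclusive end,
  // like Spark NLP annotations; label is the NER entity type.
  final case class Chunk(begin: Int, end: Int, label: String)

  def deidentify(text: String, chunks: Seq[Chunk], mode: String,
                 fakes: Map[String, String] = Map.empty): String =
    // Replace from the last chunk backwards so earlier offsets stay valid
    chunks.sortBy(-_.begin).foldLeft(text) { (acc, c) =>
      val replacement = mode match {
        case "mask"      => s"<${c.label}>"
        // Sketch-only fallback: mask the chunk when no fake term is available
        case "obfuscate" => fakes.getOrElse(c.label, s"<${c.label}>")
        case other       => throw new IllegalArgumentException(s"Unknown mode: $other")
      }
      acc.substring(0, c.begin) + replacement + acc.substring(c.end + 1)
    }
}
```

With chunks `Chunk(0, 9, "PERSON")` and `Chunk(19, 22, "COUNTRY")`, mask mode gives "<PERSON> visited <COUNTRY> a couple of years ago", and obfuscate mode with `Map("PERSON" -> "Bryan Johnson", "COUNTRY" -> "Japon")` gives "Bryan Johnson visited Japon a couple of years ago".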
Linear Supertypes
CheckLicense, DeIdentificationParams, AnnotatorApproach[DeIdentificationModel], CanBeLazy, DefaultParamsWritable, MLWritable, HasOutputAnnotatorType, HasOutputAnnotationCol, HasInputAnnotationCols, Estimator[DeIdentificationModel], PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new DeIdentification()
  2. new DeIdentification(uid: String)

    uid

    a unique identifier for the annotator instance

Type Members

  1. type AnnotatorType = String
    Definition Classes
    HasOutputAnnotatorType

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T
    Attributes
    protected
    Definition Classes
    Params
  4. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  5. def _fit(dataset: Dataset[_], recursiveStages: Option[PipelineModel]): DeIdentificationModel
    Attributes
    protected
    Definition Classes
    AnnotatorApproach
  6. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  7. def beforeTraining(spark: SparkSession): Unit
    Definition Classes
    AnnotatorApproach
  8. final def checkSchema(schema: StructType, inputAnnotatorType: String): Boolean
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  9. def checkValidEnvironment(spark: Option[SparkContext]): Unit
    Definition Classes
    CheckLicense
  10. def checkValidScope(scope: String): Unit
    Definition Classes
    CheckLicense
  11. def checkValidScopeAndEnvironment(scope: String, spark: Option[SparkContext], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  12. final def clear(param: Param[_]): DeIdentification.this.type
    Definition Classes
    Params
  13. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  14. val consistentObfuscation: BooleanParam

    Whether to replace very similar entities in a document with the same randomized term (default: true). The similarity is based on the Levenshtein distance between the words.

    Definition Classes
    DeIdentificationParams
  15. final def copy(extra: ParamMap): Estimator[DeIdentificationModel]
    Definition Classes
    AnnotatorApproach → Estimator → PipelineStage → Params
  16. def copyValues[T <: Params](to: T, extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  17. val dateFormats: StringArrayParam

    Formats of the dates to displace.

    Definition Classes
    DeIdentificationParams
  18. val dateTag: Param[String]

    NER tag that identifies date entities (default: DATE).

    Definition Classes
    DeIdentificationParams
  19. val dateToYear: BooleanParam

    True if dates must be converted to years, false otherwise.

    Definition Classes
    DeIdentificationParams
  20. val days: IntParam

    Number of days by which dates are displaced when obfuscated. If not provided, a random integer between 1 and 60 is used.

    Definition Classes
    DeIdentificationParams
  21. final def defaultCopy[T <: Params](extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  22. val description: String
    Definition Classes
    DeIdentification → AnnotatorApproach
  23. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  24. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  25. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  26. def explainParams(): String
    Definition Classes
    Params
  27. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  28. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  29. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  30. final def fit(dataset: Dataset[_]): DeIdentificationModel
    Definition Classes
    AnnotatorApproach → Estimator
  31. def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[DeIdentificationModel]
    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  32. def fit(dataset: Dataset[_], paramMap: ParamMap): DeIdentificationModel
    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  33. def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DeIdentificationModel
    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" ) @varargs()
  34. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  35. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  36. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  37. def getInputCols: Array[String]
    Definition Classes
    HasInputAnnotationCols
  38. def getLazyAnnotator: Boolean
    Definition Classes
    CanBeLazy
  39. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  40. final def getOutputCol: String
    Definition Classes
    HasOutputAnnotationCol
  41. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  42. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  43. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  44. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  45. val ignoreRegex: BooleanParam

    Whether to use the regex file loaded in the model. If true, the default regex file is not used (default: false).

    Definition Classes
    DeIdentificationParams
  46. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  47. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  48. val inputAnnotatorTypes: Array[AnnotatorType]

    Input annotator types: DOCUMENT, TOKEN, CHUNK

    Definition Classes
    DeIdentification → HasInputAnnotationCols
  49. final val inputCols: StringArrayParam
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  50. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  51. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  52. def isLp(): Boolean
    Definition Classes
    CheckLicense
  53. val isRandomDateDisplacement: BooleanParam

    Whether to displace date entities by a random number of days derived from DeIdentificationParams.seed. If true, a random displacement is used; if false, DeIdentificationParams.days is used (default: false).

    Definition Classes
    DeIdentificationParams
  54. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  55. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  56. val lazyAnnotator: BooleanParam
    Definition Classes
    CanBeLazy
  57. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  58. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  59. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  60. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  61. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  62. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  63. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  64. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  65. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  66. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  67. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  68. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  69. val mappingsColumn: Param[String]

    Name of the mapping column, which returns the chunk annotations with the fake entities.

    Definition Classes
    DeIdentificationParams
  70. val minYear: IntParam

    Minimum year to use when converting date to year

    Definition Classes
    DeIdentificationParams
  71. val mode: Param[String]

    Mode for the anonymizer: 'mask' or 'obfuscate'. Given the following text:

    "David Hale visited EEUU a couple of years ago"

    Mask mode: the entities are replaced by their entity types, for example "<PERSON> visited <COUNTRY> a couple of years ago".

    Obfuscate mode: the entities are replaced by obfuscator terms, for example "Bryan Johnson visited Japon a couple of years ago".

    Definition Classes
    DeIdentificationParams
  72. def msgHelper(schema: StructType): String
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  73. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  74. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  75. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  76. val obfuscateDate: BooleanParam

    When mode == "obfuscate", whether to obfuscate dates (default: false). This param makes the dependency on dateFormats more visible: when set to true, make sure dateFormats fits your data; when set to false, dates are masked as <DATE>.

    Definition Classes
    DeIdentificationParams
  77. val obfuscateRefFile: Param[String]

    File with the terms to be used for obfuscation.

  78. val obfuscateRefSource: Param[String]

    The source of the terms used to obfuscate entities. This does not apply to date entities. Possible values are 'file' (terms are taken from the obfuscateRefFile), 'faker' (terms are taken from the Faker module), and 'both' (terms are taken randomly from the obfuscateRefFile and the Faker module).

    Definition Classes
    DeIdentificationParams
  79. def onTrained(model: DeIdentificationModel, spark: SparkSession): Unit
    Definition Classes
    AnnotatorApproach
  80. val optionalInputAnnotatorTypes: Array[String]
    Definition Classes
    HasInputAnnotationCols
  81. val outputAnnotatorType: AnnotatorType

    Output annotator type: DOCUMENT

    Definition Classes
    DeIdentification → HasOutputAnnotatorType
  82. final val outputCol: Param[String]
    Attributes
    protected
    Definition Classes
    HasOutputAnnotationCol
  83. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  84. val refFileFormat: Param[String]

    Format of the reference file for obfuscation (default: "csv").

  85. val refSep: Param[String]

    Separator character for the CSV reference file for obfuscation (default: "#").

  86. val regexOverride: BooleanParam

    If true, prioritize the regex entities; if false, prioritize the NER entities (default: false).

    Definition Classes
    DeIdentificationParams
  87. val regexPatternsDictionary: ExternalResourceParam

    Dictionary of regular expression patterns that match protected entities. If it is not set, the default regex file is used.

  88. val returnEntityMappings: BooleanParam

    Whether to return the mapping column.

    Definition Classes
    DeIdentificationParams
  89. val sameEntityThreshold: DoubleParam

    Similarity threshold [0.0-1.0] for considering two appearances of an entity the same (default: 0.9). This does not apply to date entities.

    Definition Classes
    DeIdentificationParams
  90. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  91. val seed: IntParam

    Seed used to select the entities in obfuscate mode. With the same seed, an execution can be replayed several times with the same output.

    Definition Classes
    DeIdentificationParams
  92. final def set(paramPair: ParamPair[_]): DeIdentification.this.type
    Attributes
    protected
    Definition Classes
    Params
  93. final def set(param: String, value: Any): DeIdentification.this.type
    Attributes
    protected
    Definition Classes
    Params
  94. final def set[T](param: Param[T], value: T): DeIdentification.this.type
    Definition Classes
    Params
  95. def setConsistentObfuscation(s: Boolean): DeIdentification.this.type

    Whether to replace very similar entities in a document with the same randomized term (default: true). The similarity is based on the Levenshtein distance between the words.

    Definition Classes
    DeIdentificationParams
  96. def setDateFormats(s: Array[String]): DeIdentification.this.type

    Formats of the dates to displace.

    Definition Classes
    DeIdentificationParams
  97. def setDateTag(s: String): DeIdentification.this.type

    NER tag that identifies date entities (default: DATE).

    Definition Classes
    DeIdentificationParams
  98. def setDateToYear(s: Boolean): DeIdentification.this.type

    True if dates must be converted to years, false otherwise.

    Definition Classes
    DeIdentificationParams
  99. def setDays(k: Int): DeIdentification.this.type

    Number of days by which dates are displaced when obfuscated. If not provided, a random integer between 1 and 60 is used.

    Definition Classes
    DeIdentificationParams
  100. final def setDefault(paramPairs: ParamPair[_]*): DeIdentification.this.type
    Attributes
    protected
    Definition Classes
    Params
  101. final def setDefault[T](param: Param[T], value: T): DeIdentification.this.type
    Attributes
    protected
    Definition Classes
    Params
  102. def setIgnoreRegex(s: Boolean): DeIdentification.this.type

    Whether to use the regex file loaded in the model. If true, the default regex file is not used (default: false).

    Definition Classes
    DeIdentificationParams
  103. final def setInputCols(value: String*): DeIdentification.this.type
    Definition Classes
    HasInputAnnotationCols
  104. final def setInputCols(value: Array[String]): DeIdentification.this.type
    Definition Classes
    HasInputAnnotationCols
  105. def setIsRandomDateDisplacement(s: Boolean): DeIdentification.this.type

    Whether to displace date entities by a random number of days derived from DeIdentificationParams.seed. If true, a random displacement is used; if false, DeIdentificationParams.days is used (default: false).

    Definition Classes
    DeIdentificationParams
  106. def setLazyAnnotator(value: Boolean): DeIdentification.this.type
    Definition Classes
    CanBeLazy
  107. def setMappingsColumn(s: String): DeIdentification.this.type

    Name of the mapping column, which returns the chunk annotations with the fake entities.

    Definition Classes
    DeIdentificationParams
  108. def setMinYear(s: Int): DeIdentification.this.type

    Minimum year to use when converting date to year

    Definition Classes
    DeIdentificationParams
  109. def setMode(m: String): DeIdentification.this.type

    Mode for the anonymizer: 'mask' or 'obfuscate'. Given the following text:

    "David Hale visited EEUU a couple of years ago"

    Mask mode: the entities are replaced by their entity types, for example "<PERSON> visited <COUNTRY> a couple of years ago".

    Obfuscate mode: the entities are replaced by obfuscator terms, for example "Bryan Johnson visited Japon a couple of years ago".

    Definition Classes
    DeIdentificationParams
  110. def setObfuscateDate(s: Boolean): DeIdentification.this.type

    When mode == "obfuscate", whether to obfuscate dates (default: false). This param makes the dependency on dateFormats more visible: when set to true, make sure dateFormats fits your data; when set to false, dates are masked as <DATE>.

    Definition Classes
    DeIdentificationParams
  111. def setObfuscateRefFile(f: String): DeIdentification.this.type

    File with the terms to be used for obfuscation.

  112. def setObfuscateRefSource(s: String): DeIdentification.this.type

    The source of the terms used to obfuscate entities. This does not apply to date entities. Possible values are 'file' (terms are taken from the obfuscateRefFile), 'faker' (terms are taken from the Faker module), and 'both' (terms are taken randomly from the obfuscateRefFile and the Faker module).

    Definition Classes
    DeIdentificationParams
  113. final def setOutputCol(value: String): DeIdentification.this.type
    Definition Classes
    HasOutputAnnotationCol
  114. def setRefFileFormat(f: String): DeIdentification.this.type

    Format of the reference file for obfuscation (default: "csv").

  115. def setRefSep(f: String): DeIdentification.this.type

    Separator character for the CSV reference file for obfuscation (default: "#").

  116. def setRegexOverride(s: Boolean): DeIdentification.this.type

    If true, prioritize the regex entities; if false, prioritize the NER entities (default: false).

    Definition Classes
    DeIdentificationParams
  117. def setRegexPatternsDictionary(path: String, readAs: Format = ReadAs.TEXT, options: Map[String, String] = Map()): DeIdentification.this.type

    Dictionary of regular expression patterns that match protected entities. When it is not set, a default regex file is used.

    path

    the string path where the file is located.

    readAs

    format of the reader.

    options

    options to apply to the reader.

  118. def setRegexPatternsDictionary(path: ExternalResource): DeIdentification.this.type

    Dictionary of regular expression patterns that match protected entities. When it is not set, a default regex file is used.

    path

    the external resource where the file is located.

    See also

    ExternalResource

  119. def setReturnEntityMappings(s: Boolean): DeIdentification.this.type

    Whether to return the mapping column.

    Definition Classes
    DeIdentificationParams
  120. def setSameEntityThreshold(s: Double): DeIdentification.this.type

    Similarity threshold [0.0-1.0] for considering two appearances of an entity the same (default: 0.9). This does not apply to date entities.

    Definition Classes
    DeIdentificationParams
  121. def setSeed(s: Int): DeIdentification.this.type

    Seed used to select the entities in obfuscate mode. With the same seed, an execution can be replayed several times with the same output.

    Definition Classes
    DeIdentificationParams
  122. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  123. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  124. def train(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): DeIdentificationModel

    Returns the DeIdentificationModel transformer, which can be used to transform input datasets.

    The dataset provided to the fit method should have one chunk per row and contain the following columns: Document, Tokens, Chunks.

    This method is called inside the AnnotatorApproach's fit method.

    dataset

    a Dataset containing the Document, Tokens, and Chunks columns described above

    recursivePipeline

    an instance of PipelineModel

    returns

    a trained DeIdentificationModel

    Definition Classes
    DeIdentification → AnnotatorApproach
  125. def transformRegexPatternsDictionary(regexPatternsDictionary: Array[(String, String)]): Map[String, Array[String]]
  126. final def transformSchema(schema: StructType): StructType
    Definition Classes
    AnnotatorApproach → PipelineStage
  127. def transformSchema(schema: StructType, logging: Boolean): StructType
    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  128. val uid: String
    Definition Classes
    DeIdentification → Identifiable
  129. def validate(schema: StructType): Boolean
    Attributes
    protected
    Definition Classes
    AnnotatorApproach
  130. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  131. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  132. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  133. def write: MLWriter
    Definition Classes
    DefaultParamsWritable → MLWritable
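
The date parameters (dateFormats, days, isRandomDateDisplacement, seed) can be pictured with the java.time sketch below: a date is parsed with the first matching format, shifted by a number of days, and written back in that same format. With setDays(5) this reproduces the shifts in the example output (01/13/93 becomes 01/18/93, 2079-11-09 becomes 2079-11-14), but it is an illustration under assumptions, not the library's implementation. In particular, how the library derives a random offset from seed is not documented here; `randomDays` simply uses `scala.util.Random` over the 1 to 60 range mentioned under days.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.{Random, Success, Try}

// Illustrative sketch of date displacement; not the library's implementation.
object DateShiftSketch {
  // Parses `date` with the first pattern that accepts it, shifts it by `days`,
  // and formats it back with that same pattern. None if no pattern matches.
  // Note: "yy" parses into 2000-2099, which is harmless for a day shift
  // because the year is re-emitted with the same two digits.
  def shift(date: String, patterns: Seq[String], days: Int): Option[String] =
    patterns.iterator
      .map(p => DateTimeFormatter.ofPattern(p))
      .map(fmt => Try(LocalDate.parse(date, fmt).plusDays(days.toLong).format(fmt)))
      .collectFirst { case Success(shifted) => shifted }

  // Assumption: a seeded offset in 1 to 60, so reruns with the same seed agree.
  def randomDays(seed: Int): Int = new Random(seed).nextInt(60) + 1
}
```

Patterns are tried in order, so list the most specific formats first when two could accept the same string.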

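consistentObfuscation and sameEntityThreshold both rely on string similarity, which the descriptions above say is based on Levenshtein distance and expressed on a 0.0 to 1.0 scale. The sketch below assumes the common normalization `1 - distance / max(length)`; the library's exact formula is not stated here, so treat the numbers as illustrative. `SimilaritySketch` and its methods are hypothetical.

```scala
object SimilaritySketch {
  // Classic two-row dynamic-programming Levenshtein edit distance.
  def levenshtein(a: String, b: String): Int = {
    // prev(j) = distance between the first (i - 1) chars of a and the first j chars of b
    val prev = Array.tabulate(b.length + 1)(j => j)
    val curr = new Array[Int](b.length + 1)
    for (i <- 1 to a.length) {
      curr(0) = i
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        curr(j) = math.min(math.min(curr(j - 1) + 1, prev(j) + 1), prev(j - 1) + cost)
      }
      Array.copy(curr, 0, prev, 0, curr.length)
    }
    prev(b.length)
  }

  // Assumed normalization to [0.0, 1.0]: 1 - distance / length of the longer string.
  def similarity(a: String, b: String): Double = {
    val longest = math.max(a.length, b.length)
    if (longest == 0) 1.0 else 1.0 - levenshtein(a, b).toDouble / longest
  }

  // Two appearances count as the same entity at or above the threshold (0.9 by default).
  def sameEntity(a: String, b: String, threshold: Double = 0.9): Boolean =
    similarity(a, b) >= threshold
}
```

Under this normalization, "Gregory House" and "Gregory Hause" differ by one edit over 13 characters (about 0.92), so they would be treated as the same entity at the default threshold.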