class DeIdentificationModel extends AnnotatorModel[DeIdentificationModel] with DeIdentificationParams with HasSimpleAnnotate[DeIdentificationModel] with CheckLicense

Contains all the parameters to transform a dataset with three input annotations of types DOCUMENT, TOKEN and CHUNK into its de-identified version, by either masking or obfuscating the given CHUNKs.

To create a configured DeIdentificationModel, please see the example of DeIdentification.

See also

DeIdentification to train your own model

Linear Supertypes
CheckLicense, HasSimpleAnnotate[DeIdentificationModel], DeIdentificationParams, AnnotatorModel[DeIdentificationModel], CanBeLazy, RawAnnotator[DeIdentificationModel], HasOutputAnnotationCol, HasInputAnnotationCols, HasOutputAnnotatorType, ParamsAndFeaturesWritable, HasFeatures, DefaultParamsWritable, MLWritable, Model[DeIdentificationModel], Transformer, PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new DeIdentificationModel()
  2. new DeIdentificationModel(uid: String)

    uid

    a unique identifier for the instantiated AnnotatorModel

Type Members

  1. type AnnotationContent = Seq[Row]
    Attributes
    protected
    Definition Classes
    AnnotatorModel
  2. type AnnotatorType = String
    Definition Classes
    HasOutputAnnotatorType
  3. implicit class StringReplacement extends AnyRef

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T
    Attributes
    protected
    Definition Classes
    Params
  4. def $$[T](feature: StructFeature[T]): T
    Attributes
    protected
    Definition Classes
    HasFeatures
  5. def $$[K, V](feature: MapFeature[K, V]): Map[K, V]
    Attributes
    protected
    Definition Classes
    HasFeatures
  6. def $$[T](feature: SetFeature[T]): Set[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  7. def $$[T](feature: ArrayFeature[T]): Array[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  8. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  9. def _transform(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): DataFrame
    Attributes
    protected
    Definition Classes
    AnnotatorModel
  10. def afterAnnotate(dataset: DataFrame): DataFrame
    Definition Classes
    DeIdentificationModel → AnnotatorModel
  11. val allTerms: MapFeature[String, List[String]]

    dictionary containing all the terms used later by the anonymization function

  12. def annotate(annotations: Seq[Annotation]): Seq[Annotation]

    annotations

    The annotations per row that are needed to obfuscate the document. Annotations should be of types DOCUMENT, TOKEN and CHUNK. Each TOKEN or CHUNK annotation must have a sentence number in its metadata that refers to one of the DOCUMENT annotations. If a TOKEN or CHUNK has a sentence number in its metadata greater than the sentence numbers of the DOCUMENT annotations, the annotator should throw an exception.

    returns

    The DOCUMENT annotations, masked or obfuscated.

    Definition Classes
    DeIdentificationModel → HasSimpleAnnotate
  13. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  14. def beforeAnnotate(dataset: Dataset[_]): Dataset[_]

    This method represents the pipeline method, which calls each method one by one. It relies on the main point of interest, getAnonymizeSentence(), and calls it for each sentence.

    returns

    a Sequence of anonymized Annotations

    Definition Classes
    DeIdentificationModel → AnnotatorModel
  15. final def checkSchema(schema: StructType, inputAnnotatorType: String): Boolean
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  16. def checkValidEnvironment(spark: Option[SparkContext]): Unit
    Definition Classes
    CheckLicense
  17. def checkValidScope(scope: String): Unit
    Definition Classes
    CheckLicense
  18. def checkValidScopeAndEnvironment(scope: String, spark: Option[SparkContext], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  19. def chunkFlexibleEquals(chunkA: Annotation, chunkB: Annotation): Boolean
  20. def chunkSameEntity(chunkA: Annotation, chunkB: Annotation): Boolean
  21. final def clear(param: Param[_]): DeIdentificationModel.this.type
    Definition Classes
    Params
  22. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  23. val consistentObfuscation: BooleanParam

    Whether to replace very similar entities in a document with the same randomized term (default: true). The similarity is based on the Levenshtein distance between the words.

    Definition Classes
    DeIdentificationParams
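The idea behind consistent obfuscation can be sketched in plain Scala. This is a minimal illustration, not the annotator's actual implementation: the object name, the helper names, and the normalization of the Levenshtein distance by the longer string's length are all assumptions made for the example.

```scala
object ConsistentObfuscationSketch {
  /** Classic dynamic-programming Levenshtein edit distance. */
  def levenshtein(a: String, b: String): Int = {
    // prev(j) holds the distance between the processed prefix of `a` and b.take(j)
    var prev = Array.range(0, b.length + 1)
    for (i <- 1 to a.length) {
      val curr = new Array[Int](b.length + 1)
      curr(0) = i
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        curr(j) = math.min(math.min(curr(j - 1) + 1, prev(j) + 1), prev(j - 1) + cost)
      }
      prev = curr
    }
    prev(b.length)
  }

  /** Similarity in [0.0, 1.0]; normalizing by the longer length is an assumption. */
  def similarity(a: String, b: String): Double = {
    val longest = math.max(a.length, b.length)
    if (longest == 0) 1.0 else 1.0 - levenshtein(a, b).toDouble / longest
  }

  /** Reuses the replacement of an already-seen similar entity, otherwise draws a new one. */
  def obfuscateConsistently(entities: Seq[String], fakes: Iterator[String], threshold: Double): Seq[String] = {
    val seen = scala.collection.mutable.LinkedHashMap.empty[String, String]
    entities.map { entity =>
      seen.collectFirst { case (known, fake) if similarity(known, entity) >= threshold => fake }
        .getOrElse { val fake = fakes.next(); seen(entity) = fake; fake }
    }
  }
}
```

With a threshold of 0.85, "John Smith" and "Jon Smith" (one edit apart) would share a replacement, while an unrelated name would receive a new one.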
  24. def copy(extra: ParamMap): DeIdentificationModel
    Definition Classes
    RawAnnotator → Model → Transformer → PipelineStage → Params
  25. def copyValues[T <: Params](to: T, extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  26. def createAnonymizeAnnotation(anonymizeSentence: Sentence, offset: Int, idx: Int): Annotation

    Creates a proper Annotation from an anonymized sentence.

    anonymizeSentence

    a sentence, which is anonymized

    idx

    the index of the sentence

    returns

    a proper Annotation instance

  27. val dateFormats: StringArrayParam

    Format of dates to displace

    Definition Classes
    DeIdentificationParams
  28. val dateTag: Param[String]

    Tag of the NER entity that represents dates (default: DATE)

    Definition Classes
    DeIdentificationParams
  29. val dateToYear: BooleanParam

    true if dates must be converted to years, false otherwise

    Definition Classes
    DeIdentificationParams
  30. val days: IntParam

    Number of days to obfuscate the dates by displacement. If not provided, a random integer between 1 and 60 will be used.

    Definition Classes
    DeIdentificationParams
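Date displacement by a fixed number of days can be illustrated with java.time. This is a rough sketch under stated assumptions: the object and method names are hypothetical, the patterns are ordinary DateTimeFormatter patterns standing in for dateFormats, and adding (rather than subtracting) the days is a choice made for the example.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

object DateDisplacementSketch {
  /** Shifts `date` by `days`, using the first pattern in `formats` that parses it. */
  def displace(date: String, formats: Seq[String], days: Int): Option[String] =
    formats.iterator.map { pattern =>
      Try {
        val fmt = DateTimeFormatter.ofPattern(pattern)
        // Parse, shift, and render the date in the same format it arrived in.
        LocalDate.parse(date, fmt).plusDays(days.toLong).format(fmt)
      }.toOption
    }.collectFirst { case Some(shifted) => shifted }
}
```

A date that matches none of the patterns yields None, which mirrors why the formats need to fit the dates found in the text.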
  31. final def defaultCopy[T <: Params](extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  32. def dfAnnotate: UserDefinedFunction
    Definition Classes
    HasSimpleAnnotate
  33. def displaceMappings(annotation: Annotation, offset: Int): Annotation

    Takes a mapping Annotation and displaces its begin and end positions based on the offset.

    annotation

    the mapping Annotation to displace

    offset

    the offset by which begin and end are displaced

    returns

    a proper Annotation instance

  34. def duplicateClean(entities: Seq[Annotation], entitiesToCompare: Seq[Annotation]): Seq[Annotation]
  35. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  36. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  37. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  38. def explainParams(): String
    Definition Classes
    Params
  39. def extraValidate(structType: StructType): Boolean
    Attributes
    protected
    Definition Classes
    RawAnnotator
  40. def extraValidateMsg: String
    Attributes
    protected
    Definition Classes
    RawAnnotator
  41. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  42. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  43. val features: ArrayBuffer[Feature[_, _, _]]
    Definition Classes
    HasFeatures
  44. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  45. def get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]
    Definition Classes
    DeIdentificationModel → HasFeatures
  46. def get[T](feature: StructFeature[T]): Option[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  47. def get[T](feature: SetFeature[T]): Option[Set[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  48. def get[T](feature: ArrayFeature[T]): Option[Array[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  49. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  50. def getAllTerms: Map[String, List[String]]

    dictionary containing all the terms used later by the anonymization function

  51. def getAnonymizeSentence(sentence: String, protectedEntities: Seq[Annotation], dateTag: String = "DATE", days: Int = 0, dateToYear: Boolean = false, minYear: Int = 1929): (String, Seq[Annotation])

    Main point of interest. This method projects the sentence into its anonymized form. It is called for each sentence in the input collection of Annotations.

    sentence

    a sentence, which we want to anonymize

    protectedEntities

    a sequence of Entities which we want to anonymize

    dateTag

    a String which represents the value with which we replace dates

    days

    an Int which represents how many days back we look at

    dateToYear

    a flag whether we use displaceDate() or apply dateToYear() method

    minYear

    the minimum year from which all obfuscated dates start (default: 1929)

    returns

    a String, which represents an anonymized sentence

  52. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  53. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  54. def getFakersEntity(entity: String): Seq[String]
  55. def getInputCols: Array[String]
    Definition Classes
    HasInputAnnotationCols
  56. def getLazyAnnotator: Boolean
    Definition Classes
    CanBeLazy
  57. def getNearTokens(tokenizedSentence: Seq[IndexedToken], count: Int, ngrams: Int = 2): (String, String)
  58. def getNerEntitiesBySentence(annotations: Seq[Annotation], sentenceCount: Int): Seq[Seq[Annotation]]

    Returns the NER Annotations for each Annotation instance in the input Sequence

    annotations

    a Sequence of Annotation instances

    returns

    a Sequence of Sequence[Annotation], each Sequence represents the NER annotations of one sentence

  59. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  60. final def getOutputCol: String
    Definition Classes
    HasOutputAnnotationCol
  61. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  62. def getRegexEntities(tokensSentences: Seq[IndexedToken], idx: Int): Seq[Annotation]

    Returns the Regex Annotations for each IndexedToken in the input Sequence

    tokensSentences

    a Sequence of IndexedToken instances

    returns

    a Sequence of Annotation, each Annotation represents Regex Entity

  63. def getRegexPatternsDictionary: Map[String, Array[String]]

    dictionary with regular expression patterns that match some protected entity

  64. def getSeed(): Int
  65. def getSentences(annotations: Seq[Annotation]): Seq[Sentence]

    Returns the content of each sentence inside the input sequence

    annotations

    a Sequence of Annotation instances, to return content from

    returns

    a Sequence of Sentence

  66. def getTokensBySentence(annotations: Seq[Annotation]): Seq[Seq[IndexedToken]]

    Returns the tokens for each Annotation instance in the input Sequence

    annotations

    a Sequence of Annotation instances

    returns

    a Sequence of Sequence[IndexedToken], each Sequence represents tokens from each input Annotation

  67. def handleEntitiesDifferences(AEntities: Seq[Annotation], BEntities: Seq[Annotation]): Seq[Annotation]

    Returns a complement of A entities against B entities

    AEntities

    a sequence of Entities to combine

    BEntities

    a sequence of Entities to combine

    returns

    a Sequence of Annotation, which is difference between NER and RegEx

  68. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  69. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  70. def hasParent: Boolean
    Definition Classes
    Model
  71. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  72. val ignoreRegex: BooleanParam

    Select whether to use the regex file loaded in the model. If true, the default regex file will not be used. The default value is false.

    Definition Classes
    DeIdentificationParams
  73. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  74. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  75. val inputAnnotatorTypes: Array[AnnotatorType]

    Input annotator type: DOCUMENT, TOKEN, CHUNK

    Definition Classes
    DeIdentificationModel → HasInputAnnotationCols
  76. final val inputCols: StringArrayParam
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  77. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  78. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  79. def isLp(): Boolean
    Definition Classes
    CheckLicense
  80. val isRandomDateDisplacement: BooleanParam

    Whether to use a random number of displacement days for date entities; the random number is based on DeIdentificationParams.seed. If true, a random displacement is used for date entities; if false, DeIdentificationParams.days is used. The default value is false.

    Definition Classes
    DeIdentificationParams
  81. def isRegexMatch(nerTokens: (String, String), token: String, regexPatterns: Array[String]): Boolean

    Returns a Boolean flag, which says if the token matches at least one pattern from the array

    token

    a token of interest to check for the match

    regexPatterns

    an Array of String to check against the token

    returns

    a Boolean flag, representing if the token matches at least one pattern of regexPatterns

  82. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  83. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  84. val lazyAnnotator: BooleanParam
    Definition Classes
    CanBeLazy
  85. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  86. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  87. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  88. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  89. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  90. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  91. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  92. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  93. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  94. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  95. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  96. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  97. val mappingsColumn: Param[String]

    This is the mapping column that will return the Annotations chunks with the fake entities

    Definition Classes
    DeIdentificationParams
  98. def mergeEntities(nerEntities: Seq[Annotation], regexEntities: Seq[Annotation], regexOverride: Boolean = false): Seq[Annotation]

    Returns a combined Sequence of Annotations, cleaned from duplicates

    nerEntities

    a sequence of NER Entities to combine

    regexEntities

    a sequence of Regex Entities to combine

    returns

    a Sequence of Annotation, which is result of a merge without duplicates

  99. val minYear: IntParam

    Minimum year to use when converting date to year

    Definition Classes
    DeIdentificationParams
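One possible reading of how dateToYear and minYear interact is sketched below. It is only an assumption that minYear acts as a lower bound on the resulting year; the source does not spell out the exact rule, and the object and method names are hypothetical.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

object DateToYearSketch {
  /** Reduces a date to its year, never going below `minYear` (assumed clamping rule). */
  def toYear(date: String, pattern: String, minYear: Int): Option[String] =
    Try(LocalDate.parse(date, DateTimeFormatter.ofPattern(pattern)))
      .toOption
      .map(parsed => math.max(parsed.getYear, minYear).toString)
}
```

Under this reading, a date in 1985 becomes "1985", while a date in 1910 becomes "1929" when minYear is 1929.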
  100. val mode: Param[String]

    Mode for Anonymizer ['mask'|'obfuscate']. Given the following text:

    "David Hale visited EEUU a couple of years ago"

    Mask mode: The entities will be replaced by their entity types. Example: "<PERSON> visited <COUNTRY> a couple of years ago"

    Obfuscate mode: The entity is replaced by an obfuscator term. Example: "Bryan Johnson visited Japon a couple of years ago"

    Definition Classes
    DeIdentificationParams
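The difference between the two modes can be illustrated with a small, self-contained sketch. The Entity case class is a hypothetical stand-in for a CHUNK annotation (character span with inclusive end, plus entity label), and the fake terms are passed in as a plain map; the real annotator draws them from its own sources.

```scala
object ModeSketch {
  /** Hypothetical stand-in for a CHUNK annotation; `end` is inclusive. */
  final case class Entity(begin: Int, end: Int, label: String)

  def deidentify(text: String, entities: Seq[Entity], mode: String, fakes: Map[String, String]): String =
    // Replace from right to left so the offsets of earlier entities stay valid.
    entities.sortBy(-_.begin).foldLeft(text) { (acc, entity) =>
      val replacement = mode match {
        case "mask"      => s"<${entity.label}>"
        case "obfuscate" => fakes.getOrElse(entity.label, s"<${entity.label}>")
        case other       => throw new IllegalArgumentException(s"Unknown mode: $other")
      }
      acc.substring(0, entity.begin) + replacement + acc.substring(entity.end + 1)
    }
}
```

Applied to "David Hale visited EEUU a couple of years ago" with a PERSON entity at 0-9 and a COUNTRY entity at 19-22, "mask" produces the masked sentence from the description, and "obfuscate" with the fakes "Bryan Johnson" and "Japon" produces the obfuscated one.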
  101. def msgHelper(schema: StructType): String
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  102. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  103. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  104. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  105. val obfuscateDate: BooleanParam

    Whether to obfuscate dates when mode=="obfuscate" (default: false). This param helps in consistency to make dateFormats more visible. When set to true, make sure the dateFormats param fits your needs. When set to false, dates will be masked as <DATE>.

    Definition Classes
    DeIdentificationParams
  106. val obfuscateRefSource: Param[String]

    The source used to obfuscate the entities. It does not apply to date entities. The values are the following:

    'file': Takes the entities from the obfuscatorRefFile.
    'faker': Takes the entities from the Faker module.
    'both': Takes the entities from the obfuscatorRefFile and the Faker module randomly.

    Definition Classes
    DeIdentificationParams
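The three source values can be pictured as a simple dispatch. The sketch below is an assumption about the general shape only: both term lists are passed in as plain sequences, the names are hypothetical, and for 'both' it flips a coin per call, which is one of several ways "randomly" could be realized.

```scala
import scala.util.Random

object RefSourceSketch {
  /** Picks the pool of replacement terms according to the chosen source. */
  def candidates(source: String, fromFile: Seq[String], fromFaker: Seq[String], rng: Random): Seq[String] =
    source match {
      case "file"  => fromFile
      case "faker" => fromFaker
      case "both"  => if (rng.nextBoolean()) fromFile else fromFaker // random choice between the two pools
      case other   => throw new IllegalArgumentException(s"Unknown source: $other")
    }
}
```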
  107. def onWrite(path: String, spark: SparkSession): Unit
    Attributes
    protected
    Definition Classes
    ParamsAndFeaturesWritable
  108. val optionalInputAnnotatorTypes: Array[String]
    Definition Classes
    HasInputAnnotationCols
  109. val outputAnnotatorType: AnnotatorType

    Output annotator types: DOCUMENT

    Definition Classes
    DeIdentificationModel → HasOutputAnnotatorType
  110. final val outputCol: Param[String]
    Attributes
    protected
    Definition Classes
    HasOutputAnnotationCol
  111. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  112. var parent: Estimator[DeIdentificationModel]
    Definition Classes
    Model
  113. val regexOverride: BooleanParam

    If true, prioritize the regex entities; if false, prioritize the NER entities. The default value is false.

    Definition Classes
    DeIdentificationParams
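The effect of this priority flag when NER and regex entities collide can be sketched as follows. The Entity case class is a hypothetical stand-in for an annotation, and treating "collision" as character-span overlap is an assumption made for the example; the annotator's own merge logic may differ.

```scala
object MergeEntitiesSketch {
  /** Hypothetical stand-in for an entity annotation; `end` is inclusive. */
  final case class Entity(begin: Int, end: Int, label: String)

  private def overlaps(a: Entity, b: Entity): Boolean =
    a.begin <= b.end && b.begin <= a.end

  /** Keeps every entity of the preferred source and only the non-overlapping ones of the other. */
  def merge(ner: Seq[Entity], regex: Seq[Entity], regexOverride: Boolean): Seq[Entity] = {
    val (preferred, other) = if (regexOverride) (regex, ner) else (ner, regex)
    val kept = other.filterNot(o => preferred.exists(p => overlaps(p, o)))
    (preferred ++ kept).distinct.sortBy(_.begin)
  }
}
```

With regexOverride set to false, an overlapping span keeps its NER label; with true, it keeps the regex label. Entities found by only one source survive in both cases.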
  114. val regexPatternsDictionary: MapFeature[String, Array[String]]

    dictionary with regular expression patterns that match some protected entity

  115. def replaceRegExFlavors(word: String): String

    This is a simple RegEx replace which removes some punctuation tokens from the input

    word

    a String, inside which we want to replace flavors

    returns

    a String, which represents a cleaned version

  116. val returnEntityMappings: BooleanParam

    Select whether to return the mapping column.

    Definition Classes
    DeIdentificationParams
  117. val sameEntityThreshold: DoubleParam

    Similarity threshold [0.0-1.0] to consider two appearances of an entity as the same (default: 0.9). It does not apply to date entities.

    Definition Classes
    DeIdentificationParams
  118. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  119. val seed: IntParam

    The seed used to select the entities in obfuscate mode. With the same seed you can replay an execution several times with the same output.

    Definition Classes
    DeIdentificationParams
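Why a seed makes runs repeatable can be shown with scala.util.Random. This is only an illustration of seeded selection, with hypothetical names; it does not claim to reproduce the sequence the annotator itself generates. The randomDays helper mirrors the "random integer between 1 and 60" mentioned for the days parameter.

```scala
import scala.util.Random

object SeedSketch {
  /** Draws `count` replacement terms; the same seed always yields the same draws. */
  def pickFakes(candidates: IndexedSeq[String], count: Int, seed: Int): Seq[String] = {
    val rng = new Random(seed.toLong)
    Seq.fill(count)(candidates(rng.nextInt(candidates.length)))
  }

  /** A seeded displacement in the range 1 to 60, inclusive. */
  def randomDays(seed: Int): Int =
    new Random(seed.toLong).nextInt(60) + 1
}
```

Calling pickFakes twice with the same candidates and seed returns identical sequences, which is the property that lets an obfuscation run be repeated.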
  120. def set[T](feature: StructFeature[T], value: T): DeIdentificationModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  121. def set[K, V](feature: MapFeature[K, V], value: Map[K, V]): DeIdentificationModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  122. def set[T](feature: SetFeature[T], value: Set[T]): DeIdentificationModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  123. def set[T](feature: ArrayFeature[T], value: Array[T]): DeIdentificationModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  124. final def set(paramPair: ParamPair[_]): DeIdentificationModel.this.type
    Attributes
    protected
    Definition Classes
    Params
  125. final def set(param: String, value: Any): DeIdentificationModel.this.type
    Attributes
    protected
    Definition Classes
    Params
  126. final def set[T](param: Param[T], value: T): DeIdentificationModel.this.type
    Definition Classes
    Params
  127. def setAllTerms(value: Map[String, List[String]]): DeIdentificationModel.this.type

    dictionary containing all the terms used later by the anonymization function

  128. def setConsistentObfuscation(s: Boolean): DeIdentificationModel.this.type

    Whether to replace very similar entities in a document with the same randomized term (default: true). The similarity is based on the Levenshtein distance between the words.

    Definition Classes
    DeIdentificationParams
  129. def setDateFormats(s: Array[String]): DeIdentificationModel.this.type

    Format of dates to displace

    Definition Classes
    DeIdentificationParams
  130. def setDateTag(s: String): DeIdentificationModel.this.type

    Tag of the NER entity that represents dates (default: DATE)

    Definition Classes
    DeIdentificationParams
  131. def setDateToYear(s: Boolean): DeIdentificationModel.this.type

    true if dates must be converted to years, false otherwise

    Definition Classes
    DeIdentificationParams
  132. def setDays(k: Int): DeIdentificationModel.this.type

    Number of days to obfuscate the dates by displacement. If not provided, a random integer between 1 and 60 will be used.

    Definition Classes
    DeIdentificationParams
  133. def setDefault[T](feature: StructFeature[T], value: () ⇒ T): DeIdentificationModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  134. def setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): DeIdentificationModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  135. def setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): DeIdentificationModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  136. def setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): DeIdentificationModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  137. final def setDefault(paramPairs: ParamPair[_]*): DeIdentificationModel.this.type
    Attributes
    protected
    Definition Classes
    Params
  138. final def setDefault[T](param: Param[T], value: T): DeIdentificationModel.this.type
    Attributes
    protected
    Definition Classes
    Params
  139. def setIgnoreRegex(s: Boolean): DeIdentificationModel.this.type

    Select whether to use the regex file loaded in the model. If true, the default regex file will not be used. The default value is false.

    Definition Classes
    DeIdentificationParams
  140. final def setInputCols(value: String*): DeIdentificationModel.this.type
    Definition Classes
    HasInputAnnotationCols
  141. final def setInputCols(value: Array[String]): DeIdentificationModel.this.type
    Definition Classes
    HasInputAnnotationCols
  142. def setIsRandomDateDisplacement(s: Boolean): DeIdentificationModel.this.type

    Whether to use a random number of displacement days for date entities; the random number is based on DeIdentificationParams.seed. If true, a random displacement is used for date entities; if false, DeIdentificationParams.days is used. The default value is false.

    Definition Classes
    DeIdentificationParams
  143. def setLazyAnnotator(value: Boolean): DeIdentificationModel.this.type
    Definition Classes
    CanBeLazy
  144. def setMappingsColumn(s: String): DeIdentificationModel.this.type

    This is the mapping column that will return the Annotations chunks with the fake entities

    Definition Classes
    DeIdentificationParams
  145. def setMinYear(s: Int): DeIdentificationModel.this.type

    Minimum year to use when converting date to year

    Definition Classes
    DeIdentificationParams
  146. def setMode(m: String): DeIdentificationModel.this.type

    Mode for Anonymizer ['mask'|'obfuscate']. Given the following text:

    "David Hale visited EEUU a couple of years ago"

    Mask mode: The entities will be replaced by their entity types. Example: "<PERSON> visited <COUNTRY> a couple of years ago"

    Obfuscate mode: The entity is replaced by an obfuscator term. Example: "Bryan Johnson visited Japon a couple of years ago"

    Definition Classes
    DeIdentificationParams
  147. def setObfuscateDate(s: Boolean): DeIdentificationModel.this.type

    Whether to obfuscate dates when mode=="obfuscate" (default: false). This param helps in consistency to make dateFormats more visible. When set to true, make sure the dateFormats param fits your needs. When set to false, dates will be masked as <DATE>.

    Definition Classes
    DeIdentificationParams
  148. def setObfuscateRefSource(s: String): DeIdentificationModel.this.type

    The source used to obfuscate the entities. It does not apply to date entities. The values are the following:

    'file': Takes the entities from the obfuscatorRefFile.
    'faker': Takes the entities from the Faker module.
    'both': Takes the entities from the obfuscatorRefFile and the Faker module randomly.

    Definition Classes
    DeIdentificationParams
  149. final def setOutputCol(value: String): DeIdentificationModel.this.type
    Definition Classes
    HasOutputAnnotationCol
  150. def setParent(parent: Estimator[DeIdentificationModel]): DeIdentificationModel
    Definition Classes
    Model
  151. def setRegexOverride(s: Boolean): DeIdentificationModel.this.type

    If true, prioritize the regex entities; if false, prioritize the NER entities. The default value is false.

    Definition Classes
    DeIdentificationParams
  152. def setRegexPatternsDictionary(value: Map[String, Array[String]]): DeIdentificationModel.this.type

    dictionary with regular expression patterns that match some protected entity

  153. def setReturnEntityMappings(s: Boolean): DeIdentificationModel.this.type

    Select whether to return the mapping column.

    Definition Classes
    DeIdentificationParams
  154. def setSameEntityThreshold(s: Double): DeIdentificationModel.this.type

    Similarity threshold [0.0-1.0] to consider two appearances of an entity as the same (default: 0.9). It does not apply to date entities.

    Definition Classes
    DeIdentificationParams
  155. def setSeed(s: Int): DeIdentificationModel.this.type

    The seed used to select the entities in obfuscate mode. With the same seed you can replay an execution several times with the same output.

    Definition Classes
    DeIdentificationParams
  156. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  157. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  158. final def transform(dataset: Dataset[_]): DataFrame
    Definition Classes
    AnnotatorModel → Transformer
  159. def transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" )
  160. def transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" ) @varargs()
  161. final def transformSchema(schema: StructType): StructType
    Definition Classes
    RawAnnotator → PipelineStage
  162. def transformSchema(schema: StructType, logging: Boolean): StructType
    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  163. def udfDocuments: UserDefinedFunction
  164. def udfProtectedEntities: UserDefinedFunction
  165. val uid: String
    Definition Classes
    DeIdentificationModel → Identifiable
  166. def validate(schema: StructType): Boolean
    Attributes
    protected
    Definition Classes
    RawAnnotator
  167. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  168. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  169. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  170. def wrapColumn(col: Column): Column
  171. def wrapColumnMetadata(col: Column): Column
    Attributes
    protected
    Definition Classes
    RawAnnotator
  172. def write: MLWriter
    Definition Classes
    ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable
