class DeIdentificationModel extends AnnotatorModel[DeIdentificationModel] with DeIdentificationParams with DeidModelParams with HasSimpleAnnotate[DeIdentificationModel] with HandleExceptionParams with HasSafeAnnotate[DeIdentificationModel] with CheckLicense

Contains all the parameters to transform a dataset with three input annotations of types DOCUMENT, TOKEN and CHUNK into its de-identified version by either masking or obfuscating the given CHUNKS.

To create a configured DeIdentificationModel, please see the example of DeIdentification.

See also

BaseDeidParams to see params

DeIdentificationParams to see params

DeidModelParams to see params

DeIdentification to train your own model

Linear Supertypes
CheckLicense, HasSafeAnnotate[DeIdentificationModel], HandleExceptionParams, HasSimpleAnnotate[DeIdentificationModel], DeidModelParams, DeIdentificationParams, MaskingParams, BaseDeidParams, AnnotatorModel[DeIdentificationModel], CanBeLazy, RawAnnotator[DeIdentificationModel], HasOutputAnnotationCol, HasInputAnnotationCols, HasOutputAnnotatorType, ParamsAndFeaturesWritable, HasFeatures, DefaultParamsWritable, MLWritable, Model[DeIdentificationModel], Transformer, PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new DeIdentificationModel()
  2. new DeIdentificationModel(uid: String)

    uid

    a unique identifier for the instanced AnnotatorModel

Type Members

  1. type AnnotationContent = Seq[Row]
    Attributes
    protected
    Definition Classes
    AnnotatorModel
  2. type AnnotatorType = String
    Definition Classes
    HasOutputAnnotatorType
  3. implicit class StringReplacement extends AnyRef

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T
    Attributes
    protected
    Definition Classes
    Params
  4. def $$[T](feature: StructFeature[T]): T
    Attributes
    protected
    Definition Classes
    HasFeatures
  5. def $$[K, V](feature: MapFeature[K, V]): Map[K, V]
    Attributes
    protected
    Definition Classes
    HasFeatures
  6. def $$[T](feature: SetFeature[T]): Set[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  7. def $$[T](feature: ArrayFeature[T]): Array[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  8. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  9. val GEOGRAPHIC_ENTITIES_PRIORITY: Map[String, Int]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  10. val GEO_METADATA_KEY: String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  11. def _transform(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): DataFrame
    Definition Classes
    DeIdentificationModel → AnnotatorModel
  12. val additionalDateFormats: StringArrayParam

    Additional date formats to be considered during date obfuscation. This allows users to specify custom date formats in addition to the default dateFormats.

    Definition Classes
    BaseDeidParams
  13. def afterAnnotate(dataset: DataFrame): DataFrame
    Definition Classes
    DeIdentificationModel → AnnotatorModel
  14. val ageGroups: StructFeature[Map[String, Array[Int]]]

    A map of age groups to obfuscate ages. For this parameter to be active, the obfuscateByAgeGroups parameter must be true. If the given ageGroups do not fully contain the ages, the ages continue to be obfuscated according to the ageRanges. The map should contain the age group name as the key and an array of two integers as the value. The first integer is the lower bound of the age group, and the second integer is the upper bound of the age group. Default age groups are as follows in the English language:

    Map(
    "baby" -> Array(0, 1),
    "toddler" -> Array(1, 4),
    "child" -> Array(4, 13),
    "teenager" -> Array(13, 20),
    "adult" -> Array(20, 65),
    "senior" -> Array(65, 200)
    )
    Definition Classes
    DeIdentificationParams
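The lookup described for ageGroups can be pictured with a small sketch. It is not the library's implementation: `AgeGroupSketch` and `groupFor` are hypothetical names, and treating each pair as a lower-inclusive, upper-exclusive range is an assumption made here because the default bounds touch (for example, 1 ends "baby" and starts "toddler").

```scala
object AgeGroupSketch {
  // Default groups copied from the ageGroups documentation.
  val defaultAgeGroups: Map[String, Array[Int]] = Map(
    "baby" -> Array(0, 1),
    "toddler" -> Array(1, 4),
    "child" -> Array(4, 13),
    "teenager" -> Array(13, 20),
    "adult" -> Array(20, 65),
    "senior" -> Array(65, 200)
  )

  // Hypothetical helper: returns the first group whose [lower, upper) range contains the age.
  // The half-open interval is an assumption, not documented behavior.
  def groupFor(age: Int, groups: Map[String, Array[Int]] = defaultAgeGroups): Option[String] =
    groups.toSeq
      .sortBy { case (_, bounds) => bounds.head }
      .collectFirst { case (name, Array(lower, upper)) if age >= lower && age < upper => name }
}
```

An age that falls outside every group yields `None`, which corresponds to the documented fallback to ageRanges.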
  15. val ageRanges: IntArrayParam

    List of integers specifying limits of the age groups to preserve during obfuscation.

    Definition Classes
    BaseDeidParams
  16. val ageRangesByHipaa: BooleanParam

    A Boolean variable indicating whether to obfuscate ages based on the HIPAA (Health Insurance Portability and Accountability Act) Privacy Rule.

    The HIPAA Privacy Rule mandates that ages of patients older than 90 years must be obfuscated, while ages of patients 90 years or younger can remain unchanged.

    When true, age entities greater than 90 are obfuscated as per the HIPAA Privacy Rule and the others remain unchanged. When false, the ageRanges parameter applies instead.

    Definition Classes
    BaseDeidParams
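As a rough illustration of how ageRangesByHipaa and ageRanges interact, the sketch below follows the wording of the descriptions literally. `AgeRuleSketch`, `shouldObfuscate` and `bracket` are hypothetical names, and the limits used in the usage note are made-up sample values rather than the library defaults.

```scala
object AgeRuleSketch {
  // With the HIPAA flag on, only ages above 90 are obfuscated (as the description states);
  // with the flag off, every age is handled through the age ranges.
  def shouldObfuscate(age: Int, ageRangesByHipaa: Boolean): Boolean =
    !ageRangesByHipaa || age > 90

  // Finds the range that contains `age`: the largest limit <= age and the
  // smallest limit > age (None when the age lies above the last limit).
  def bracket(age: Int, limits: Seq[Int]): (Int, Option[Int]) = {
    val sorted = limits.sorted
    val lower = sorted.filter(_ <= age).lastOption.getOrElse(0)
    val upper = sorted.find(_ > age)
    (lower, upper)
  }
}
```

For example, with sample limits `Seq(1, 4, 12, 20, 40, 60, 80)`, an age of 35 falls in the bracket starting at 20 and ending before 40, so a replacement age that preserves the range would be drawn from that bracket.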
  17. val allTerms: MapFeature[String, List[String]]

    Dictionary containing all terms used later by the anonymization function.

  18. def annotate(annotations: Seq[Annotation]): Seq[Annotation]

    annotations

    The annotations per row that are needed to obfuscate the document. Annotations should be of types DOCUMENT, TOKEN and CHUNK. Each TOKEN or CHUNK annotation carries a sentence number in its metadata, and that number must match one of the DOCUMENT annotations. If a TOKEN or CHUNK has a sentence number greater than the sentence numbers of the DOCUMENT annotations, the annotator should throw an exception.

    returns

    The DOCUMENT annotations, masked or obfuscated.

    Definition Classes
    DeIdentificationModel → HasSimpleAnnotate
  19. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  20. def beforeAnnotate(dataset: Dataset[_]): Dataset[_]
    Definition Classes
    DeIdentificationModel → AnnotatorModel
  21. val blackList: StringArrayParam

    List of entities that will be ignored in the regex file. The rest will be processed. The default values are "IBAN","ZIP","NPI","DLN","PASSPORT","C_CARD","DEA","SSN", "IP", "DEA".

    Definition Classes
    DeIdentificationParams
  22. val blackListEntities: StringArrayParam

    List of entities coming from NER or regex rules that will be ignored for masking or obfuscation. The remaining entities will be processed. Defaults to an empty array.

    Definition Classes
    DeIdentificationParams
  23. final def checkSchema(schema: StructType, inputAnnotatorType: String): Boolean
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  24. def checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String]): Unit
    Definition Classes
    CheckLicense
  25. def checkValidScope(scope: String): Unit
    Definition Classes
    CheckLicense
  26. def checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  27. def checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  28. val chunkMatching: MapFeature[String, Double]

    Performs entity chunk matching across rows or within groups in a DataFrame. Useful in de-identification pipelines where certain entity labels like "NAME" or "DATE" may be missing in some rows and need to be filled from other rows in the same group.

    Definition Classes
    DeIdentificationParams
    Note

    When applying the method across multiple rows, the usage of groupByCol parameter is required.

  29. final def clear(param: Param[_]): DeIdentificationModel.this.type
    Definition Classes
    Params
  30. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  31. lazy val combinedDateFormats: Array[String]
    Attributes
    protected
    Definition Classes
    BaseDeidParams
  32. val consistentAcrossNameParts: BooleanParam

    Param that indicates whether consistency should be enforced across different parts of a name (e.g., first name, middle name, last name). When set to true, the same transformation or obfuscation will be applied consistently to all parts of the same name entity, even if those parts appear separately.

    For example, if "John Smith" is obfuscated as "Liam Brown", then:

    • When the full name "John Smith" appears, it will be replaced with "Liam Brown"
    • When "John" or "Smith" appear individually, they will still be obfuscated as "Liam" and "Brown" respectively, ensuring consistency in name transformation.

    Default: true

    Definition Classes
    BaseDeidParams
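The "John Smith" example for consistentAcrossNameParts can be sketched as a lookup table built from the full name and its replacement. This is only an illustration: `NamePartsSketch`, `partMappings` and `replace` are hypothetical names, and pairing the parts position by position is an assumption about how such a mapping could be built.

```scala
object NamePartsSketch {
  // Pairs each part of the original name with the part at the same position
  // in the replacement, and also keeps the full-name mapping.
  def partMappings(original: String, replacement: String): Map[String, String] = {
    val originalParts = original.trim.split("\\s+")
    val fakeParts = replacement.trim.split("\\s+")
    originalParts.zip(fakeParts).toMap + (original -> replacement)
  }

  // Looks up a full name or a single part; unknown text is returned unchanged.
  def replace(text: String, mappings: Map[String, String]): String =
    mappings.getOrElse(text, text)
}
```

With `partMappings("John Smith", "Liam Brown")`, the full name maps to "Liam Brown" while "John" and "Smith" on their own map to "Liam" and "Brown".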
  33. val consistentObfuscation: BooleanParam

    Whether to replace very similar entities in a document with the same randomized term (default: true). The similarity is based on the Levenshtein distance between the words.

    Definition Classes
    DeIdentificationParams
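For readers unfamiliar with the Levenshtein distance mentioned for consistentObfuscation, here is a minimal, self-contained sketch of the standard dynamic-programming algorithm. `LevenshteinSketch` and the normalised `similarity` helper are hypothetical; how the library turns the distance into a similarity score or compares it with a threshold is not described here.

```scala
object LevenshteinSketch {
  // Classic two-row dynamic programme: prev(j) is the edit distance between
  // the already processed prefix of `a` and the first j characters of `b`.
  def distance(a: String, b: String): Int = {
    var prev = Array.range(0, b.length + 1)
    for (i <- 1 to a.length) {
      val curr = new Array[Int](b.length + 1)
      curr(0) = i
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        curr(j) = math.min(math.min(curr(j - 1) + 1, prev(j) + 1), prev(j - 1) + cost)
      }
      prev = curr
    }
    prev(b.length)
  }

  // Hypothetical normalisation to [0, 1]; 1.0 means identical strings.
  def similarity(a: String, b: String): Double = {
    val longest = math.max(a.length, b.length)
    if (longest == 0) 1.0 else 1.0 - distance(a, b).toDouble / longest
  }
}
```

Two spellings such as "John" and "Jon" are one edit apart, so under a similarity rule of this kind they could be treated as the same entity and receive the same replacement.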
  34. def copy(extra: ParamMap): DeIdentificationModel
    Definition Classes
    RawAnnotator → Model → Transformer → PipelineStage → Params
  35. def copyValues[T <: Params](to: T, extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  36. val countryObfuscation: BooleanParam

    Whether to obfuscate country entities or not. If true, country entities will be obfuscated using the Faker module. If false, country entities will be skipped during obfuscation. Default: false

    Definition Classes
    BaseDeidParams
  37. def createAnonymizeAnnotation(anonymizeSentence: (Sentence, Seq[Annotation]), offset: Int, idx: Int, spacesLength: Int): Annotation

    Takes an anonymized sentence and creates the corresponding Annotation.

    anonymizeSentence

    the anonymized sentence

    idx

    the index of the sentence

    returns

    a proper Annotation instance

  38. val dateEntities: StringArrayParam

    List of date entities. Default: Array("DATE", "DOB", "DOD", "EFFDATE", "FISCAL_YEAR")

    Definition Classes
    BaseDeidParams
  39. val dateFormats: StringArrayParam

    Formats of dates to displace.

    Definition Classes
    BaseDeidParams
  40. val dateTag: Param[String]

    Tag representing the NER date entity (default: DATE).

    Definition Classes
    DeIdentificationParams
  41. val dateToYear: BooleanParam

    true if dates must be converted to years, false otherwise.

    Definition Classes
    DeIdentificationParams
  42. val days: IntParam

    Number of days to obfuscate the dates by displacement. If not provided, a random integer between 1 and 60 will be used.

    Definition Classes
    BaseDeidParams
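Date displacement by a number of days can be illustrated with `java.time`. This is a sketch under stated assumptions: `DateShiftSketch`, `shift` and `randomDays` are hypothetical helpers, and the date patterns are passed in explicitly instead of being read from dateFormats.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Random

object DateShiftSketch {
  // Parses the date with the given pattern, adds the displacement and
  // formats the result with the same pattern.
  def shift(date: String, pattern: String, days: Int): String = {
    val formatter = DateTimeFormatter.ofPattern(pattern)
    LocalDate.parse(date, formatter).plusDays(days.toLong).format(formatter)
  }

  // Fallback described for the days parameter: a random integer between 1 and 60.
  def randomDays(rng: Random): Int = 1 + rng.nextInt(60)
}
```

For instance, `DateShiftSketch.shift("2020-01-15", "yyyy-MM-dd", 30)` moves the date thirty days forward while keeping the original format.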
  43. final def defaultCopy[T <: Params](extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  44. def dfAnnotate: UserDefinedFunction
    Definition Classes
    HasSimpleAnnotate
  45. val doExceptionHandling: BooleanParam

    If true, exceptions are handled. If data that causes an exception is passed to the model, an error annotation containing the exception message is emitted and processing continues with the next item. This comes with a performance penalty.

    Definition Classes
    HandleExceptionParams
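The behavior described for doExceptionHandling (emit an error carrying the exception message, then continue with the next item) resembles the following generic pattern. It is a sketch only: `SafeAnnotateSketch` and `annotateSafely` are hypothetical, and an `Either` stands in for the error annotation that the model would emit.

```scala
import scala.util.{Failure, Success, Try}

object SafeAnnotateSketch {
  // Applies `annotate` to every row. A failing row produces a Left with the
  // exception message instead of aborting the whole batch.
  def annotateSafely[A, B](rows: Seq[A])(annotate: A => B): Seq[Either[String, B]] =
    rows.map { row =>
      Try(annotate(row)) match {
        case Success(result) => Right(result)
        case Failure(error)  => Left(String.valueOf(error.getMessage))
      }
    }
}
```

Wrapping each row in a `Try` is also a plausible source of the performance penalty mentioned above, since every call pays for the extra wrapping and matching.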
  46. val enableDefaultObfuscationEquivalents: BooleanParam

    Whether to enable default obfuscation equivalents for common entities. This parameter allows the system to automatically include a set of predefined common English name equivalents. Default: false

    Definition Classes
    BaseDeidParams
  47. val entityCasingModes: StructFeature[Map[String, Array[String]]]

    Dictionary with entity casing modes that match some entities. The supported modes are:

    • 'lowercase': Converts all characters to lower case using the rules of the default locale.
    • 'uppercase': Converts all characters to upper case using the rules of the default locale.
    • 'capitalize': Converts the first character to upper case and converts others to lower case.
    • 'titlecase': Converts the first character in every token to upper case and converts others to lower case.
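The four casing modes can be sketched in a few lines. `CasingSketch` and `applyCasing` are hypothetical names; the sketch uses `Locale.ROOT` for reproducible results, whereas the description above refers to the default locale, and it assumes tokens are separated by single spaces.

```scala
import java.util.Locale

object CasingSketch {
  // Upper-cases the first character and lower-cases the rest.
  private def capitalizeWord(word: String): String =
    if (word.isEmpty) word
    else word.substring(0, 1).toUpperCase(Locale.ROOT) + word.substring(1).toLowerCase(Locale.ROOT)

  // Applies one of the documented casing modes; unknown modes leave the text unchanged.
  def applyCasing(mode: String, text: String): String = mode match {
    case "lowercase"  => text.toLowerCase(Locale.ROOT)
    case "uppercase"  => text.toUpperCase(Locale.ROOT)
    case "capitalize" => capitalizeWord(text)
    case "titlecase"  => text.split(" ").map(capitalizeWord).mkString(" ")
    case _            => text
  }
}
```

The difference between the last two modes shows up on multi-token text: "capitalize" turns "john SMITH" into "John smith", while "titlecase" turns it into "John Smith".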

  48. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  49. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  50. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  51. def explainParams(): String
    Definition Classes
    Params
  52. def extraValidate(structType: StructType): Boolean
    Attributes
    protected
    Definition Classes
    RawAnnotator
  53. def extraValidateMsg: String
    Attributes
    protected
    Definition Classes
    RawAnnotator
  54. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  55. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  56. val fakerLengthOffset: IntParam

    It specifies how much length deviation is accepted in obfuscation, with keepTextSizeForObfuscation enabled. Value must be greater than 0. Default is 3.

    Definition Classes
    BaseDeidParams
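One way to read the length deviation in fakerLengthOffset is as a filter on candidate replacements. The sketch below is an interpretation, not the library's logic: `LengthOffsetSketch` and `withinOffset` are hypothetical names.

```scala
object LengthOffsetSketch {
  // Keeps only the candidates whose length differs from the original
  // by at most `offset` characters.
  def withinOffset(original: String, candidates: Seq[String], offset: Int): Seq[String] = {
    require(offset > 0, "offset must be greater than 0")
    candidates.filter(candidate => math.abs(candidate.length - original.length) <= offset)
  }
}
```

With the default offset of 3, a four-letter name such as "John" could be replaced by candidates of one to seven characters.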
  57. val features: ArrayBuffer[Feature[_, _, _]]
    Definition Classes
    HasFeatures
  58. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  59. val fixedMaskLength: IntParam

    Select the fixed mask length: this is the length of the masking sequence that will be used when the 'fixed_length_chars' masking policy is selected.

    Definition Classes
    MaskingParams
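A fixed-length mask can be pictured as replacing an entity span with the same number of mask characters regardless of the span's original length. In this sketch `FixedMaskSketch` and `maskFixed` are hypothetical, the asterisk as mask character is an assumption, and `begin`/`end` are inclusive character offsets.

```scala
object FixedMaskSketch {
  // Replaces text(begin..end), both inclusive, with `fixedMaskLength` mask characters.
  def maskFixed(text: String, begin: Int, end: Int, fixedMaskLength: Int, maskChar: Char = '*'): String = {
    require(0 <= begin && begin <= end && end < text.length, "invalid span")
    require(fixedMaskLength >= 0, "mask length must not be negative")
    text.substring(0, begin) + (maskChar.toString * fixedMaskLength) + text.substring(end + 1)
  }
}
```

Because the mask length is fixed, the output does not reveal how long the original entity was.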
  60. val genderAwareness: BooleanParam

    Whether to use gender-aware names or not during obfuscation. This parameter affects only names. Setting it to true might decrease performance. Default: false

    Definition Classes
    BaseDeidParams
  61. def generateFakeBySameLength(wordToReplace: String, entity: String): String

    Obfuscates digits to new digits and letters to new letters; all other characters remain the same.

    Definition Classes
    DeidModelParams
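The character-class-preserving replacement described for generateFakeBySameLength can be sketched as follows. `SameLengthSketch` and `fakeBySameLength` are hypothetical, and both the seeded `scala.util.Random` and the preservation of letter case are assumptions of this illustration.

```scala
import scala.util.Random

object SameLengthSketch {
  // Replaces every digit with a random digit and every letter with a random
  // letter of the same case; punctuation and whitespace are kept as they are.
  def fakeBySameLength(word: String, seed: Long): String = {
    val rng = new Random(seed)
    word.map {
      case c if c.isDigit  => ('0' + rng.nextInt(10)).toChar
      case c if c.isUpper  => ('A' + rng.nextInt(26)).toChar
      case c if c.isLetter => ('a' + rng.nextInt(26)).toChar
      case c               => c
    }
  }
}
```

The result always has the same length and layout as the input, which keeps formatted identifiers such as "AB-123" recognisable as identifiers after obfuscation.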
  62. def generateFakeBySameLengthUsingHash(wordToReplace: String, entity: String): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  63. val geoConsistency: BooleanParam

    Whether to enforce consistent obfuscation across geographical entities: state, city, street, zip and phone.

    Functionality overview: This parameter enables geographical entity obfuscation that maintains realistic relationships between different geographic components. When enabled, the system ensures that obfuscated addresses form coherent, valid combinations rather than random replacements.

    Supported entity types: The following geographical entities are processed in priority order:

    • state (priority 0): US state names
    • city (priority 1): city names
    • zip (priority 2): zip codes
    • street (priority 3): street addresses
    • phone (priority 4): phone numbers

    Language requirement (important): Geographic consistency is only applied when both of the following hold:

    • the geoConsistency parameter is set to true
    • the language parameter is set to en

    For non-English configurations, this feature is automatically disabled regardless of the parameter setting.

    Consistency algorithm: When geographical entities come from the chunk columns:

    1. Entity grouping: all geographic entities are identified and grouped by type.
    2. Fake address selection: a consistent set of fake US addresses is selected using hash-based deterministic selection to ensure reproducibility.
    3. Priority-based mapping: entities are mapped to fake addresses following the priority order (state → city → zip → street → phone).
    4. Consistent replacement: all entities of the same type within a document use the same fake address pool, maintaining geographical coherence.

    Parameter interactions (important): Enabling this parameter automatically disables:

    • keepTextSizeForObfuscation (text size preservation is not maintained)
    • consistentObfuscation (standard consistency rules are overridden)
    • file-based fakers

    This is necessary because geographic consistency requires specific fake address selection that may not preserve original text lengths or follow standard obfuscation patterns.

    Default: false

    Definition Classes
    BaseDeidParams
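The priority order and the hash-based deterministic selection described for geoConsistency can be sketched together. Everything here is illustrative: `GeoConsistencySketch` and its members are hypothetical, the address pool contains made-up placeholder values, and hashing a document key with `hashCode` is an assumption about how a deterministic choice could be made.

```scala
object GeoConsistencySketch {
  // Priority order taken from the geoConsistency description.
  val priority: Map[String, Int] =
    Map("state" -> 0, "city" -> 1, "zip" -> 2, "street" -> 3, "phone" -> 4)

  // Placeholder pool of coherent fake addresses (made-up values).
  val pool: Vector[Map[String, String]] = Vector(
    Map("state" -> "State A", "city" -> "City A", "zip" -> "00001",
      "street" -> "1 First St", "phone" -> "555-0101"),
    Map("state" -> "State B", "city" -> "City B", "zip" -> "00002",
      "street" -> "2 Second St", "phone" -> "555-0102")
  )

  // The same key always selects the same address, which gives reproducibility.
  def pickAddress(documentKey: String): Map[String, String] =
    pool(Math.floorMod(documentKey.hashCode, pool.size))

  // Input: (label, original text). Output: (label, original text, replacement),
  // ordered by priority and all drawn from one coherent address.
  def replaceEntities(documentKey: String, entities: Seq[(String, String)]): Seq[(String, String, String)] = {
    val address = pickAddress(documentKey)
    entities
      .filter { case (label, _) => priority.contains(label) }
      .sortBy { case (label, _) => priority(label) }
      .map { case (label, text) => (label, text, address(label)) }
  }
}
```

Because every replacement comes from a single selected address, the obfuscated state, city, zip, street and phone stay mutually coherent within a document, and entities with non-geographic labels are left for the regular obfuscation path.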
  64. def get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]
    Definition Classes
    DeIdentificationModel → HasFeatures
  65. def get[T](feature: StructFeature[T]): Option[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  66. def get[T](feature: SetFeature[T]): Option[Set[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  67. def get[T](feature: ArrayFeature[T]): Option[Array[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  68. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  69. def getAdditionalDateFormats: Array[String]

    Gets the value of additionalDateFormats.

    Definition Classes
    BaseDeidParams
  70. def getAgeRanges: Array[Int]

    Gets ageRanges param.

    Definition Classes
    BaseDeidParams
  71. def getAgeRangesByHipaa: Boolean

    Gets the value of ageRangesByHipaa.

    Definition Classes
    BaseDeidParams
  72. def getAllTerms: Map[String, List[String]]

    Dictionary containing all terms used later by the anonymization function.

  73. def getAnonymizeSentence(sentence: Sentence, protectedEntities: Seq[Annotation], dateTag: String = "DATE", wholeDocumentDate: Option[Int] = None, zipCodeTag: String = "ZIP", entityMemory: Map[String, String], namePartsMemory: Map[String, String], documentID: Option[String]): (String, Seq[Annotation])

    Main point of interest. This method projects the sentence into its anonymized form. It is called for each sentence in the input collection of Annotations.

    sentence

    a sentence, which we want to anonymize

    protectedEntities

    a sequence of Entities which we want to anonymize

    dateTag

    a String which represents the value with which we replace dates

    returns

    a String, which represents an anonymized sentence

  74. def getBlackListEntities: Array[String]

    Gets blackListEntities param

    Definition Classes
    DeIdentificationParams
  75. def getChunkMatching: Map[String, Double]
    Definition Classes
    DeIdentificationParams
  76. def getChunkMatchingAsStr: String
    Definition Classes
    DeIdentificationParams
  77. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  78. def getConsistentAcrossNameParts: Boolean

    Gets the value of consistentAcrossNameParts.

    Definition Classes
    BaseDeidParams
  79. def getConsistentObfuscation: Boolean
    Definition Classes
    DeIdentificationParams
  80. def getCountryObfuscation: Boolean

    Gets the value of countryObfuscation.

    Definition Classes
    BaseDeidParams
  81. def getDateEntities: Array[String]

    Gets dateEntities param.

    Definition Classes
    BaseDeidParams
  82. def getDateFormats: Array[String]

    Gets the value of dateFormats.

    Definition Classes
    BaseDeidParams
  83. def getDateTag: String
    Definition Classes
    DeIdentificationParams
  84. def getDateToYear: Boolean
    Definition Classes
    DeIdentificationParams
  85. def getDays: Int

    Gets days param.

    Definition Classes
    BaseDeidParams
  86. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  87. def getDefaultObfuscationEquivalents: Array[StaticObfuscationEntity]
    Definition Classes
    BaseDeidParams
  88. def getDefaultObfuscationEquivalentsAsJava: Array[ArrayList[String]]
    Definition Classes
    BaseDeidParams
  89. def getDocumentIDFromSentences(sentences: Seq[Annotation]): Option[String]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  90. def getEnableDefaultObfuscationEquivalents: Boolean

    Gets the value of enableDefaultObfuscationEquivalents.

    Definition Classes
    BaseDeidParams
  91. def getEntitiesBySentence(chunks: Seq[Annotation], sentenceCount: Int): Seq[Seq[Annotation]]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  92. def getEntityBasedObfuscationRefSource(entityClass: String): String
    Attributes
    protected
    Definition Classes
    BaseDeidParams
  93. def getEntityCasingModes: Option[Map[String, Array[String]]]
  94. def getEntityField(annotation: Annotation): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  95. def getExternalFakers(entityClass: String, customFakers: Map[String, List[String]], wordToReplace: String): List[String]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  96. def getFakeByHashcode(fakes: Seq[String], wordToReplace: String, entity: String, seed: Int): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  97. def getFakeWithSameSize(fakes: Seq[String], wordToReplace: String, entity: String, lengthDeviation: Int, seed: Int): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  98. def getFakerLengthOffset: Int

    Gets fakerLengthOffset param.

    Definition Classes
    BaseDeidParams
  99. def getFakersEntity(entity: String, result: String): Seq[String]
    Definition Classes
    DeidModelParams
  100. def getFixedMaskLength: Int

    Gets fixedMaskLength param.

    Definition Classes
    MaskingParams
  101. def getGenderAwareness: Boolean

    Gets genderAwareness param.

    Definition Classes
    BaseDeidParams
  102. def getGeoConsistency: Boolean

    Gets the value of geoConsistency.

    Definition Classes
    BaseDeidParams
  103. def getGroupByCol: String

    Gets groupByCol param.

    Definition Classes
    DeIdentificationParams
  104. def getIgnoreRegex: Boolean
    Definition Classes
    DeIdentificationParams
  105. def getInputCols: Array[String]
    Definition Classes
    HasInputAnnotationCols
  106. def getIsRandomDateDisplacement: Boolean

    Gets isRandomDateDisplacement param

    Definition Classes
    BaseDeidParams
  107. def getKeepMonth: Boolean

    Gets keepMonth param.

    Definition Classes
    BaseDeidParams
  108. def getKeepTextSizeForObfuscation: Boolean

    Gets keepTextSizeForObfuscation param

    Definition Classes
    BaseDeidParams
  109. def getKeepYear: Boolean

    Gets keepYear param.

    Definition Classes
    BaseDeidParams
  110. def getLanguage: String

    Gets language param.

    Definition Classes
    BaseDeidParams
  111. def getLazyAnnotator: Boolean
    Definition Classes
    CanBeLazy
  112. def getMappingsColumn: String
    Definition Classes
    DeIdentificationParams
  113. def getMaskStatus(entityClass: String): String
    Attributes
    protected
    Definition Classes
    MaskingParams
  114. def getMaskingPolicy: String

    Gets maskingPolicy param.

    Definition Classes
    MaskingParams
  115. def getMaxRandomDisplacementDays: Int

    Gets maxRandomDisplacementDays param

    Definition Classes
    BaseDeidParams
  116. def getMaxSentence(annotations: Seq[Annotation]): Int
    Attributes
    protected
    Definition Classes
    DeidModelParams
  117. def getMetadataMaskingPolicy: String

    Gets metadataMaskingPolicy param

    Definition Classes
    DeIdentificationParams
  118. def getMinYear: Int
    Definition Classes
    DeIdentificationParams
  119. def getMode: String

    Gets mode param.

    Definition Classes
    BaseDeidParams
  120. def getNearTokens(tokenizedSentence: Seq[IndexedToken], count: Int, ngrams: Int = 2): (String, String)
  121. def getNerEntitiesBySentence(annotations: Seq[Annotation], sentenceCount: Int): Seq[Seq[Annotation]]

    Returns the NER Annotations for each Annotation instance in the input Sequence.

    annotations

    a Sequence of Annotation instances

    returns

    a Sequence of Sequence[Annotation], each inner Sequence holding the NER annotations that belong to one sentence

  122. def getObfuscateByAgeGroups: Boolean

    Gets obfuscateByAgeGroups param

    Definition Classes
    DeIdentificationParams
  123. def getObfuscateDate: Boolean

    Gets obfuscateDate param.

    Definition Classes
    BaseDeidParams
  124. def getObfuscateRefSource: String

    Gets obfuscateRefSource param.

    Definition Classes
    BaseDeidParams
  125. def getObfuscateZipByHipaa: Boolean

    Gets the value of obfuscateZipByHipaa.

    Definition Classes
    BaseDeidParams
  126. def getObfuscationEquivalents: Option[Array[StaticObfuscationEntity]]

    Gets the value of obfuscationEquivalents.

    Definition Classes
    BaseDeidParams
  127. def getObfuscationStrategyOnException: String
    Definition Classes
    DeIdentificationParams
  128. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  129. final def getOutputCol: String
    Definition Classes
    HasOutputAnnotationCol
  130. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  131. def getRegexEntities(tokensSentences: Seq[IndexedToken], idx: Int): Seq[Annotation]

    Returns the Regex Annotations for each IndexedToken in the input Sequence.

    tokensSentences

    a Sequence of IndexedToken instances

    returns

    a Sequence of Annotation, each Annotation represents Regex Entity

  132. def getRegexEntities(): Array[String]
  133. def getRegexOverride: Boolean
    Definition Classes
    DeIdentificationParams
  134. def getRegexPatternsDictionary: Map[String, Array[String]]

    Dictionary of regular expression patterns that match protected entities.

  135. def getRegion: String

    Gets region param.

    Definition Classes
    BaseDeidParams
  136. def getReturnEntityMappings: Boolean
    Definition Classes
    DeIdentificationParams
  137. def getSameEntityThreshold: Double
    Definition Classes
    DeIdentificationParams
  138. def getSameLengthFormattedEntities(): Array[String]
    Definition Classes
    BaseDeidParams
  139. def getSeed(): Int
    Definition Classes
    BaseDeidParams
  140. def getSelectiveObfuscateRefSource: Map[String, String]

    Gets selectiveObfuscateRefSource param.

    Definition Classes
    BaseDeidParams
  141. def getSelectiveObfuscateRefSourceAsStr: String
    Definition Classes
    BaseDeidParams
  142. def getSelectiveObfuscationModes: Option[Map[String, Array[String]]]

    Gets selectiveObfuscationModes param.

    Definition Classes
    BaseDeidParams
  143. def getSentences(annotations: Seq[Annotation]): Seq[Sentence]

    Returns the content of each sentence inside the input sequence.

    annotations

    a Sequence of Annotation instances, to return content from

    returns

    a Sequence of Sentence

  144. def getShiftDaysFromSentences(sentences: Seq[Annotation]): Option[Int]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  145. def getStaticObfuscationFakes(entityClass: String, wordToReplace: String): Option[Seq[String]]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  146. def getStaticObfuscationPairs: Option[Array[StaticObfuscationEntity]]
    Definition Classes
    BaseDeidParams
  147. def getTokensBySentence(annotations: Seq[Annotation]): Seq[Seq[IndexedToken]]

    Returns the tokens for each Annotation instance in the input Sequence.

    annotations

    a Sequence of Annotation instances

    returns

    a Sequence of Sequence[IndexedToken], each Sequence represents tokens from each input Annotation

  148. def getUnnormalizedDateMode: String

    Gets unnormalizedDateMode param.

    Definition Classes
    BaseDeidParams
  149. def getUseShiftDays: Boolean

    Getter method of useShiftDays.

    Definition Classes
    DeIdentificationParams → BaseDeidParams
  150. def getValidAgeRanges: Array[Int]

    Gets the valid ageRanges, depending on whether ageRangesByHipaa is true or not.

    Attributes
    protected
    Definition Classes
    BaseDeidParams
  151. def getZipCodeTag: String
    Definition Classes
    DeIdentificationParams
  152. val groupByCol: Param[String]

    The column name used to group the dataset. This parameter is used in conjunction with consistentObfuscation to ensure consistent obfuscation within each group. When groupByCol is set, the dataset is partitioned into groups based on the values of the specified column.

    Default: "" (empty string, meaning no grouping)

    • The column name must be a valid string in the input dataset.
    • The column must be of StringType.
    Definition Classes
    DeIdentificationParams
    Note

    This functionality can change the order of the dataset, so use it with caution.

    This functionality is not supported by LightPipeline.

  153. def handleCasing(originalFake: String, wordToReplace: String): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  154. def handleGeographicConsistency(protectedEntities: Seq[Seq[Annotation]]): Seq[Seq[Annotation]]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  155. def handleObfuscationEquivalents(sentenceBaseAnnotations: Seq[Seq[Annotation]]): Seq[Seq[Annotation]]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  156. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  157. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  158. def hasParent: Boolean
    Definition Classes
    Model
  159. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  160. val ignoreRegex: BooleanParam

    Selects whether the regex file loaded in the model is used. If true, the default regex file will not be used. The default value is false.

    Definition Classes
    DeIdentificationParams
  161. val inExceptionMode: Boolean
    Attributes
    protected
    Definition Classes
    HasSafeAnnotate
  162. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  163. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  164. val inputAnnotatorTypes: Array[AnnotatorType]

    Input annotator type: DOCUMENT, TOKEN, CHUNK

    Definition Classes
    DeIdentificationModel → HasInputAnnotationCols
  165. final val inputCols: StringArrayParam
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  166. def isArabic: Boolean
    Attributes
    protected
    Definition Classes
    MaskingParams
  167. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  168. def isEmptyString(value: String): Boolean
    Attributes
    protected
    Definition Classes
    DeidModelParams
  169. def isGeoEntity(annotation: Annotation): Boolean
    Attributes
    protected
    Definition Classes
    DeidModelParams
  170. def isGeoObfuscationEnabled: Boolean
    Attributes
    protected
    Definition Classes
    DeidModelParams
  171. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  172. def isObfuscateDate(entityClass: String): Boolean
    Attributes
    protected
    Definition Classes
    DeidModelParams
  173. val isRandomDateDisplacement: BooleanParam

    Whether to use a random displacement in days for date entities. The random number is based on DeIdentificationParams.seed. If true, a random displacement in days is used for date entities; if false, DeIdentificationParams.days is used. The default value is false.

    Definition Classes
    BaseDeidParams
  174. def isRegexMatch(nerTokens: (String, String), token: String, regexPatterns: Array[String]): Boolean

    Returns a Boolean flag indicating whether the token matches at least one pattern from the array

    token

    a token of interest to check for the match

    regexPatterns

    an Array of String to check against the token

    returns

    a Boolean flag, representing if the token matches at least one pattern of regexPatterns

  175. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  176. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  177. val keepMonth: BooleanParam

    Whether to keep the month intact when obfuscating date entities. If true, the month will remain unchanged during the obfuscation process. If false, the month will be modified along with the year and day. Default: false.

    Definition Classes
    BaseDeidParams
  178. val keepTextSizeForObfuscation: BooleanParam

    Specifies whether the output should maintain the same character length as the input text. If a replacement of the same length is available, the output length stays the same; otherwise the length might vary.

    Definition Classes
    BaseDeidParams
  179. val keepYear: BooleanParam

    Whether to keep the year intact when obfuscating date entities. If true, the year will remain unchanged during the obfuscation process. If false, the year will be modified along with the month and day. Default: false.

    Definition Classes
    BaseDeidParams
  180. val language: Param[String]

    The language used to select the regex file and some faker entities: 'en' (English), 'de' (German), 'es' (Spanish), 'fr' (French), 'ar' (Arabic) or 'ro' (Romanian). Default: 'en'

    Definition Classes
    BaseDeidParams
  181. val lazyAnnotator: BooleanParam
    Definition Classes
    CanBeLazy
  182. implicit lazy val locale: Locale
    Attributes
    protected
    Definition Classes
    DeidModelParams
  183. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  184. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  185. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  186. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  187. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  188. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  189. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  190. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  191. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  192. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  193. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  194. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  195. val mappingsColumn: Param[String]

    The mapping column that returns the annotation chunks with the fake entities

    Definition Classes
    DeIdentificationParams
  196. def maskEntity(wordToReplace: String, entityClass: String): String
    Attributes
    protected
    Definition Classes
    MaskingParams
  197. def maskEntity(annotation: Annotation, entityClass: String): String
    Attributes
    protected
    Definition Classes
    MaskingParams
  198. def maskEntityWithPolicy(wordToReplace: String, maskingPolicy: String, entityClass: String): String
    Attributes
    protected
    Definition Classes
    MaskingParams
  199. def maskEntityWithPolicy(annotation: Annotation, maskingPolicy: String, entityClass: String): String
    Attributes
    protected
    Definition Classes
    MaskingParams
  200. val maskingPolicy: Param[String]

    Select the masking policy:

    • 'entity_labels': Replace the values with the entity label.
    • 'same_length_chars': Replace the entity with asterisks of the same length minus two, plus brackets on both ends. If the entity is shorter than 3 characters (like Jo, or 5), asterisks are used without brackets.
    • 'fixed_length_chars': Replace the entity with a masking sequence composed of a fixed number of asterisks.
    • 'entity_labels_without_brackets': Replace the values with the entity label without brackets.
    • 'same_length_chars_without_brackets': Replace the entity with asterisks of the same length, without brackets.
    • Default: 'entity_labels'
    Definition Classes
    MaskingParams
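
    The policies listed for maskingPolicy can be illustrated with a small standalone sketch. It is not the library's implementation: the object and method names are hypothetical, the angle brackets for labels follow the mode example on this page, and the square brackets for 'same_length_chars' and the default fixed length of 4 are assumptions.

```scala
object MaskingPolicySketch {
  // Hypothetical helper, not part of the DeIdentificationModel API.
  // fixedLength stands in for the fixedMaskLength setting (default value here is an assumption).
  def mask(word: String, entityClass: String, policy: String, fixedLength: Int = 4): String =
    policy match {
      case "entity_labels"                        => s"<$entityClass>"
      case "entity_labels_without_brackets"       => entityClass
      // Entities shorter than 3 characters get asterisks only, without brackets.
      case "same_length_chars" if word.length < 3 => "*" * word.length
      // Two characters are taken by the brackets, so the total length is preserved.
      case "same_length_chars"                    => "[" + "*" * (word.length - 2) + "]"
      case "same_length_chars_without_brackets"   => "*" * word.length
      case "fixed_length_chars"                   => "*" * fixedLength
      case other =>
        throw new IllegalArgumentException(s"Unknown masking policy: $other")
    }
}
```

    In this sketch every policy except 'fixed_length_chars' and the two label policies keeps the masked text the same length as the original chunk.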
  201. val maxRandomDisplacementDays: IntParam

    Maximum number of days for random date displacement.

    Maximum number of days for random date displacement. Default is 1825 (5 years). If isRandomDateDisplacement is true, a random number of days between 1 and maxRandomDisplacementDays will be used for date displacement.

    Definition Classes
    BaseDeidParams
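
    As a rough illustration of how isRandomDateDisplacement, seed, days and maxRandomDisplacementDays could interact, the sketch below draws a seeded shift between 1 and the maximum. The object and method names are hypothetical, and the way the library actually derives the shift from the seed is not documented here, so treat this as an assumption.

```scala
import java.time.LocalDate
import scala.util.Random

object DateDisplacementSketch {
  // Draws a shift in [1, maxDays] from a seeded generator,
  // so the same seed always yields the same shift.
  def randomShiftDays(seed: Int, maxDays: Int = 1825): Int =
    new Random(seed.toLong).nextInt(maxDays) + 1

  // Uses the random shift when isRandom is true, otherwise the fixed `days` value.
  def displace(date: LocalDate, isRandom: Boolean, days: Int, seed: Int, maxDays: Int = 1825): LocalDate = {
    val shift = if (isRandom) randomShiftDays(seed, maxDays) else days
    date.plusDays(shift.toLong)
  }
}
```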
  202. def mergeEntities(nerEntities: Seq[Annotation], regexEntities: Seq[Annotation], regexOverride: Boolean = false): Seq[Annotation]

    Returns a combined Sequence of Annotations, cleaned from duplicates

    nerEntities

    a sequence of NER Entities to combine

    regexEntities

    a sequence of Regex Entities to combine

    returns

    a Sequence of Annotation, which is result of a merge without duplicates

  203. val metadataMaskingPolicy: Param[String]

    If specified, the metadata includes the masked form of the document. Select one of the following masking policies if you want to return the masked form in the metadata:

    • 'entity_labels': Replace the values with the entity label.
    • 'same_length_chars': Replace the entity with asterisks of the same length minus two, plus brackets on both ends. If the entity is shorter than 3 characters (like Jo, or 5), asterisks are used without brackets.
    • 'fixed_length_chars': Replace the entity with a masking sequence composed of a fixed number of asterisks.
    • 'entity_labels_without_brackets': Replace the values with the entity label without brackets.
    • 'same_length_chars_without_brackets': Replace the entity with asterisks of the same length, without brackets.
    • Default: ""
    Definition Classes
    DeIdentificationParams
  204. val minYear: IntParam

    Minimum year to use when converting date to year

    Definition Classes
    DeIdentificationParams
  205. val mode: Param[String]

    Mode for Anonymizer ['mask' or 'obfuscate']. Default: 'mask'

    • Mask mode: The entities will be replaced by their entity types.
    • Obfuscate mode: The entity is replaced by an obfuscator's term.
    Definition Classes
    BaseDeidParams
    Example:
    1. Given the following text: "David Hale visited EEUU a couple of years ago"

      • Mask mode: "<PERSON> visited <COUNTRY> a couple of years ago"
      • Obfuscate mode: "Bryan Johnson visited Japan a couple of years ago"
  206. def msgHelper(schema: StructType): String
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  207. val nameEntities: Seq[String]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  208. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  209. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  210. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  211. val obfuscateByAgeGroups: BooleanParam

    Whether to obfuscate ages based on age groups.

    When true, the age groups specified in the ageGroups parameter will be used to obfuscate ages. When false, the age ranges specified in the ageRanges parameter will be used to obfuscate ages. Default: false.

    Definition Classes
    DeIdentificationParams
  212. val obfuscateDate: BooleanParam

    When mode == "obfuscate", whether to obfuscate dates or not. This param helps in consistency to make dateFormats more visible. When set to true, make sure the dateFormats param fits your needs. If the value is true and obfuscation fails, unnormalizedDateMode is activated. When set to false, the date is masked to <DATE>. Default: false

    Definition Classes
    BaseDeidParams
  213. def obfuscateNameEntity(originalName: String, keepTextSize: Boolean, lengthDeviation: Int, namePartsMemory: Map[String, String]): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  214. val obfuscateRefSource: Param[String]

    The source of obfuscation to obfuscate the entities. The values are the following:

    • 'file': Takes the entities from the obfuscatorRefFile.
    • 'faker': Takes the entities from the Faker module.
    • 'both': Takes the entities from the obfuscatorRefFile and the Faker module randomly.

    Definition Classes
    BaseDeidParams
  215. def obfuscateZIP(wordToReplace: String): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  216. val obfuscateZipByHipaa: BooleanParam

    Whether to apply HIPAA Safe Harbor ZIP code obfuscation rules.

    When enabled (true), ZIP/ZIP+4 obfuscation follows the HIPAA Safe Harbor guidance:

    1. The algorithm extracts the first five digits from the input (accepting formats like "12345", "12345-6789", "123456789" and tolerant forms).
    2. If the first three-digit ZIP prefix is in the HIPAA restricted list (the 17 prefixes derived from 2000 Census data), the ZIP MUST be suppressed to the canonical value "000**".
    3. Otherwise, the ZIP is generalized to the first three digits followed by "**" (i.e. XXX**). The +4 portion will be masked with asterisks if present.

    When disabled (false), HIPAA-specific ZIP obfuscation is not applied and the component's default/custom ZIP obfuscation is used instead.

    Implementation notes and cautions:

    Definition Classes
    BaseDeidParams
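
    The three steps described for obfuscateZipByHipaa can be sketched as follows. This is a standalone illustration with hypothetical names, not the library code. The prefix set is the list of 17 restricted three-digit prefixes published in the HHS Safe Harbor guidance; whether the library uses exactly this set, and how it formats a masked +4 part, are assumptions, so the sketch returns only the five-character result.

```scala
object HipaaZipSketch {
  // The 17 restricted three-digit prefixes from the HHS Safe Harbor guidance (2000 Census data).
  val restrictedPrefixes: Set[String] = Set(
    "036", "059", "063", "102", "203", "556", "692", "790", "821",
    "823", "830", "831", "878", "879", "884", "890", "893")

  // Returns None when fewer than five digits can be extracted from the input.
  def obfuscate(zip: String): Option[String] = {
    // Step 1: keep digits only, so "12345-6789" and "123456789" are treated alike.
    val digits = zip.filter(_.isDigit)
    if (digits.length < 5) None
    else {
      val prefix = digits.take(3)
      // Step 2: restricted prefixes are suppressed; step 3: others keep three digits.
      Some(if (restrictedPrefixes.contains(prefix)) "000**" else prefix + "**")
    }
  }
}
```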
  217. val obfuscationEquivalents: StructFeature[Array[StaticObfuscationEntity]]

    Variant-to-canonical entity mappings to ensure consistent obfuscation.

    This method allows you to define equivalence rules for entity variants that should be obfuscated the same way. For example, the names "Alex" and "Alexander" will always be mapped to the same obfuscated value if they are linked to the same canonical form.

    It accepts an array of string triplets, where each triplet defines:

    • variant: A non-standard, short, or alternative form of a value (e.g., "Alex")
    • entityType: The type of the entity (e.g., "NAME", "STATE", "COUNTRY")
    • canonical: The standardized form all variants map to (e.g., "Alexander")

    variant and entityType comparisons are case-insensitive during processing.

    This is especially useful in de-identification tasks to ensure consistent replacement of semantically identical values. It also allows cross-variant normalization across different occurrences of sensitive data.

    Definition Classes
    BaseDeidParams
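
    A minimal sketch of the variant-to-canonical lookup described for obfuscationEquivalents is shown below. The Equivalent case class is a hypothetical stand-in for the triplets (it is not StaticObfuscationEntity), and the lookup logic is an assumption based only on the description above, including the case-insensitive comparison of variant and entityType.

```scala
object EquivalentsSketch {
  // Hypothetical stand-in for one (variant, entityType, canonical) triplet.
  final case class Equivalent(variant: String, entityType: String, canonical: String)

  // Index the rules by lower-cased (variant, entityType) for case-insensitive lookup.
  def buildIndex(rules: Seq[Equivalent]): Map[(String, String), String] =
    rules.map(r => (r.variant.toLowerCase, r.entityType.toLowerCase) -> r.canonical).toMap

  // Returns the canonical form when a rule matches, otherwise the text unchanged.
  def canonicalize(index: Map[(String, String), String], text: String, entityType: String): String =
    index.getOrElse((text.toLowerCase, entityType.toLowerCase), text)
}
```

    Obfuscating the canonical form instead of the raw text is what would make "Alex" and "Alexander" receive the same replacement in this sketch.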
  218. val obfuscationStrategyOnException: Param[String]

    The obfuscation strategy to be applied when an exception occurs.

    The obfuscation strategy determines how obfuscation is handled in case of an exception. Four possible values are supported:

    • "mask": The original chunk is replaced with a masking pattern.
    • "default": The original chunk is replaced with a default faker.
    • "skip": The original chunk is not replaced with any faker.
    • "exception": Throws the exception.

    The default obfuscation strategy is "default".

    Definition Classes
    DeIdentificationParams
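
    The four values of obfuscationStrategyOnException can be pictured as a fallback around the obfuscation call. The sketch below is an assumption about the control flow, with hypothetical names; in particular, the masking pattern used for "mask" and the way a default faker value is supplied are placeholders.

```scala
import scala.util.{Failure, Success, Try}

object ExceptionStrategySketch {
  // Hypothetical helper: `obfuscate` is the operation that may throw,
  // `defaultFake` stands in for whatever the default faker would return.
  def withFallback(chunk: String, entityClass: String, strategy: String, defaultFake: String)(
      obfuscate: String => String): String =
    Try(obfuscate(chunk)) match {
      case Success(fake) => fake
      case Failure(error) =>
        strategy match {
          case "mask"      => s"<$entityClass>" // masking pattern is an assumption
          case "default"   => defaultFake
          case "skip"      => chunk             // original chunk is left as it is
          case "exception" => throw error
          case other =>
            throw new IllegalArgumentException(s"Unknown strategy: $other")
        }
    }
}
```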
  219. def onWrite(path: String, spark: SparkSession): Unit
    Attributes
    protected
    Definition Classes
    ParamsAndFeaturesWritable
  220. val optionalInputAnnotatorTypes: Array[String]
    Definition Classes
    HasInputAnnotationCols
  221. val outputAnnotatorType: AnnotatorType

    Output annotator types: DOCUMENT

    Definition Classes
    DeIdentificationModel → HasOutputAnnotatorType
  222. val outputAsDocument: BooleanParam

    Whether to return all sentences joined into a single document

    Definition Classes
    DeIdentificationParams
  223. final val outputCol: Param[String]
    Attributes
    protected
    Definition Classes
    HasOutputAnnotationCol
  224. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  225. var parent: Estimator[DeIdentificationModel]
    Definition Classes
    Model
  226. lazy val randomDateFormat: String
    Attributes
    protected
    Definition Classes
    BaseDeidParams
  227. val regexEntities: StringArrayParam
  228. val regexOverride: BooleanParam

    If the value is true, prioritize the regex entities; if the value is false, prioritize the NER entities. The default value is false. If DeIdentification.combineRegexPatterns is true, this value will be invalid.

    Definition Classes
    DeIdentificationParams
  229. val regexPatternsDictionary: MapFeature[String, Array[String]]

    dictionary with regular expression patterns that match some protected entity

  230. val region: Param[String]

    With this property, you can select particular dateFormats. This property is especially used when obfuscating dates: it decides whether the first or the second part of a date such as 11/11/2023 is treated as the day.

    The values are the following:

    • 'eu' for European Union
    • 'us' for USA
    Definition Classes
    BaseDeidParams
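
    The effect of region on an ambiguous date can be shown with plain java.time parsing. The mapping of 'eu' to day-first and 'us' to month-first follows the description above; the concrete patterns and the object name are assumptions for illustration, not the library's actual dateFormats.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object RegionDateSketch {
  // Picks a day-first or month-first pattern depending on the region value.
  def parse(text: String, region: String): LocalDate = {
    val pattern = region match {
      case "eu" => "dd/MM/yyyy"
      case "us" => "MM/dd/yyyy"
      case other =>
        throw new IllegalArgumentException(s"Unknown region: $other")
    }
    LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern))
  }
}
```

    With this sketch, "03/04/2023" is read as 3 April 2023 for 'eu' and as 4 March 2023 for 'us', which changes the result of any later day shift.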
  231. val returnEntityMappings: BooleanParam

    With this property, you can select whether to return the mapping column.

    Definition Classes
    DeIdentificationParams
  232. def safeAnnotate(annotations: Seq[Annotation]): Seq[Annotation]

    A protected method designed to safely annotate a sequence of Annotation objects by handling exceptions.

    annotations

    A sequence of Annotation.

    returns

    A sequence of Annotation objects after processing, potentially containing error annotations.

    Attributes
    protected
    Definition Classes
    HasSafeAnnotate
  233. val sameEntityThreshold: DoubleParam

    Similarity threshold [0.0-1.0] to consider two appearances of an entity as the same (default: 0.9). This threshold does not apply to date entities.

    Definition Classes
    DeIdentificationParams
  234. val sameLengthFormattedEntities: StringArrayParam

    List of formatted entities to generate the same length outputs as original ones during obfuscation. The supported and default formatted entities are: "phone", "fax", "contact", "id", "idnum", "bioid", "medicalrecord", "zip", "vin", "ssn", "dln", "plate", "license", "IRS", "CFN", "account".

    Definition Classes
    BaseDeidParams
  235. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  236. val seed: IntParam

    The seed used to select the entities in obfuscate mode. With the seed, you can repeat an execution several times with the same output.

    Definition Classes
    BaseDeidParams
  237. def selectFakeFromAllFakes(wordToReplace: String, entityClass: String, maskedEntity: String, allFakes: Seq[String]): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  238. val selectiveObfuscateRefSource: MapFeature[String, String]

    A map of entity names to their obfuscation modes. This is used to selectively apply different obfuscation methods to specific entities. The keys are entity names and the values are the obfuscation sources. If an entity is not specified in this map, the obfuscateRefSource param is used to determine the obfuscation source.

    Definition Classes
    BaseDeidParams
    Example:
    1. val selectiveSources = Map(
       "PHONE" -> "file",
       "EMAIL" -> "faker",
       "NAME" -> "faker",
       "ADDRESS" -> "both"
       )
  239. val selectiveObfuscationModes: StructFeature[Map[String, Array[String]]]

    The dictionary of modes to enable multi-mode deidentification.

    • 'obfuscate': Replace the values with random values.
    • 'mask_same_length_chars': Replace the entity with asterisks of the same length minus two, plus brackets on both ends.
    • 'mask_entity_labels': Replace the values with the entity label.
    • 'mask_fixed_length_chars': Replace the entity with asterisks of a fixed length. You can also invoke "setFixedMaskLength()".
    • 'mask_entity_labels_without_brackets': Replace the values with the entity label without brackets.
    • 'mask_same_length_chars_without_brackets': Replace the entity with asterisks of the same length, without brackets.
    • 'skip': Skip the entities (leave them intact).

    The entities that are not given in the dictionary are deidentified according to setMode().

    Definition Classes
    BaseDeidParams
  240. def set[T](feature: StructFeature[T], value: T): DeIdentificationModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  241. def set[K, V](feature: MapFeature[K, V], value: Map[K, V]): DeIdentificationModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  242. def set[T](feature: SetFeature[T], value: Set[T]): DeIdentificationModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  243. def set[T](feature: ArrayFeature[T], value: Array[T]): DeIdentificationModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  244. final def set(paramPair: ParamPair[_]): DeIdentificationModel.this.type
    Attributes
    protected
    Definition Classes
    Params
  245. final def set(param: String, value: Any): DeIdentificationModel.this.type
    Attributes
    protected
    Definition Classes
    Params
  246. final def set[T](param: Param[T], value: T): DeIdentificationModel.this.type
    Definition Classes
    Params
  247. def setAdditionalDateFormats(formats: Array[String]): DeIdentificationModel.this.type

    Sets additionalDateFormats param

    Definition Classes
    BaseDeidParams
  248. def setAgeGroups(value: Map[String, Array[Int]]): DeIdentificationModel.this.type

    Sets the age groups to obfuscate ages. For this parameter to be active, the obfuscateByAgeGroups parameter must be true. If the given ageGroups do not fully contain the ages, the ages continue to be obfuscated according to the ageRanges. The map should contain the age group name as the key and an array of two integers as the value. The first integer is the lower bound of the age group, and the second integer is the upper bound of the age group. Default age groups are as follows in the English language:

    Map(
    "baby" -> Array(0, 1),
    "toddler" -> Array(1, 3),
    "child" -> Array(3, 12),
    "teenager" -> Array(12, 20),
    "adult" -> Array(20, 65),
    "senior" -> Array(65, 200)
    )
    Definition Classes
    DeIdentificationParams
    Exceptions thrown

    IllegalArgumentException if the value is empty, contains negative values, or is not a pair of integers
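
    The validation rules and the group lookup described for setAgeGroups can be sketched in plain Scala. The names are hypothetical, and because the default bounds touch (for example 1 ends "baby" and starts "toddler"), the sketch assumes an inclusive lower bound and an exclusive upper bound; the library may resolve boundaries differently.

```scala
object AgeGroupSketch {
  // Default English age groups as listed in the documentation above.
  val defaultGroups: Map[String, Array[Int]] = Map(
    "baby" -> Array(0, 1),
    "toddler" -> Array(1, 3),
    "child" -> Array(3, 12),
    "teenager" -> Array(12, 20),
    "adult" -> Array(20, 65),
    "senior" -> Array(65, 200))

  // `require` throws IllegalArgumentException, mirroring the documented exception.
  def validate(groups: Map[String, Array[Int]]): Unit = {
    require(groups.nonEmpty, "ageGroups must not be empty")
    groups.foreach { case (name, bounds) =>
      require(bounds.length == 2, s"$name must map to a pair of integers")
      require(bounds.forall(_ >= 0), s"$name contains negative values")
    }
  }

  // Assumption: lower bound inclusive, upper bound exclusive.
  // None means the age is not covered, so ageRanges would apply instead.
  def groupFor(age: Int, groups: Map[String, Array[Int]]): Option[String] =
    groups.collectFirst { case (name, Array(lo, hi)) if age >= lo && age < hi => name }
}
```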

  249. def setAgeGroups(value: HashMap[String, ArrayList[Int]]): DeIdentificationModel.this.type
    Definition Classes
    DeIdentificationParams
  250. def setAgeRanges(mode: Array[Int]): DeIdentificationModel.this.type

    List of integers specifying limits of the age groups to preserve during obfuscation

    Definition Classes
    BaseDeidParams
  251. def setAgeRangesByHipaa(value: Boolean): DeIdentificationModel.this.type

    Sets whether to obfuscate ages based on HIPAA (Health Insurance Portability and Accountability Act) Privacy Rule.

    The HIPAA Privacy Rule mandates that ages of patients older than 90 years must be obfuscated, while ages of patients 90 years or younger can remain unchanged.

    value

    If true, age entities larger than 90 will be obfuscated as per the HIPAA Privacy Rule, and the others will remain unchanged. If false, the ageRanges parameter is used. Default: false.

    Definition Classes
    BaseDeidParams
  252. def setAllTerms(value: Map[String, List[String]]): DeIdentificationModel.this.type
  253. def setBlackList(list: Array[String]): DeIdentificationModel.this.type

    List of entities that will be ignored in the regex file. The rest will be processed. The default values are "IBAN","ZIP","NPI","DLN","PASSPORT","C_CARD","DEA","SSN", "IP", "DEA".

    Definition Classes
    DeIdentificationParams
  254. def setBlackListEntities(value: Array[String]): DeIdentificationModel.this.type

    Sets the list of entities coming from NER or regex rules that will be ignored for masking or obfuscation. The remaining entities will be processed. Defaults to an empty array.

    Definition Classes
    DeIdentificationParams
  255. def setChunkMatching(categories: HashMap[String, Double]): DeIdentificationModel.this.type
    Definition Classes
    DeIdentificationParams
  256. def setChunkMatching(value: Map[String, Double]): DeIdentificationModel.this.type

    Performs entity chunk matching across rows or within groups in a DataFrame. Useful in de-identification pipelines where certain entity labels like "NAME" or "DATE" may be missing in some rows and need to be filled from other rows in the same group.

    Notes:

    • When applying the method across multiple rows, the usage of groupByCol parameter is required.
    Definition Classes
    DeIdentificationParams
  257. def setConsistentAcrossNameParts(value: Boolean): DeIdentificationModel.this.type

    Sets the value of consistentAcrossNameParts.

    value

    Boolean flag to enforce consistency across name parts

    returns

    this instance

    Definition Classes
    BaseDeidParams
  258. def setConsistentObfuscation(s: Boolean): DeIdentificationModel.this.type

    Whether to replace very similar entities in a document with the same randomized term (default: true). The similarity is based on the Levenshtein distance between the words.

    Definition Classes
    DeIdentificationParams
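
    To make the Levenshtein-based notion of "very similar entities" concrete, the sketch below computes the edit distance and turns it into a score in [0.0, 1.0] that can be compared with a threshold such as sameEntityThreshold. The normalization by the longer string and the lower-casing are assumptions; the library's exact similarity formula is not stated on this page.

```scala
object SimilaritySketch {
  // Classic dynamic-programming Levenshtein distance, keeping one row at a time.
  def levenshtein(a: String, b: String): Int = {
    var row = Array.range(0, b.length + 1)
    for (i <- 1 to a.length) {
      val cur = new Array[Int](b.length + 1)
      cur(0) = i
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        cur(j) = math.min(math.min(cur(j - 1) + 1, row(j) + 1), row(j - 1) + cost)
      }
      row = cur
    }
    row(b.length)
  }

  // Assumed normalization: 1.0 means identical, 0.0 means nothing in common.
  def similarity(a: String, b: String): Double = {
    val maxLen = math.max(a.length, b.length)
    if (maxLen == 0) 1.0 else 1.0 - levenshtein(a, b).toDouble / maxLen
  }

  // Two mentions count as the same entity when the score reaches the threshold.
  def sameEntity(a: String, b: String, threshold: Double = 0.9): Boolean =
    similarity(a.toLowerCase, b.toLowerCase) >= threshold
}
```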
  259. def setCountryObfuscation(value: Boolean): DeIdentificationModel.this.type

    Sets whether to obfuscate country entities or not. If true, country entities will be obfuscated using the Faker module. If false, country entities will be skipped during obfuscation. Default: false

    Definition Classes
    BaseDeidParams
  260. def setDateEntities(value: Array[String]): DeIdentificationModel.this.type

    Sets the value of dateEntities. Default: Array("DATE", "DOB", "DOD", "EFFDATE", "FISCAL_YEAR")

    Definition Classes
    BaseDeidParams
  261. def setDateFormats(s: Array[String]): DeIdentificationModel.this.type

    Format of dates to displace

    Definition Classes
    BaseDeidParams
  262. def setDateTag(s: String): DeIdentificationModel.this.type

    Tag representing the NER date entity (default: DATE)

    Definition Classes
    DeIdentificationParams
  263. def setDateToYear(s: Boolean): DeIdentificationModel.this.type

    true if dates must be converted to years, false otherwise

    Definition Classes
    DeIdentificationParams
  264. def setDays(k: Int): DeIdentificationModel.this.type

    Number of days to obfuscate the dates by displacement. If not provided, a random integer between 1 and 60 will be used.

    Definition Classes
    BaseDeidParams
  265. def setDefault[T](feature: StructFeature[T], value: () ⇒ T): DeIdentificationModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  266. def setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): DeIdentificationModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  267. def setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): DeIdentificationModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  268. def setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): DeIdentificationModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  269. final def setDefault(paramPairs: ParamPair[_]*): DeIdentificationModel.this.type
    Attributes
    protected
    Definition Classes
    Params
  270. final def setDefault[T](param: Param[T], value: T): DeIdentificationModel.this.type
    Attributes
    protected[org.apache.spark.ml]
    Definition Classes
    Params
  271. def setDoExceptionHandling(value: Boolean): DeIdentificationModel.this.type

    If true, exceptions are handled. If data that causes an exception is passed to the model, an error annotation containing the exception message is emitted, and processing continues with the next item. This comes with a performance penalty.

    Definition Classes
    HandleExceptionParams
  272. def setEnableDefaultObfuscationEquivalents(value: Boolean): DeIdentificationModel.this.type

    Sets whether to enable default obfuscation equivalents for common entities. This parameter allows the system to automatically include a set of predefined common English name equivalents. Default: false

    Definition Classes
    BaseDeidParams
  273. def setEntityCasingModes(value: Map[String, Array[String]]): DeIdentificationModel.this.type

    Sets the dictionary of entity casing modes to apply to matching entities. The supported modes are:

    • 'lowercase': Converts all characters to lower case using the rules of the default locale.
    • 'uppercase': Converts all characters to upper case using the rules of the default locale.
    • 'capitalize': Converts the first character to upper case and the others to lower case.
    • 'titlecase': Converts the first character of every token to upper case and the others to lower case.
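    The four casing modes can be illustrated with a small plain-Scala sketch. It is not the library's code: the object and function names are hypothetical, tokens are assumed to be separated by single spaces, and Locale.ROOT is used instead of the default locale so that the output is reproducible.

```scala
import java.util.Locale

object CasingSketch {
  // Upper-case the first character and lower-case the rest of one word.
  private def capitalizeWord(w: String): String =
    if (w.isEmpty) w
    else w.substring(0, 1).toUpperCase(Locale.ROOT) + w.substring(1).toLowerCase(Locale.ROOT)

  // Apply one of the casing modes named in the description above.
  // Unknown modes leave the text unchanged (an assumption of this sketch).
  def applyCasing(text: String, mode: String): String = mode match {
    case "lowercase"  => text.toLowerCase(Locale.ROOT)
    case "uppercase"  => text.toUpperCase(Locale.ROOT)
    case "capitalize" => capitalizeWord(text)
    case "titlecase"  => text.split(" ", -1).map(capitalizeWord).mkString(" ")
    case _            => text
  }
}
```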

  274. def setFakerLengthOffset(value: Int): DeIdentificationModel.this.type

    Sets fakerLengthOffset param

    Definition Classes
    BaseDeidParams
  275. def setFixedMaskLength(value: Int): DeIdentificationModel.this.type

    Sets the value of fixedMaskLength. This is the length of the masking sequence that will be used when the 'fixed_length_chars' masking policy is selected.

    Definition Classes
    MaskingParams
  276. def setGenderAwareness(value: Boolean): DeIdentificationModel.this.type

    Whether to use gender-aware names during obfuscation. This parameter affects only names. Setting it to true might decrease performance. Default: false

    Definition Classes
    BaseDeidParams
  277. def setGeoConsistency(value: Boolean): DeIdentificationModel.this.type

    Sets the value of geoConsistency. When set to true, it enables consistent obfuscation across geographical entities such as state, city, street, zip, and phone.

    Definition Classes
    BaseDeidParams
  278. def setGroupByCol(value: String): DeIdentificationModel.this.type

    Sets groupByCol param to group the dataset. This parameter is used in conjunction with consistentObfuscation to ensure consistent obfuscation within each group.

    Definition Classes
    DeIdentificationParams
    Note

    This functionality can change the order of the dataset, so use it with caution.

    This functionality is not supported by LightPipeline.

  279. def setIgnoreRegex(s: Boolean): DeIdentificationModel.this.type

    Selects whether to use the regex file loaded in the model. If true, the default regex file is not used. The default value is false.

    Definition Classes
    DeIdentificationParams
  280. final def setInputCols(value: String*): DeIdentificationModel.this.type
    Definition Classes
    HasInputAnnotationCols
  281. def setInputCols(value: Array[String]): DeIdentificationModel.this.type
    Definition Classes
    HasInputAnnotationCols
  282. def setIsRandomDateDisplacement(s: Boolean): DeIdentificationModel.this.type

    Whether to use a random number of displacement days for date entities. The random number is based on DeIdentificationParams.seed. If true, random displacement days are used for date entities; if false, DeIdentificationParams.days is used. The default value is false.

    Definition Classes
    BaseDeidParams
  283. def setKeepMonth(value: Boolean): DeIdentificationModel.this.type

    Sets whether to keep the month intact when obfuscating date entities. If true, the month will remain unchanged during the obfuscation process. If false, the month will be modified along with the year and day. Default: false.

    Definition Classes
    BaseDeidParams
  284. def setKeepTextSizeForObfuscation(value: Boolean): DeIdentificationModel.this.type

    Sets keepTextSizeForObfuscation param

    Definition Classes
    BaseDeidParams
  285. def setKeepYear(value: Boolean): DeIdentificationModel.this.type

    Sets whether to keep the year intact when obfuscating date entities. If true, the year will remain unchanged during the obfuscation process. If false, the year will be modified along with the month and day. Default: false.

    Definition Classes
    BaseDeidParams
  286. def setLanguage(s: String): DeIdentificationModel.this.type

    The language used to select the regex file and some faker entities: 'en' (English), 'de' (German), 'es' (Spanish), 'fr' (French), 'ar' (Arabic) or 'ro' (Romanian). Default: 'en'

    Definition Classes
    BaseDeidParams
  287. def setLazyAnnotator(value: Boolean): DeIdentificationModel.this.type
    Definition Classes
    CanBeLazy
  288. def setMappingsColumn(s: String): DeIdentificationModel.this.type

    The mapping column that returns the annotation chunks together with their fake entities.

    Definition Classes
    DeIdentificationParams
  289. def setMaskingPolicy(value: String): DeIdentificationModel.this.type

    Select the masking policy:

    • 'entity_labels': Replace the values with the entity label.
    • 'same_length_chars': Replace the name with asterisks of the same length minus two, plus brackets on both ends. If the entity is shorter than 3 characters (like Jo, or 5), asterisks are used without brackets.
    • 'fixed_length_chars': Replace the entity with a masking sequence composed of a fixed number of asterisks.
    • 'entity_labels_without_brackets': Replace the values with the entity label without brackets.
    • 'same_length_chars_without_brackets': Replace the name with asterisks of the same length, without brackets.
    • Default: 'entity_labels'
    Definition Classes
    MaskingParams
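    As a rough illustration of the five masking policies, the following plain-Scala sketch maps a chunk and its entity label to a masked string. It is not the library's implementation. The angle brackets around labels follow the mask-mode example shown under setMode, while the square brackets for 'same_length_chars', the default fixed length of 4 and all names are assumptions.

```scala
object MaskingSketch {
  // Return the masked form of `chunk` for the given policy name.
  // `fixedLength` plays the role of fixedMaskLength; its default is hypothetical.
  def mask(chunk: String, label: String, policy: String, fixedLength: Int = 4): String =
    policy match {
      case "entity_labels"                  => s"<$label>"
      case "entity_labels_without_brackets" => label
      case "same_length_chars" =>
        // Entities shorter than 3 characters get asterisks only, no brackets.
        if (chunk.length < 3) "*" * chunk.length
        else "[" + "*" * (chunk.length - 2) + "]"
      case "same_length_chars_without_brackets" => "*" * chunk.length
      case "fixed_length_chars"                 => "*" * fixedLength
      case other =>
        throw new IllegalArgumentException(s"Unknown masking policy: $other")
    }
}
```

    With 'same_length_chars' the masked text keeps the original length, so character offsets of later annotations stay valid in this sketch.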
  290. def setMaxRandomDisplacementDays(value: Int): DeIdentificationModel.this.type

    Sets maxRandomDisplacementDays param

    Definition Classes
    BaseDeidParams
  291. def setMetadataMaskingPolicy(value: String): DeIdentificationModel.this.type

    If specified, the metadata includes the masked form of the document. Select one of the following masking policies to return the masked form in the metadata:

    • 'entity_labels': Replace the values with the entity label.
    • 'same_length_chars': Replace the name with asterisks of the same length minus two, plus brackets on both ends. If the entity is shorter than 3 characters (like Jo, or 5), asterisks are used without brackets.
    • 'fixed_length_chars': Replace the entity with a masking sequence composed of a fixed number of asterisks.
    • 'entity_labels_without_brackets': Replace the values with the entity label without brackets.
    • 'same_length_chars_without_brackets': Replace the name with asterisks of the same length, without brackets.
    • Default: ""
    Definition Classes
    DeIdentificationParams
  292. def setMinYear(s: Int): DeIdentificationModel.this.type

    Minimum year to use when converting date to year

    Definition Classes
    DeIdentificationParams
  293. def setMode(m: String): DeIdentificationModel.this.type

    Mode for Anonymizer ['mask'|'obfuscate']. Default: 'mask'

    • Mask mode: The entities will be replaced by their entity types.
    • Obfuscate mode: The entity is replaced by an obfuscator's term.
    Definition Classes
    BaseDeidParams
    Example:
    1. Given the following text: "David Hale visited EEUU a couple of years ago"

      • Mask mode: "<PERSON> visited <COUNTRY> a couple of years ago"
      • Obfuscate mode: "Bryan Johnson visited Japan a couple of years ago"
  294. def setObfuscateByAgeGroups(value: Boolean): DeIdentificationModel.this.type

    Sets whether to obfuscate ages based on age groups.

    When true, the age groups specified in the ageGroups parameter will be used to obfuscate ages. When false, the age ranges specified in the ageRanges parameter will be used to obfuscate ages. Default: false.

    Definition Classes
    DeIdentificationParams
  295. def setObfuscateDate(s: Boolean): DeIdentificationModel.this.type

    When mode == "obfuscate", whether to obfuscate dates or not. This parameter helps with consistency and makes dateFormats more visible. When set to true, make sure the dateFormats parameter fits your needs. If the value is true and obfuscation fails, unnormalizedDateMode is activated. When set to false, the date is masked as <DATE>. Default: false

    Definition Classes
    BaseDeidParams
  296. def setObfuscateRefSource(s: String): DeIdentificationModel.this.type

    The source of the fake values used to obfuscate the entities. The values are the following:

    • 'file': Takes the fakes from the obfuscatorRefFile.
    • 'faker': Takes the fakes from the Faker module.
    • 'both': Takes the fakes from the obfuscatorRefFile and the Faker module randomly.

    Definition Classes
    BaseDeidParams
  297. def setObfuscateZipByHipaa(value: Boolean): DeIdentificationModel.this.type

    Sets whether HIPAA Safe Harbor ZIP obfuscation rules should be applied.

    Behavior:

    • true: Apply HIPAA rules as described in obfuscateZipByHipaa: extract five digits, map restricted 3-digit prefixes to "000**", otherwise generalize to XXX**. The +4 portion is masked with asterisks if present.
    • false: Do not apply HIPAA Safe Harbor behavior; use the component's default or custom ZIP obfuscation instead.

    Implementation & defaults:

    • Default: false (HIPAA behavior is opt-in). If you want HIPAA Safe Harbor behavior by default, change the default value where the parameter is declared.
    Definition Classes
    BaseDeidParams
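    The ZIP rule described above can be approximated with the following plain-Scala sketch. It is not the library's implementation: the restricted-prefix set holds placeholder values only (the actual list is not given here), the "-****" rendering of the +4 portion is an assumption, and all names are hypothetical.

```scala
object HipaaZipSketch {
  // Placeholder values for illustration; not the list used by the library.
  val exampleRestricted: Set[String] = Set("036", "059")

  // Five digits, optionally followed by a hyphen and a 4-digit extension.
  private val ZipPattern = """(\d{5})(?:-(\d{4}))?""".r

  // Returns None when the input does not look like a ZIP or ZIP+4 code.
  def obfuscate(zip: String, restricted: Set[String] = exampleRestricted): Option[String] =
    zip.trim match {
      case ZipPattern(five, plus4) =>
        val prefix = five.take(3)
        val base = if (restricted.contains(prefix)) "000**" else prefix + "**"
        // The optional group is null when no +4 portion was present.
        Some(if (plus4 == null) base else base + "-****")
      case _ => None
    }
}
```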
  298. def setObfuscationEquivalents(equivalents: ArrayList[ArrayList[String]]): DeIdentificationModel.this.type
    Definition Classes
    BaseDeidParams
  299. def setObfuscationEquivalents(equivalents: Array[Array[String]]): DeIdentificationModel.this.type

    Sets variant-to-canonical entity mappings to ensure consistent obfuscation.

    This method allows you to define equivalence rules for entity variants that should be obfuscated the same way. For example, the names "Alex" and "Alexander" will always be mapped to the same obfuscated value if they are linked to the same canonical form.

    It accepts an array of string triplets, where each triplet defines:

    • variant: A non-standard, short, or alternative form of a value (e.g., "Alex")
    • entityType: The type of the entity (e.g., "NAME", "STATE", "COUNTRY")
    • canonical: The standardized form all variants map to (e.g., "Alexander")

    variant and entityType comparisons are case-insensitive during processing.

    This is especially useful in de-identification tasks to ensure consistent replacement of semantically identical values. It also allows cross-variant normalization across different occurrences of sensitive data.

    Example
    val equivalents = Array(
      Array("Alex", "NAME", "Alexander"),
      Array("Rob", "NAME", "Robert"),
      Array("CA", "STATE", "California"),
      Array("Calif.", "STATE", "California")
    )
    
    myDeidTransformer.setObfuscationEquivalents(equivalents)
    equivalents

    Array of [variant, entityType, canonical] entries.

    Definition Classes
    BaseDeidParams
    Exceptions thrown

    IllegalArgumentException if any entry does not have exactly 3 elements.
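    To make the triplet semantics concrete, here is a plain-Scala sketch of validating the entries and resolving a variant to its canonical form. It only mirrors the documented behavior (exactly 3 elements per entry, case-insensitive variant and entityType); the object, the function names and the lookup structure are assumptions, not the library's internals.

```scala
import java.util.Locale

object EquivalentsSketch {
  // Build a case-insensitive (variant, entityType) -> canonical lookup.
  // `require` throws IllegalArgumentException for malformed entries.
  def buildLookup(equivalents: Array[Array[String]]): Map[(String, String), String] = {
    equivalents.foreach { entry =>
      require(entry.length == 3,
        s"Expected [variant, entityType, canonical], got ${entry.length} elements")
    }
    equivalents.map { case Array(variant, entityType, canonical) =>
      (variant.toLowerCase(Locale.ROOT), entityType.toLowerCase(Locale.ROOT)) -> canonical
    }.toMap
  }

  // Resolve a value to its canonical form, or return it unchanged if unmapped.
  def canonicalize(lookup: Map[(String, String), String], value: String, entityType: String): String =
    lookup.getOrElse(
      (value.toLowerCase(Locale.ROOT), entityType.toLowerCase(Locale.ROOT)),
      value)
}
```

    Once "Alex" and "Alexander" resolve to the same canonical string, any deterministic obfuscator keyed on that string replaces both with the same fake value.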

  300. def setObfuscationEquivalents(equivalents: Array[StaticObfuscationEntity]): DeIdentificationModel.this.type

    Sets obfuscationEquivalents param.

    Definition Classes
    BaseDeidParams
  301. def setObfuscationStrategyOnException(value: String): DeIdentificationModel.this.type

    Sets the obfuscation strategy to be applied when an exception occurs.

    The obfuscation strategy determines how obfuscation is handled in case of an exception. Four possible values are supported:

    • "mask": The original chunk is replaced with a masking pattern.
    • "default": The original chunk is replaced with a default faker.
    • "skip": The original chunk is not replaced with any faker.
    • "exception": Throws the exception.

    The default obfuscation strategy is "default".

    Definition Classes
    DeIdentificationParams
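    The four strategies can be pictured with a small plain-Scala sketch that wraps a failing obfuscator. This is an illustration under assumptions: the faker is a caller-supplied function, the masking pattern is taken to be the bracketed entity label, the default fake is passed in explicitly, and all names are hypothetical.

```scala
import scala.util.{Failure, Success, Try}

object ExceptionStrategySketch {
  // Try to obfuscate `chunk`; on failure, fall back according to `strategy`.
  def obfuscateSafely(
      chunk: String,
      label: String,
      strategy: String,
      faker: String => String,
      defaultFake: String
  ): String =
    Try(faker(chunk)) match {
      case Success(fake) => fake
      case Failure(error) =>
        strategy match {
          case "mask"      => s"<$label>"   // replace with a masking pattern
          case "default"   => defaultFake   // replace with a default fake value
          case "skip"      => chunk         // leave the original chunk untouched
          case "exception" => throw error   // propagate the original exception
          case other =>
            throw new IllegalArgumentException(s"Unknown strategy: $other")
        }
    }
}
```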
  302. def setOutputAsDocument(mode: Boolean): DeIdentificationModel.this.type

    Whether to return all sentences joined into a single document

    Definition Classes
    DeIdentificationParams
  303. final def setOutputCol(value: String): DeIdentificationModel.this.type
    Definition Classes
    HasOutputAnnotationCol
  304. def setParent(parent: Estimator[DeIdentificationModel]): DeIdentificationModel
    Definition Classes
    Model
  305. def setRegexOverride(s: Boolean): DeIdentificationModel.this.type

    If the value is true, prioritize the regex entities; if the value is false, prioritize the NER entities. The default value is false. If DeIdentification.combineRegexPatterns is true, this value has no effect.

    Definition Classes
    DeIdentificationParams
  306. def setRegexPatternsDictionary(value: Map[String, Array[String]]): DeIdentificationModel.this.type

    Dictionary with regular expression patterns that match some protected entities.

  307. def setRegion(s: String): DeIdentificationModel.this.type

    With this property, you can select particular dateFormats. This property is mainly used when obfuscating dates: it decides whether the first or the second part of a date such as 11/11/2023 is the day. The values are the following:

    • 'eu' for European Union
    • 'us' for USA
    Definition Classes
    BaseDeidParams
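    The effect of the region on an ambiguous date can be shown with java.time. This is only a sketch: the two patterns are assumed stand-ins for the region-specific dateFormats, and the object and function names are hypothetical.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object RegionDateSketch {
  // Parse a slash-separated date, reading the day first for 'eu'
  // and the month first for 'us'.
  def parse(text: String, region: String): LocalDate = {
    val pattern = region match {
      case "eu"  => "dd/MM/yyyy"
      case "us"  => "MM/dd/yyyy"
      case other => throw new IllegalArgumentException(s"Unsupported region: $other")
    }
    LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern))
  }
}
```

    For example, "03/04/2023" is read as 3 April 2023 with 'eu' and as 4 March 2023 with 'us', so a day shift applied afterwards lands on different dates.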
  308. def setReturnEntityMappings(s: Boolean): DeIdentificationModel.this.type

    With this property, you can select whether to return the mapping column.

    Definition Classes
    DeIdentificationParams
  309. def setSameEntityThreshold(s: Double): DeIdentificationModel.this.type

    Similarity threshold [0.0-1.0] to consider two appearances of an entity as the same (default: 0.9). This method does not apply to date entities.

    Definition Classes
    DeIdentificationParams
  310. def setSameLengthFormattedEntities(entities: Array[String]): DeIdentificationModel.this.type

    List of formatted entities to generate the same length outputs as original ones during obfuscation. The supported and default formatted entities are: PHONE, FAX, CONTACT, ID, IDNUM, BIOID, MEDICALRECORD, ZIP, VIN, SSN, DLN, LICENSE, PLATE, IRS, CFN, ACCOUNT.

    Definition Classes
    BaseDeidParams
  311. def setSeed(s: Int): DeIdentificationModel.this.type

    The seed used to select the entities in obfuscate mode. With the seed, you can replay an execution several times with the same output.

    Definition Classes
    BaseDeidParams
  312. def setSelectiveObfuscateRefSource(value: HashMap[String, String]): DeIdentificationModel.this.type
    Definition Classes
    BaseDeidParams
  313. def setSelectiveObfuscateRefSource(value: Map[String, String]): DeIdentificationModel.this.type

    Sets the value of selectiveObfuscateRefSource. This is used to selectively apply different obfuscation methods to specific entities. The keys are entity names and the values are the obfuscation sources. If an entity is not specified in this map, the obfuscateRefSource param is used to determine the obfuscation method. The values can be:

    • 'file': Takes the fakes from the file.
    • 'faker': Takes the fakes from the embedded faker module.
    • 'both': Takes the fakes from the file and the faker module.

    Definition Classes
    BaseDeidParams
    Example:
    1. val modes = Map(
       "PHONE" -> "file",
       "EMAIL" -> "faker",
       "NAME" -> "faker",
       "ADDRESS" -> "both"
       )
  314. def setSelectiveObfuscationModes(value: HashMap[String, List[String]]): DeIdentificationModel.this.type
    Definition Classes
    BaseDeidParams
  315. def setSelectiveObfuscationModes(value: Map[String, Array[String]]): DeIdentificationModel.this.type

    Sets the value of selectiveObfuscationModes: the dictionary of modes that enables multi-mode de-identification.

    • 'obfuscate': Replace the values with random values.
    • 'mask_same_length_chars': Replace the name with asterisks of the same length minus two, plus brackets on both ends.
    • 'mask_entity_labels': Replace the values with the entity label.
    • 'mask_fixed_length_chars': Replace the name with a fixed number of asterisks. You should also invoke setFixedMaskLength().
    • 'mask_entity_labels_without_brackets': Replace the values with the entity label without brackets.
    • 'mask_same_length_chars_without_brackets': Replace the name with asterisks of the same length, without brackets.
    • 'skip': Skip the entities (leave them intact).

    Entities that are not listed in the dictionary are de-identified according to setMode().

    Example:

    deidAnnotator
    .setMode("mask")
    .setSelectiveObfuscationModes(Map(
        "OBFUSCATE" -> Array("PHONE", "email"),
        "mask_entity_labels" -> Array("NAME", "CITY"),
        "skip" -> Array("id", "idnum"),
        "mask_same_length_chars" -> Array("fax"),
        "mask_fixed_length_chars" -> Array("zip")
    ))
    .setFixedMaskLength(4)
    Definition Classes
    BaseDeidParams
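    One way to picture how a mode is chosen per entity is to invert the mode-to-entities dictionary and fall back to the global mode. The plain-Scala sketch below does that. Case-insensitive matching is an assumption suggested by the mixed casing in the example above, and the names are hypothetical; this is not the library's code.

```scala
import java.util.Locale

object SelectiveModesSketch {
  // Invert mode -> entities into entity -> mode, normalizing case on both sides.
  def invert(modes: Map[String, Array[String]]): Map[String, String] =
    modes.toSeq.flatMap { case (mode, entities) =>
      entities.toSeq.map { entity =>
        entity.toUpperCase(Locale.ROOT) -> mode.toLowerCase(Locale.ROOT)
      }
    }.toMap

  // Entities absent from the dictionary use the global mode (as set by setMode).
  def modeFor(entity: String, byEntity: Map[String, String], defaultMode: String): String =
    byEntity.getOrElse(entity.toUpperCase(Locale.ROOT), defaultMode)
}
```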
  316. def setStaticObfuscationPairs(pairs: ArrayList[ArrayList[String]]): DeIdentificationModel.this.type
    Definition Classes
    BaseDeidParams
  317. def setStaticObfuscationPairs(pairs: Array[StaticObfuscationEntity]): DeIdentificationModel.this.type
    Definition Classes
    BaseDeidParams
  318. def setStaticObfuscationPairs(pairs: Array[Array[String]]): DeIdentificationModel.this.type

    Sets the static obfuscation pairs. Each pair must contain exactly three elements: [original, entityType, fake].

    pairs

    An array of arrays containing the static obfuscation pairs.

    Definition Classes
    BaseDeidParams
  319. def setUnnormalizedDateMode(mode: String): DeIdentificationModel.this.type

    The mode to use if the date is not formatted. Options: [mask, obfuscate, skip] Default: obfuscate

    Definition Classes
    BaseDeidParams
  320. def setUseShiftDays(s: Boolean): DeIdentificationModel.this.type

    Sets the value of useShiftDays. Whether to use the random shift day when the document has this in its metadata. DocumentHashCoder can create 'dateshift' based on the document. Default: false

    Definition Classes
    DeIdentificationParams → BaseDeidParams
  321. def setZipCodeTag(s: String): DeIdentificationModel.this.type
    Definition Classes
    DeIdentificationParams
  322. def shouldUseConsistentNameParts(entityClass: String): Boolean
    Attributes
    protected
    Definition Classes
    DeidModelParams
  323. val staticObfuscationPairs: StructFeature[Array[StaticObfuscationEntity]]

    A resource containing static obfuscation pairs. Each pair should contain three elements: original, entity type, and fake.

    Definition Classes
    BaseDeidParams
  324. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  325. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  326. final def transform(dataset: Dataset[_]): DataFrame
    Definition Classes
    AnnotatorModel → Transformer
  327. def transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" )
  328. def transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" ) @varargs()
  329. final def transformSchema(schema: StructType): StructType
    Definition Classes
    RawAnnotator → PipelineStage
  330. def transformSchema(schema: StructType, logging: Boolean): StructType
    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  331. def udfDocuments: UserDefinedFunction
  332. def udfProtectedEntities: UserDefinedFunction
  333. val uid: String
    Definition Classes
    DeIdentificationModel → Identifiable
  334. val unnormalizedDateMode: Param[String]

    The mode to use if the date is not formatted. Options: [mask, obfuscate, skip] Default: obfuscate

    Definition Classes
    BaseDeidParams
  335. val useShifDays: BooleanParam

    Use shift days: whether to use the random shift day when the document has this in its metadata. Default: false

    Definition Classes
    DeIdentificationParams
  336. val useShiftDays: BooleanParam

    Whether to use the random shift day when the document has this in its metadata. DocumentHashCoder can create 'dateshift' based on the document. Default: false

    Definition Classes
    BaseDeidParams
  337. def validate(schema: StructType): Boolean
    Attributes
    protected
    Definition Classes
    RawAnnotator
  338. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  339. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  340. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  341. def wrapColumn(col: Column): Column
  342. def wrapColumnMetadata(col: Column): Column
    Attributes
    protected
    Definition Classes
    RawAnnotator
  343. def write: MLWriter
    Definition Classes
    ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable
  344. val zipCodeTag: Param[String]
    Definition Classes
    DeIdentificationParams

Deprecated Value Members

  1. def setUseShiftDayse(s: Boolean): DeIdentificationModel.this.type
    Definition Classes
    DeIdentificationParams
    Annotations
    @deprecated
    Deprecated

    deprecated because of typo
