trait BaseDeidParams extends Params with HasFeatures

A trait that contains the params shared by DeIdentificationParams and ObfuscatorParams.

See also

DeIdentificationParams

ObfuscatorParams

DeidModelParams

Linear Supertypes
HasFeatures, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Abstract Value Members

  1. abstract def copy(extra: ParamMap): Params
    Definition Classes
    Params
  2. abstract val uid: String
    Definition Classes
    Identifiable

Concrete Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T
    Attributes
    protected
    Definition Classes
    Params
  4. def $$[T](feature: StructFeature[T]): T
    Attributes
    protected
    Definition Classes
    HasFeatures
  5. def $$[K, V](feature: MapFeature[K, V]): Map[K, V]
    Attributes
    protected
    Definition Classes
    HasFeatures
  6. def $$[T](feature: SetFeature[T]): Set[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  7. def $$[T](feature: ArrayFeature[T]): Array[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  8. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  9. val additionalDateFormats: StringArrayParam

    Additional date formats to be considered during date obfuscation. This allows users to specify custom date formats in addition to the default dateFormats.

  10. val ageRanges: IntArrayParam

    List of integers specifying limits of the age groups to preserve during obfuscation
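
    As a rough illustration of how a list of limits can define age groups, the sketch below finds the group that contains a given age. The object name, the helper `groupFor`, and the half-open interval interpretation are assumptions made for this example; the library's actual grouping logic is not documented here.

```scala
object AgeRangeSketch {
  // Returns the [lower, upper) group that contains `age`, given group limits.
  // Assumption: limits are interpreted as inclusive lower bounds of each group.
  def groupFor(age: Int, limits: Seq[Int]): (Int, Int) = {
    val sorted = limits.sorted
    val lower = sorted.filter(_ <= age).lastOption.getOrElse(0)
    val upper = sorted.find(_ > age).getOrElse(Int.MaxValue)
    (lower, upper)
  }
}
```

    Under this interpretation, an age could be replaced by another age drawn from the same group, so the group membership is preserved.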

  11. val ageRangesByHipaa: BooleanParam

    A Boolean variable indicating whether to obfuscate ages based on the HIPAA (Health Insurance Portability and Accountability Act) Privacy Rule.

    The HIPAA Privacy Rule mandates that ages of patients older than 90 years must be obfuscated, while ages of patients 90 years or younger can remain unchanged.

    When true, age entities larger than 90 are obfuscated as per the HIPAA Privacy Rule and the others remain unchanged. When false, the ageRanges parameter is used instead.

  12. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  13. final def clear(param: Param[_]): BaseDeidParams.this.type
    Definition Classes
    Params
  14. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  15. lazy val combinedDateFormats: Array[String]
    Attributes
    protected
  16. val consistentAcrossNameParts: BooleanParam

    Param that indicates whether consistency should be enforced across different parts of a name (e.g., first name, middle name, last name). When set to true, the same transformation or obfuscation will be applied consistently to all parts of the same name entity, even if those parts appear separately.

    For example, if "John Smith" is obfuscated as "Liam Brown", then:

    • When the full name "John Smith" appears, it will be replaced with "Liam Brown"
    • When "John" or "Smith" appear individually, they will still be obfuscated as "Liam" and "Brown" respectively, ensuring consistency in name transformation.

    Default: true
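
    The "John Smith" example above can be sketched as a lookup that covers both the full name and its parts. The object and the helper `partMap` are hypothetical and only illustrate the idea; they are not the library's implementation.

```scala
object NamePartsSketch {
  // Derives per-part replacements from one full-name replacement.
  // Assumption: the original and the fake name have the same number of parts.
  def partMap(original: String, fake: String): Map[String, String] = {
    val parts = original.split("\\s+")
    val fakes = fake.split("\\s+")
    parts.zip(fakes).toMap + (original -> fake)
  }
}
```

    With `partMap("John Smith", "Liam Brown")`, the keys "John", "Smith" and "John Smith" resolve to "Liam", "Brown" and "Liam Brown" respectively.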

  17. def copyValues[T <: Params](to: T, extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  18. val countryObfuscation: BooleanParam

    Whether to obfuscate country entities or not. If true, country entities will be obfuscated using the Faker module. If false, country entities will be skipped during obfuscation. Default: false

  19. val dateEntities: StringArrayParam

    List of date entities. Default: Array("DATE", "DOB", "DOD", "EFFDATE", "FISCAL_YEAR")

  20. val dateFormats: StringArrayParam

    Format of dates to displace

  21. val days: IntParam

    Number of days to obfuscate the dates by displacement. If not provided, a random integer between 1 and 60 will be used.
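
    Date displacement itself can be illustrated with java.time. This is a minimal sketch that assumes the date has already been matched against a single known pattern; the object and the helper `shift` are hypothetical names, not part of this trait.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object DateShiftSketch {
  // Parses `text` with `pattern`, shifts it by `days`, and formats it back
  // with the same pattern.
  def shift(text: String, pattern: String, days: Int): String = {
    val fmt = DateTimeFormatter.ofPattern(pattern)
    LocalDate.parse(text, fmt).plusDays(days.toLong).format(fmt)
  }
}
```

    For example, shifting "12/25/2023" in the pattern "MM/dd/yyyy" by 10 days rolls over into the next year.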

  22. final def defaultCopy[T <: Params](extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  23. val enableDefaultObfuscationEquivalents: BooleanParam

    Whether to enable default obfuscation equivalents for common entities. This parameter allows the system to automatically include a set of predefined common English name equivalents. Default: false

  24. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  25. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  26. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  27. def explainParams(): String
    Definition Classes
    Params
  28. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  29. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  30. val fakerLengthOffset: IntParam

    Specifies how much length deviation is accepted during obfuscation when keepTextSizeForObfuscation is enabled. The value must be greater than 0. Default: 3.
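
    One way to read "length deviation" is as a filter on candidate replacements. The sketch below keeps only candidates whose length is within the offset of the original; the object, the helper `withinOffset`, and the filtering strategy are assumptions for illustration.

```scala
object LengthOffsetSketch {
  // Keeps candidates whose length differs from the original by at most `offset`.
  def withinOffset(original: String, candidates: Seq[String], offset: Int): Seq[String] = {
    require(offset > 0, "offset must be greater than 0")
    candidates.filter(c => math.abs(c.length - original.length) <= offset)
  }
}
```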

  31. val features: ArrayBuffer[Feature[_, _, _]]
    Definition Classes
    HasFeatures
  32. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  33. val genderAwareness: BooleanParam

    Whether to use gender-aware names or not during obfuscation. This param affects only names. Setting it to true might decrease performance. Default: false

  34. val geoConsistency: BooleanParam

    Whether to enforce consistent obfuscation across geographical entities: state, city, street, zip and phone.

    Functionality Overview

    This parameter enables intelligent geographical entity obfuscation that maintains realistic relationships between different geographic components. When enabled, the system ensures that obfuscated addresses form coherent, valid combinations rather than random replacements.

    Supported Entity Types

    The following geographical entities are processed in priority order:

    • state (Priority: 0): US state names
    • city (Priority: 1): City names
    • zip (Priority: 2): Zip codes
    • street (Priority: 3): Street addresses
    • phone (Priority: 4): Phone numbers

    Language Requirement

    IMPORTANT: Geographic consistency is only applied when both of the following hold:

    • the geoConsistency parameter is set to true
    • the language parameter is set to en

    For non-English configurations, this feature is automatically disabled regardless of the parameter setting.

    Consistency Algorithm

    When geographical entities come from the chunk columns:

    1. Entity Grouping: All geographic entities are identified and grouped by type.
    2. Fake Address Selection: A consistent set of fake US addresses is selected using hash-based deterministic selection to ensure reproducibility.
    3. Priority-Based Mapping: Entities are mapped to fake addresses following the priority order (state → city → zip → street → phone).
    4. Consistent Replacement: All entities of the same type within a document use the same fake address pool, maintaining geographical coherence.

    Parameter Interactions

    IMPORTANT: Enabling this parameter automatically disables:

    • keepTextSizeForObfuscation: text size preservation is not maintained
    • consistentObfuscation: standard consistency rules are overridden
    • file-based fakers

    This is necessary because geographic consistency requires specific fake address selection that may not preserve original text lengths or follow standard obfuscation patterns.

    Default: false
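
    The hash-based deterministic selection described for this parameter can be sketched as follows. The `FakeAddress` record, the `pick` helper, and the use of a document key as the hash input are all assumptions for illustration; the actual pool and hashing scheme are internal to the library.

```scala
object GeoConsistencySketch {
  // Hypothetical fake-address record; the field names are illustrative only.
  final case class FakeAddress(state: String, city: String, zip: String)

  // Picks one address from the pool deterministically from a document key,
  // so the same key always yields the same address.
  def pick(docKey: String, pool: IndexedSeq[FakeAddress]): FakeAddress = {
    require(pool.nonEmpty, "pool must not be empty")
    pool(Math.floorMod(docKey.hashCode, pool.size))
  }
}
```

    Because the state, city and zip come from the same record, the replacements stay mutually coherent, which is the property this parameter aims for.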

  35. def get[T](feature: StructFeature[T]): Option[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  36. def get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  37. def get[T](feature: SetFeature[T]): Option[Set[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  38. def get[T](feature: ArrayFeature[T]): Option[Array[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  39. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  40. def getAdditionalDateFormats: Array[String]

    Gets the value of additionalDateFormats

  41. def getAgeRanges: Array[Int]

    Gets ageRanges param.

  42. def getAgeRangesByHipaa: Boolean

    Gets the value of ageRangesByHipaa.

  43. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  44. def getConsistentAcrossNameParts: Boolean

    Gets the value of consistentAcrossNameParts.

  45. def getCountryObfuscation: Boolean

    Gets the value of countryObfuscation.

  46. def getDateEntities: Array[String]

    Gets dateEntities param.

  47. def getDateFormats: Array[String]

    Gets the value of dateFormats

  48. def getDays: Int

    Gets days param

  49. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  50. def getDefaultObfuscationEquivalents: Array[StaticObfuscationEntity]
  51. def getDefaultObfuscationEquivalentsAsJava: Array[ArrayList[String]]
  52. def getEnableDefaultObfuscationEquivalents: Boolean

    Gets the value of enableDefaultObfuscationEquivalents.

  53. def getEntityBasedObfuscationRefSource(entityClass: String): String
    Attributes
    protected
  54. def getFakerLengthOffset: Int

    Gets fakerLengthOffset param

  55. def getGenderAwareness: Boolean

    Gets genderAwareness param.

  56. def getGeoConsistency: Boolean

    Gets the value of geoConsistency.

  57. def getIsRandomDateDisplacement: Boolean

    Gets isRandomDateDisplacement param

  58. def getKeepMonth: Boolean

    Gets keepMonth param

  59. def getKeepTextSizeForObfuscation: Boolean

    Gets keepTextSizeForObfuscation param

  60. def getKeepYear: Boolean

    Gets keepYear param

  61. def getLanguage: String

    Gets language param.

  62. def getMaxRandomDisplacementDays: Int

    Gets maxRandomDisplacementDays param

  63. def getMode: String

    Gets mode param.

  64. def getObfuscateDate: Boolean

    Gets obfuscateDate param

  65. def getObfuscateRefSource: String

    Gets obfuscateRefSource param.

  66. def getObfuscateZipByHipaa: Boolean

    Gets the value of obfuscateZipByHipaa.

  67. def getObfuscationEquivalents: Option[Array[StaticObfuscationEntity]]

    Gets the value of obfuscationEquivalents.

  68. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  69. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  70. def getRegion: String

    Gets region param.

  71. def getSameLengthFormattedEntities(): Array[String]
  72. def getSeed(): Int
  73. def getSelectiveObfuscateRefSource: Map[String, String]

    Gets selectiveObfuscateRefSource param.

  74. def getSelectiveObfuscateRefSourceAsStr: String
  75. def getSelectiveObfuscationModes: Option[Map[String, Array[String]]]

    Gets selectiveObfuscationModes param.

  76. def getStaticObfuscationPairs: Option[Array[StaticObfuscationEntity]]
  77. def getUnnormalizedDateMode: String

    Gets unnormalizedDateMode param.

  78. def getUseShiftDays: Boolean

    Gets useShiftDays param.

  79. def getValidAgeRanges: Array[Int]

    Gets the valid ageRanges, depending on whether ageRangesByHipaa is true.


    Attributes
    protected
  80. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  81. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  82. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  83. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  84. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  85. val isRandomDateDisplacement: BooleanParam

    Whether to use a random number of displacement days for date entities. The random number is based on DeIdentificationParams.seed. If true, a random displacement is used; if false, DeIdentificationParams.days is used. Default: false.

  86. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  87. val keepMonth: BooleanParam

    Whether to keep the month intact when obfuscating date entities. If true, the month will remain unchanged during the obfuscation process. If false, the month will be modified along with the year and day. Default: false.

  88. val keepTextSizeForObfuscation: BooleanParam

    Specifies whether the output should maintain the same character length as the input text. The output keeps the same length if a replacement of that length is available; otherwise the length might vary.

  89. val keepYear: BooleanParam

    Whether to keep the year intact when obfuscating date entities. If true, the year will remain unchanged during the obfuscation process. If false, the year will be modified along with the month and day. Default: false.

  90. val language: Param[String]

    The language used to select the regex file and some faker entities: 'en' (English), 'de' (German), 'es' (Spanish), 'fr' (French), 'ar' (Arabic) or 'ro' (Romanian). Default: 'en'

  91. val maxRandomDisplacementDays: IntParam

    Maximum number of days for random date displacement. Default is 1825 (5 years). If isRandomDateDisplacement is true, a random number of days between 1 and maxRandomDisplacementDays will be used for date displacement.
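
    A seeded draw between 1 and maxRandomDisplacementDays can be sketched with scala.util.Random. The object and the helper `displacement` are hypothetical, and the way the library derives its random value from the seed may differ.

```scala
import scala.util.Random

object RandomDisplacementSketch {
  // Draws a displacement in [1, maxDays] from a generator seeded with `seed`,
  // so the same seed always gives the same displacement.
  def displacement(seed: Int, maxDays: Int = 1825): Int = {
    require(maxDays >= 1, "maxDays must be at least 1")
    new Random(seed.toLong).nextInt(maxDays) + 1
  }
}
```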

  92. val mode: Param[String]

    Mode for Anonymizer ['mask' or 'obfuscate']. Default: 'mask'

    • Mask mode: The entities will be replaced by their entity types.
    • Obfuscate mode: The entity is replaced by an obfuscator's term.
    Example:
    1. Given the following text: "David Hale visited EEUU a couple of years ago"

      • Mask mode: "<PERSON> visited <COUNTRY> a couple of years ago"
      • Obfuscate mode: "Bryan Johnson visited Japan a couple of years ago"
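
    The two modes in the example above can be mimicked with plain string replacement. This is only a sketch: `Entity`, `mask` and `obfuscate` are hypothetical names, and the real annotator works on annotated chunks rather than on raw substring matches.

```scala
object ModeSketch {
  // A detected entity: its surface text and its label (hypothetical type).
  final case class Entity(text: String, label: String)

  // Mask mode: replace each entity with its label in angle brackets.
  def mask(text: String, entities: Seq[Entity]): String =
    entities.foldLeft(text)((acc, e) => acc.replace(e.text, s"<${e.label}>"))

  // Obfuscate mode: replace each entity with a supplied fake value,
  // leaving it unchanged when no fake is available.
  def obfuscate(text: String, entities: Seq[Entity], fakes: Map[String, String]): String =
    entities.foldLeft(text)((acc, e) => acc.replace(e.text, fakes.getOrElse(e.text, e.text)))
}
```
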
  93. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  94. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  95. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  96. val obfuscateDate: BooleanParam

    When mode == "obfuscate", whether to obfuscate dates or not. This param helps in consistency to make dateFormats more visible. When set to true, make sure the dateFormats param fits your needs. If the value is true and obfuscation fails, unnormalizedDateMode is activated. When set to false, the date is masked to <DATE>. Default: false

  97. val obfuscateRefSource: Param[String]

    The source of obfuscation to obfuscate the entities. The values are the following:

    • 'file': Takes the entities from the obfuscatorRefFile.
    • 'faker': Takes the entities from the Faker module.
    • 'both': Takes the entities from the obfuscatorRefFile and the Faker module randomly.

  98. val obfuscateZipByHipaa: BooleanParam

    Whether to apply HIPAA Safe Harbor ZIP code obfuscation rules.

    When enabled (true), ZIP/ZIP+4 obfuscation follows the HIPAA Safe Harbor guidance:

    1. The algorithm extracts the first five digits from the input (accepting formats like "12345", "12345-6789", "123456789" and tolerant forms).
    2. If the three-digit ZIP prefix is in the HIPAA restricted list (the 17 prefixes derived from 2000 Census data), the ZIP MUST be suppressed to the canonical value "000**".
    3. Otherwise, the ZIP is generalized to the first three digits followed by "**" (i.e. XXX**). The +4 portion will be masked with asterisks if present.

    When disabled (false), HIPAA-specific ZIP obfuscation is not applied and the component's default/custom ZIP obfuscation is used instead.
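
    The generalization rule for the five-digit part can be sketched as below. The object and the helper `generalize` are hypothetical, the restricted prefixes are passed in by the caller rather than hard-coded, and masking of the +4 portion is left out of this sketch.

```scala
object ZipHipaaSketch {
  // Generalizes a ZIP code given a caller-supplied set of restricted
  // three-digit prefixes. Returns None when fewer than five digits are found.
  def generalize(zip: String, restricted: Set[String]): Option[String] = {
    val digits = zip.filter(_.isDigit).take(5)
    if (digits.length < 5) None
    else {
      val prefix = digits.take(3)
      Some(if (restricted.contains(prefix)) "000**" else s"${prefix}**")
    }
  }
}
```

    For instance, `generalize("12345-6789", restricted)` yields `Some("123**")` when "123" is not in the restricted set. The set itself should come from the official HIPAA guidance, not from this example.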

    Implementation notes and cautions:

  99. val obfuscationEquivalents: StructFeature[Array[StaticObfuscationEntity]]

    Variant-to-canonical entity mappings to ensure consistent obfuscation.

    This method allows you to define equivalence rules for entity variants that should be obfuscated the same way. For example, the names "Alex" and "Alexander" will always be mapped to the same obfuscated value if they are linked to the same canonical form.

    It accepts an array of string triplets, where each triplet defines:

    • variant: A non-standard, short, or alternative form of a value (e.g., "Alex")
    • entityType: The type of the entity (e.g., "NAME", "STATE", "COUNTRY")
    • canonical: The standardized form all variants map to (e.g., "Alexander")

    variant and entityType comparisons are case-insensitive during processing.

    This is especially useful in de-identification tasks to ensure consistent replacement of semantically identical values. It also allows cross-variant normalization across different occurrences of sensitive data.
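
    The case-insensitive variant lookup described above can be sketched with a plain map. The object and the helpers `buildLookup` and `canonical` are hypothetical; they only show how variants of the same entity type might collapse to one canonical form before obfuscation.

```scala
object EquivalentsSketch {
  // Builds a lookup from (variant, entityType), both lowercased, to the canonical form.
  def buildLookup(triplets: Seq[(String, String, String)]): Map[(String, String), String] =
    triplets.map { case (variant, entityType, canonical) =>
      (variant.toLowerCase, entityType.toLowerCase) -> canonical
    }.toMap

  // Resolves a value to its canonical form, or returns it unchanged when no rule matches.
  def canonical(lookup: Map[(String, String), String], value: String, entityType: String): String =
    lookup.getOrElse((value.toLowerCase, entityType.toLowerCase), value)
}
```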

  100. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  101. lazy val randomDateFormat: String
    Attributes
    protected
  102. val region: Param[String]

    Selects which regional dateFormats to use. This property is mainly used when obfuscating dates: it decides whether the first or the second part of a date such as 11/11/2023 is the day. The values are as follows:

    • 'eu' for European Union
    • 'us' for USA
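
    The day/month ambiguity that the region resolves can be shown with java.time. The mapping from region code to pattern below is an assumption for illustration; the library's actual format lists are not reproduced here.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object RegionSketch {
  // Parses a slash-separated date using a day/month order chosen by region.
  // The two patterns are assumed, not taken from the library.
  def parse(text: String, region: String): LocalDate = {
    val pattern = region match {
      case "eu" => "dd/MM/yyyy"
      case "us" => "MM/dd/yyyy"
      case other => throw new IllegalArgumentException(s"Unsupported region: $other")
    }
    LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern))
  }
}
```

    The same text "03/04/2023" is read as 3 April with 'eu' and as 4 March with 'us'.
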
  103. val sameLengthFormattedEntities: StringArrayParam

    List of formatted entities to generate the same length outputs as original ones during obfuscation. The supported and default formatted entities are: "phone", "fax", "contact", "id", "idnum", "bioid", "medicalrecord", "zip", "vin", "ssn", "dln", "plate", "license", "IRS", "CFN", "account".

  104. val seed: IntParam

    The seed used to select the entities in obfuscate mode. With the seed, you can replay an execution several times with the same output.

  105. val selectiveObfuscateRefSource: MapFeature[String, String]

    A map of entity names to their obfuscation sources. This is used to selectively apply different obfuscation methods to specific entities. The keys are entity names and the values are the obfuscation sources. If an entity is not specified in this map, the obfuscateRefSource param is used to determine the obfuscation source.

    Example:
    val selectiveSources = Map(
      "PHONE" -> "file",
      "EMAIL" -> "faker",
      "NAME" -> "faker",
      "ADDRESS" -> "both"
    )
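
    The fallback to obfuscateRefSource described above amounts to a map lookup with a default. The object and the helper `sourceFor` are hypothetical, and whether the library matches entity names case-sensitively is not stated here.

```scala
object SelectiveSourceSketch {
  // Resolves the source for an entity, falling back to the global default
  // when the entity has no entry in the selective map.
  def sourceFor(entity: String, selective: Map[String, String], default: String): String =
    selective.getOrElse(entity, default)
}
```
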
  106. val selectiveObfuscationModes: StructFeature[Map[String, Array[String]]]

    The dictionary of modes to enable multi-mode deidentification.

    • 'obfuscate': Replace the values with random values.
    • 'mask_same_length_chars': Replace the value with asterisks enclosed in brackets, so that the asterisks plus the two brackets match the original length.
    • 'mask_entity_labels': Replace the values with the entity label.
    • 'mask_fixed_length_chars': Replace the value with a fixed number of asterisks. You can also invoke "setFixedMaskLength()".
    • 'mask_entity_labels_without_brackets': Replace the values with the entity label without brackets.
    • 'mask_same_length_chars_without_brackets': Replace the value with asterisks of the same length, without brackets.
    • 'skip': Skip the entities (leave them intact).

    Entities that are not listed in the dictionary are deidentified according to setMode().
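
    The masking policies listed above can be sketched as small string functions. The object and function names are hypothetical, and the bracket characters are assumptions based on the mask example shown for the mode param.

```scala
object MaskPoliciesSketch {
  // Asterisks plus two brackets, total length equal to the original
  // (values shorter than two characters collapse to "[]").
  def sameLengthChars(value: String): String =
    "[" + "*" * math.max(value.length - 2, 0) + "]"

  // Asterisks only, same length as the original.
  def sameLengthCharsWithoutBrackets(value: String): String = "*" * value.length

  // A fixed number of asterisks, independent of the original length.
  def fixedLengthChars(fixedLength: Int): String = "*" * fixedLength

  // The entity label, with or without angle brackets.
  def entityLabels(label: String): String = s"<${label}>"
  def entityLabelsWithoutBrackets(label: String): String = label
}
```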

  107. def set[T](feature: StructFeature[T], value: T): BaseDeidParams.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  108. def set[K, V](feature: MapFeature[K, V], value: Map[K, V]): BaseDeidParams.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  109. def set[T](feature: SetFeature[T], value: Set[T]): BaseDeidParams.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  110. def set[T](feature: ArrayFeature[T], value: Array[T]): BaseDeidParams.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  111. final def set(paramPair: ParamPair[_]): BaseDeidParams.this.type
    Attributes
    protected
    Definition Classes
    Params
  112. final def set(param: String, value: Any): BaseDeidParams.this.type
    Attributes
    protected
    Definition Classes
    Params
  113. final def set[T](param: Param[T], value: T): BaseDeidParams.this.type
    Definition Classes
    Params
  114. def setAdditionalDateFormats(formats: Array[String]): BaseDeidParams.this.type

    Sets additionalDateFormats param

  115. def setAgeRanges(mode: Array[Int]): BaseDeidParams.this.type

    List of integers specifying limits of the age groups to preserve during obfuscation

  116. def setAgeRangesByHipaa(value: Boolean): BaseDeidParams.this.type

    Sets whether to obfuscate ages based on the HIPAA (Health Insurance Portability and Accountability Act) Privacy Rule.

    The HIPAA Privacy Rule mandates that ages of patients older than 90 years must be obfuscated, while ages of patients 90 years or younger can remain unchanged.

    value

    If true, age entities larger than 90 are obfuscated as per the HIPAA Privacy Rule and the others remain unchanged. If false, the ageRanges parameter is used instead. Default: false.

  117. def setConsistentAcrossNameParts(value: Boolean): BaseDeidParams.this.type

    Sets the value of consistentAcrossNameParts.

    value

    Boolean flag to enforce consistency across name parts

    returns

    this instance

  118. def setCountryObfuscation(value: Boolean): BaseDeidParams.this.type

    Sets whether to obfuscate country entities or not. If true, country entities will be obfuscated using the Faker module. If false, country entities will be skipped during obfuscation. Default: false

  119. def setDateEntities(value: Array[String]): BaseDeidParams.this.type

    Sets the value of dateEntities. Default: Array("DATE", "DOB", "DOD", "EFFDATE", "FISCAL_YEAR")

  120. def setDateFormats(s: Array[String]): BaseDeidParams.this.type

    Format of dates to displace

  121. def setDays(k: Int): BaseDeidParams.this.type

    Number of days to obfuscate the dates by displacement. If not provided, a random integer between 1 and 60 will be used.

  122. def setDefault[T](feature: StructFeature[T], value: () ⇒ T): BaseDeidParams.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  123. def setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): BaseDeidParams.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  124. def setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): BaseDeidParams.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  125. def setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): BaseDeidParams.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  126. final def setDefault(paramPairs: ParamPair[_]*): BaseDeidParams.this.type
    Attributes
    protected
    Definition Classes
    Params
  127. final def setDefault[T](param: Param[T], value: T): BaseDeidParams.this.type
    Attributes
    protected[org.apache.spark.ml]
    Definition Classes
    Params
  128. def setEnableDefaultObfuscationEquivalents(value: Boolean): BaseDeidParams.this.type

    Sets whether to enable default obfuscation equivalents for common entities. This parameter allows the system to automatically include a set of predefined common English name equivalents. Default: false

  129. def setFakerLengthOffset(value: Int): BaseDeidParams.this.type

    Sets fakerLengthOffset param

  130. def setGenderAwareness(value: Boolean): BaseDeidParams.this.type

    Whether to use gender-aware names or not during obfuscation. This param affects only names. Setting it to true might decrease performance. Default: false

  131. def setGeoConsistency(value: Boolean): BaseDeidParams.this.type

    Sets the value of geoConsistency. When set to true, it enables consistent obfuscation across geographical entities such as state, city, street, zip, and phone.

  132. def setIsRandomDateDisplacement(s: Boolean): BaseDeidParams.this.type

    Whether to use a random number of displacement days for date entities. The random number is based on DeIdentificationParams.seed. If true, a random displacement is used; if false, DeIdentificationParams.days is used. Default: false.

  133. def setKeepMonth(value: Boolean): BaseDeidParams.this.type

    Sets whether to keep the month intact when obfuscating date entities. If true, the month will remain unchanged during the obfuscation process. If false, the month will be modified along with the year and day. Default: false.

  134. def setKeepTextSizeForObfuscation(value: Boolean): BaseDeidParams.this.type

    Sets keepTextSizeForObfuscation param

  135. def setKeepYear(value: Boolean): BaseDeidParams.this.type

    Sets whether to keep the year intact when obfuscating date entities. If true, the year will remain unchanged during the obfuscation process. If false, the year will be modified along with the month and day. Default: false.

  136. def setLanguage(s: String): BaseDeidParams.this.type

    The language used to select the regex file and some faker entities: 'en' (English), 'de' (German), 'es' (Spanish), 'fr' (French), 'ar' (Arabic) or 'ro' (Romanian). Default: 'en'

  137. def setMaxRandomDisplacementDays(value: Int): BaseDeidParams.this.type

    Sets maxRandomDisplacementDays param

  138. def setMode(m: String): BaseDeidParams.this.type

    Mode for Anonymizer ['mask'|'obfuscate']. Default: 'mask'

    • Mask mode: The entities will be replaced by their entity types.
    • Obfuscate mode: The entity is replaced by an obfuscator's term.
    Example:
    1. Given the following text: "David Hale visited EEUU a couple of years ago"

      • Mask mode: "<PERSON> visited <COUNTRY> a couple of years ago"
      • Obfuscate mode: "Bryan Johnson visited Japan a couple of years ago"
  139. def setObfuscateDate(s: Boolean): BaseDeidParams.this.type

    When mode == "obfuscate", whether to obfuscate dates or not. This param helps in consistency to make dateFormats more visible. When set to true, make sure the dateFormats param fits your needs. If the value is true and obfuscation fails, unnormalizedDateMode is activated. When set to false, the date is masked to <DATE>. Default: false

  140. def setObfuscateRefSource(s: String): BaseDeidParams.this.type

    The source of obfuscation to obfuscate the entities. The values are the following:

    • 'file': Takes the fakes from the obfuscatorRefFile.
    • 'faker': Takes the fakes from the Faker module.
    • 'both': Takes the fakes from the obfuscatorRefFile and the Faker module randomly.

  141. def setObfuscateZipByHipaa(value: Boolean): BaseDeidParams.this.type

    Sets whether HIPAA Safe Harbor ZIP obfuscation rules should be applied.

    Behavior:

    • true: Apply HIPAA rules as described in obfuscateZipByHipaa: extract five digits, map restricted 3-digit prefixes to "000**", otherwise generalize to XXX**. The +4 portion will be masked with asterisks if present.
    • false: Do not apply HIPAA Safe Harbor behavior; use the component's default/custom ZIP obfuscation instead.

    Implementation & defaults:

    • Default: false (HIPAA behavior is opt-in). If you want HIPAA Safe Harbor behavior by default, change the default value where the parameter is declared.
  142. def setObfuscationEquivalents(equivalents: ArrayList[ArrayList[String]]): BaseDeidParams.this.type
  143. def setObfuscationEquivalents(equivalents: Array[Array[String]]): BaseDeidParams.this.type

    Sets variant-to-canonical entity mappings to ensure consistent obfuscation.

    This method allows you to define equivalence rules for entity variants that should be obfuscated the same way. For example, the names "Alex" and "Alexander" will always be mapped to the same obfuscated value if they are linked to the same canonical form.

    It accepts an array of string triplets, where each triplet defines:

    • variant: A non-standard, short, or alternative form of a value (e.g., "Alex")
    • entityType: The type of the entity (e.g., "NAME", "STATE", "COUNTRY")
    • canonical: The standardized form all variants map to (e.g., "Alexander")

    variant and entityType comparisons are case-insensitive during processing.

    This is especially useful in de-identification tasks to ensure consistent replacement of semantically identical values. It also allows cross-variant normalization across different occurrences of sensitive data.

    Example
    val equivalents = Array(
      Array("Alex", "NAME", "Alexander"),
      Array("Rob", "NAME", "Robert"),
      Array("CA", "STATE", "California"),
      Array("Calif.", "STATE", "California")
    )
    
    myDeidTransformer.setObfuscationEquivalents(equivalents)
    equivalents

    Array of [variant, entityType, canonical] entries.

    Exceptions thrown

    IllegalArgumentException if any entry does not have exactly 3 elements.
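
    As a rough illustration of the contract described above (exactly three elements per entry, case-insensitive matching on variant and entityType), the following sketch builds a lookup table in plain Scala. EquivalentsSketch, buildLookup and canonicalize are hypothetical names, and this is not how the library stores or applies the equivalents internally.

```scala
object EquivalentsSketch {
  // Builds a lookup from (variant, entityType) to the canonical form.
  // Keys are lowercased so that lookups are case-insensitive.
  def buildLookup(equivalents: Array[Array[String]]): Map[(String, String), String] =
    equivalents.map { entry =>
      // require throws IllegalArgumentException when the condition is false.
      require(entry.length == 3,
        s"Expected [variant, entityType, canonical], got ${entry.length} elements")
      (entry(0).toLowerCase, entry(1).toLowerCase) -> entry(2)
    }.toMap

  // Returns the canonical form, or the value itself when no rule applies.
  def canonicalize(
      lookup: Map[(String, String), String],
      value: String,
      entityType: String): String =
    lookup.getOrElse((value.toLowerCase, entityType.toLowerCase), value)
}
```

    Mapping every variant to its canonical form before a fake is chosen is one way to make "Alex" and "Alexander" receive the same replacement.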

  144. def setObfuscationEquivalents(equivalents: Array[StaticObfuscationEntity]): BaseDeidParams.this.type

    Sets obfuscationEquivalents param.

  145. def setRegion(s: String): BaseDeidParams.this.type

    Selects the region-specific dateFormats.

    This property is mainly used when obfuscating dates: it decides whether the first or the second part of a date such as 11/11/2023 is read as the day. The values are as follows:

    • 'eu' for European Union
    • 'us' for USA
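
    The effect of the region on an ambiguous date can be shown with java.time alone. This is only a sketch: RegionSketch is a hypothetical name, and the two patterns are illustrative stand-ins rather than the dateFormats the library actually selects for each region.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object RegionSketch {
  // Illustrative patterns only; the library's real dateFormats are not listed here.
  private val patterns: Map[String, DateTimeFormatter] = Map(
    "eu" -> DateTimeFormatter.ofPattern("dd/MM/yyyy"), // day first
    "us" -> DateTimeFormatter.ofPattern("MM/dd/yyyy")  // month first
  )

  // Parses the same text differently depending on the region.
  def parse(date: String, region: String): LocalDate =
    LocalDate.parse(date, patterns(region))
}
```

    With these patterns, "03/04/2023" is read as 3 April 2023 for 'eu' and as 4 March 2023 for 'us'.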
  146. def setSameLengthFormattedEntities(entities: Array[String]): BaseDeidParams.this.type

    List of formatted entities whose obfuscated outputs keep the same length as the originals.

    The supported and default formatted entities are: PHONE, FAX, CONTACT, ID, IDNUM, BIOID, MEDICALRECORD, ZIP, VIN, SSN, DLN, LICENSE, PLATE, IRS, CFN, ACCOUNT.

  147. def setSeed(s: Int): BaseDeidParams.this.type

    The seed used to select the entities in obfuscate mode.

    With a fixed seed, you can replay an execution several times and get the same output.
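
    Why a seed makes runs repeatable can be seen with scala.util.Random. The sketch below assumes a hypothetical pool of fake names and a hypothetical pick helper; it does not reproduce how the library draws its fakes.

```scala
import scala.util.Random

object SeedSketch {
  // Hypothetical pool of fake names.
  val fakeNames: Vector[String] = Vector("Jane Doe", "John Roe", "Sam Poe", "Ann Lee")

  // Picks `n` fakes using a generator initialised with `seed`.
  // The same seed always yields the same sequence of picks.
  def pick(seed: Int, n: Int): Vector[String] = {
    val rng = new Random(seed)
    Vector.fill(n)(fakeNames(rng.nextInt(fakeNames.length)))
  }
}
```

    Two calls such as SeedSketch.pick(42, 5) return identical sequences, which is the property the seed is meant to provide.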

  148. def setSelectiveObfuscateRefSource(value: HashMap[String, String]): BaseDeidParams.this.type
  149. def setSelectiveObfuscateRefSource(value: Map[String, String]): BaseDeidParams.this.type

    Sets the value of selectiveObfuscateRefSource.

    This map selectively applies different obfuscation sources to specific entities. The keys are entity names and the values are the obfuscation sources. If an entity is not specified in this map, the obfuscateRefSource param determines its source. The values can be:

    • 'file': Takes the fakes from the file.
    • 'faker': Takes the fakes from the embedded faker module.
    • 'both': Takes the fakes from the file and the faker module.

    Example:
    val modes = Map(
      "PHONE" -> "file",
      "EMAIL" -> "faker",
      "NAME" -> "faker",
      "ADDRESS" -> "both"
    )

    myDeidTransformer.setSelectiveObfuscateRefSource(modes)
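
    The fallback rule described above (use the per-entity source when present, otherwise the global obfuscateRefSource value) amounts to a map lookup with a default. The sketch below uses hypothetical names and assumes exact, case-sensitive entity keys, which the library may handle differently.

```scala
object RefSourceSketch {
  private val allowed = Set("file", "faker", "both")

  // `selective` plays the role of selectiveObfuscateRefSource and
  // `default` the role of the global obfuscateRefSource value.
  def sourceFor(entity: String, selective: Map[String, String], default: String): String = {
    val source = selective.getOrElse(entity, default)
    require(allowed.contains(source), s"Unknown obfuscation source: $source")
    source
  }
}
```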
  150. def setSelectiveObfuscationModes(value: HashMap[String, List[String]]): BaseDeidParams.this.type
  151. def setSelectiveObfuscationModes(value: Map[String, Array[String]]): BaseDeidParams.this.type

    Sets the value of selectiveObfuscationModes.

    The dictionary of modes that enables multi-mode deidentification. Each key is a mode and each value lists the entities that mode applies to:

    • 'obfuscate': Replace the values with random values.
    • 'mask_same_length_chars': Replace the values with asterisks so that the result, including the enclosing brackets, has the same length as the original (length minus two asterisks, plus a bracket on each end).
    • 'mask_entity_labels': Replace the values with the entity label.
    • 'mask_fixed_length_chars': Replace the values with a fixed number of asterisks. You should also invoke setFixedMaskLength().
    • 'mask_entity_labels_without_brackets': Replace the values with the entity label, without brackets.
    • 'mask_same_length_chars_without_brackets': Replace the values with asterisks of the same length, without brackets.
    • 'skip': Leave the entities intact.

    Entities not listed in the dictionary are deidentified according to setMode().

    Example:

    deidAnnotator
    .setMode("mask")
    .setSelectiveObfuscationModes(Map(
        "OBFUSCATE" -> Array("PHONE", "email"),
        "mask_entity_labels" -> Array("NAME", "CITY"),
        "skip" -> Array("id", "idnum"),
        "mask_same_length_chars" -> Array("fax"),
        "mask_fixed_length_chars" -> Array("zip")
    ))
    .setFixedMaskLength(4)
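
    To make the masking modes listed above concrete, here is a small stand-alone sketch of what each one could produce for a single value. It is an approximation under stated assumptions, not the library's code: MaskModesSketch is a hypothetical name, the bracket characters ("[...]" for same-length masks, "<...>" for entity labels) are assumed, and 'obfuscate' is left out because it needs a source of fake values.

```scala
object MaskModesSketch {
  // `label` is the entity type, `value` the original text, and `fixedLength`
  // plays the role of the value passed to setFixedMaskLength().
  def mask(mode: String, label: String, value: String, fixedLength: Int): String =
    mode match {
      // Brackets take two positions, so the total length matches the original.
      case "mask_same_length_chars" =>
        "[" + "*" * math.max(value.length - 2, 0) + "]"
      case "mask_same_length_chars_without_brackets" => "*" * value.length
      case "mask_fixed_length_chars"                 => "*" * fixedLength
      // Assumption: entity labels are wrapped in angle brackets.
      case "mask_entity_labels"                      => s"<$label>"
      case "mask_entity_labels_without_brackets"     => label
      case "skip"                                    => value
      case other =>
        throw new IllegalArgumentException(s"Mode not covered by this sketch: $other")
    }
}
```

    For a 10-character value such as "John Smith", the same-length mode yields a 10-character string: eight asterisks between two brackets.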
  152. def setStaticObfuscationPairs(pairs: ArrayList[ArrayList[String]]): BaseDeidParams.this.type
  153. def setStaticObfuscationPairs(pairs: Array[StaticObfuscationEntity]): BaseDeidParams.this.type
  154. def setStaticObfuscationPairs(pairs: Array[Array[String]]): BaseDeidParams.this.type

    Sets the static obfuscation pairs.

    Each pair must contain exactly three elements: [original, entityType, fake].

    pairs

    An array of arrays containing the static obfuscation pairs.

  155. def setUnnormalizedDateMode(mode: String): BaseDeidParams.this.type

    The mode to use when a date is not in a recognized format.

    Options: mask, obfuscate, skip. Default: obfuscate.

  156. def setUseShiftDays(s: Boolean): BaseDeidParams.this.type

    Sets the value of useShiftDays.

    Whether to use the random shift day when the document has one in its metadata. DocumentHashCoder can create a 'dateshift' value based on the document. Default: false.
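
    One way to picture this option is a date shift that prefers a per-document value over a global one. The sketch below is an assumption-heavy illustration: the metadata is modelled as a plain Map with an optional "dateshift" entry, and defaultDays is a hypothetical stand-in for whatever global shift would otherwise apply.

```scala
import java.time.LocalDate

object ShiftDaysSketch {
  // Shifts `date` by the document's "dateshift" metadata when useShiftDays is
  // true and the entry exists; otherwise falls back to `defaultDays`.
  def shift(
      date: LocalDate,
      metadata: Map[String, String],
      useShiftDays: Boolean,
      defaultDays: Int): LocalDate = {
    val fromMetadata =
      if (useShiftDays) metadata.get("dateshift").map(_.trim.toInt) else None
    date.plusDays(fromMetadata.getOrElse(defaultDays).toLong)
  }
}
```

    Because the shift comes from the document itself, all dates within one document move by the same number of days, which keeps the intervals between them intact.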

  157. val staticObfuscationPairs: StructFeature[Array[StaticObfuscationEntity]]

    A resource containing static obfuscation pairs.

    Each pair should contain three elements: original, entity type, and fake.

  158. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  159. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  160. val unnormalizedDateMode: Param[String]

    The mode to use when a date is not in a recognized format.

    Options: mask, obfuscate, skip. Default: obfuscate.

  161. val useShiftDays: BooleanParam

    Whether to use the random shift day when the document has one in its metadata.

    DocumentHashCoder can create a 'dateshift' value based on the document. Default: false.

  162. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  163. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  164. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
