com.johnsnowlabs.nlp.annotators.deid.fhir

BaseFhirDeIdentification

abstract class BaseFhirDeIdentification extends Transformer with HasFeatures with LightDeIdentificationParams with DeidModelParams with CheckLicense with HasInputCol with HasOutputAnnotationCol with ParamsAndFeaturesWritable

Linear Supertypes
ParamsAndFeaturesWritable, DefaultParamsWritable, MLWritable, HasOutputAnnotationCol, HasInputCol, CheckLicense, DeidModelParams, LightDeIdentificationParams, MaskingParams, BaseDeidParams, HasFeatures, Transformer, PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new BaseFhirDeIdentification()

Abstract Value Members

  1. abstract def deIdentifyInternal(input: String, rules: Map[String, String]): String

    Main entry point for string de-id

    Attributes
    protected
  2. abstract def getDefaultDateFormat: String
    Attributes
    protected
  3. abstract val uid: String
    Definition Classes
    Identifiable

Concrete Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T
    Attributes
    protected
    Definition Classes
    Params
  4. def $$[T](feature: StructFeature[T]): T
    Attributes
    protected
    Definition Classes
    HasFeatures
  5. def $$[K, V](feature: MapFeature[K, V]): Map[K, V]
    Attributes
    protected
    Definition Classes
    HasFeatures
  6. def $$[T](feature: SetFeature[T]): Set[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  7. def $$[T](feature: ArrayFeature[T]): Array[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  8. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  9. val GEOGRAPHIC_ENTITIES_PRIORITY: Map[String, Int]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  10. val GEO_METADATA_KEY: String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  11. val additionalDateFormats: StringArrayParam

    Additional date formats to be considered during date obfuscation. This allows users to specify custom date formats in addition to the default dateFormats.

    Definition Classes
    BaseDeidParams
  12. val ageRanges: IntArrayParam

    List of integers specifying limits of the age groups to preserve during obfuscation

    Definition Classes
    BaseDeidParams
  13. val ageRangesByHipaa: BooleanParam

    A Boolean variable indicating whether to obfuscate ages based on the HIPAA (Health Insurance Portability and Accountability Act) Privacy Rule.

    The HIPAA Privacy Rule mandates that ages of patients older than 90 years must be obfuscated, while ages of patients 90 years or younger can remain unchanged.

    When true, age entities greater than 90 are obfuscated per the HIPAA Privacy Rule and all others remain unchanged. When false, the ageRanges parameter applies.

    Definition Classes
    BaseDeidParams
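
    The age rule described above can be sketched in plain Scala. This is a minimal illustration and not the library's implementation: the object name, the helper name, and the obfuscate callback are hypothetical, and the replacement text is left to the caller because the documentation does not state it.

```scala
object AgeHipaaSketch {
  // Hypothetical helper: ages above 90 are passed to `obfuscate`,
  // ages of 90 or below are returned unchanged.
  def applyHipaaAgeRule(age: Int, obfuscate: Int => String): String =
    if (age > 90) obfuscate(age) else age.toString
}
```

    For example, AgeHipaaSketch.applyHipaaAgeRule(93, _ => "<AGE>") goes through the callback, while an age of 90 or below is returned as its own digits.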
  14. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  15. val blackListEntities: StringArrayParam

    List of entities coming from NER or regex rules that will be ignored for masking or obfuscation. The remaining entities will be processed. Defaults to an empty array.

    Definition Classes
    BaseDeidParams
  16. def checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String], metadata: Option[Map[String, Value]]): Unit
    Definition Classes
    CheckLicense
  17. def checkValidScope(scope: String): Unit
    Definition Classes
    CheckLicense
  18. def checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean, metadata: Option[Map[String, Value]]): Unit
    Definition Classes
    CheckLicense
  19. def checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean, metadata: Option[Map[String, Value]]): Unit
    Definition Classes
    CheckLicense
  20. final def clear(param: Param[_]): BaseFhirDeIdentification.this.type
    Definition Classes
    Params
  21. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  22. lazy val combinedDateFormats: Array[String]
    Attributes
    protected
    Definition Classes
    BaseDeidParams
  23. val consistentAcrossNameParts: BooleanParam

    Param that indicates whether consistency should be enforced across different parts of a name (e.g., first name, middle name, last name). When set to true, the same transformation or obfuscation will be applied consistently to all parts of the same name entity, even if those parts appear separately.

    For example, if "John Smith" is obfuscated as "Liam Brown", then:

    • When the full name "John Smith" appears, it will be replaced with "Liam Brown"
    • When "John" or "Smith" appear individually, they will still be obfuscated as "Liam" and "Brown" respectively, ensuring consistency in name transformation.

    Default: true

    Definition Classes
    BaseDeidParams
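
    One way to picture the name-part consistency described above is a lookup table keyed by both the full name and its parts. The sketch below is an assumption-laden illustration, not the library's algorithm: it assumes names split on whitespace and that parts pair up by position, and NamePartsSketch, buildMemory, and replace are hypothetical names.

```scala
object NamePartsSketch {
  // Builds a lookup from the full name and each of its parts to the fake equivalent.
  // Assumption: both names split on whitespace and the parts pair up by position.
  def buildMemory(original: String, fake: String): Map[String, String] = {
    val originalParts = original.trim.split("\\s+")
    val fakeParts = fake.trim.split("\\s+")
    originalParts.zip(fakeParts).toMap + (original -> fake)
  }

  // Returns the remembered replacement, or the token itself if none is known.
  def replace(token: String, memory: Map[String, String]): String =
    memory.getOrElse(token, token)
}
```

    With memory built from "John Smith" and "Liam Brown", looking up "John" or "Smith" on their own yields "Liam" or "Brown", which matches the example in the description.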
  24. def copy(extra: ParamMap): BaseFhirDeIdentification.this.type
    Definition Classes
    BaseFhirDeIdentification → Transformer → PipelineStage → Params
  25. def copyValues[T <: Params](to: T, extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  26. val countryObfuscation: BooleanParam

    Whether to obfuscate country entities or not. If true, country entities will be obfuscated using the Faker module. If false, country entities will be skipped during obfuscation. Default: false

    Definition Classes
    BaseDeidParams
  27. val customFakers: MapFeature[String, Array[String]]

    The dictionary of custom fakers to specify the obfuscation terms for the entities. You can specify the entity and the terms to be used for obfuscation.

    Definition Classes
    LightDeIdentificationParams
  28. val dateEntities: StringArrayParam

    List of date entities. Default: Array("DATE", "DOB", "DOD", "EFFDATE", "FISCAL_YEAR")

    Definition Classes
    BaseDeidParams
  29. val dateFormats: StringArrayParam

    Format of dates to displace

    Definition Classes
    BaseDeidParams
  30. val days: IntParam

    Number of days to obfuscate the dates by displacement. If not provided, a random integer between 1 and 60 will be used.

    Definition Classes
    BaseDeidParams
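
    Date displacement by a fixed number of days can be illustrated with java.time. This is only a sketch under stated assumptions: it assumes the date text matches a single known pattern and that the shift is applied forward, neither of which the description above confirms. DateShiftSketch and shift are hypothetical names.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object DateShiftSketch {
  // Parses `text` with `pattern`, shifts it by `days`, and formats it back
  // using the same pattern so the output keeps the input's layout.
  def shift(text: String, pattern: String, days: Int): String = {
    val formatter = DateTimeFormatter.ofPattern(pattern)
    LocalDate.parse(text, formatter).plusDays(days.toLong).format(formatter)
  }
}
```

    Formatting with the same pattern that was used for parsing keeps the displaced date in the layout of the original text.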
  31. final def defaultCopy[T <: Params](extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  32. def deidentify(jsonStr: String): String
  33. def deidentifyWordToReplace(wordToReplace: String, entityClass: String, namePartsMemory: Map[String, String]): String
    Attributes
    protected
  34. def deidentify_list(jsonStrs: ArrayList[String]): List[String]
  35. def deidentify_list(jsonStrs: Array[String]): Array[String]
  36. val enableDefaultObfuscationEquivalents: BooleanParam

    Whether to enable default obfuscation equivalents for common entities. This parameter allows the system to automatically include a set of predefined common English name equivalents. Default: false

    Definition Classes
    BaseDeidParams
  37. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  38. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  39. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  40. def explainParams(): String
    Definition Classes
    Params
  41. def extractDateAndRest(wordToReplace: String): (String, String)
    Attributes
    protected
  42. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  43. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  44. val fakerLengthOffset: IntParam

    Specifies how much length deviation is accepted during obfuscation when keepTextSizeForObfuscation is enabled. The value must be greater than 0. Default: 3.

    Definition Classes
    BaseDeidParams
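
    The idea of an accepted length deviation can be sketched as a filter over candidate replacements. How the library actually selects among candidates is not documented here, so treat this as an illustration only; LengthOffsetSketch and withinOffset are hypothetical names.

```scala
object LengthOffsetSketch {
  // Keeps only the candidates whose length differs from the original
  // by at most `offset` characters.
  def withinOffset(original: String, candidates: Seq[String], offset: Int): Seq[String] =
    candidates.filter(c => math.abs(c.length - original.length) <= offset)
}
```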
  45. val features: ArrayBuffer[Feature[_, _, _]]
    Definition Classes
    HasFeatures
  46. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  47. val fixedMaskLength: IntParam

    Select the fixed mask length: this is the length of the masking sequence that will be used when the 'fixed_length_chars' masking policy is selected.

    Definition Classes
    MaskingParams
  48. val genderAwareness: BooleanParam

    Whether to use gender-aware names or not during obfuscation. This param affects only names. Setting it to true might decrease performance. Default: false

    Definition Classes
    BaseDeidParams
  49. def generateFakeBySameLength(wordToReplace: String, entity: String): String

    Obfuscates digits to new digits and letters to new letters; all other characters remain the same.

    Definition Classes
    DeidModelParams
  50. def generateFakeBySameLengthUsingHash(wordToReplace: String, entity: String): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  51. val geoConsistency: BooleanParam

    Whether to enforce consistent obfuscation across geographical entities: state, city, street, zip and phone.

    Functionality Overview

    This parameter enables intelligent geographical entity obfuscation that maintains realistic relationships between different geographic components. When enabled, the system ensures that obfuscated addresses form coherent, valid combinations rather than random replacements.

    Supported Entity Types

    The following geographical entities are processed in priority order:

    • state (Priority: 0): US state names
    • city (Priority: 1): City names
    • zip (Priority: 2): Zip codes
    • street (Priority: 3): Street addresses
    • phone (Priority: 4): Phone numbers

    Language Requirement

    IMPORTANT: Geographic consistency is only applied when both of the following hold:

    • the geoConsistency parameter is set to true
    • the language parameter is set to en

    For non-English configurations, this feature is automatically disabled regardless of the parameter setting.

    Consistency Algorithm

    When geographical entities come from the chunk columns:

    1. Entity Grouping: All geographic entities are identified and grouped by type.
    2. Fake Address Selection: A consistent set of fake US addresses is selected using hash-based deterministic selection to ensure reproducibility.
    3. Priority-Based Mapping: Entities are mapped to fake addresses following the priority order (state → city → zip → street → phone).
    4. Consistent Replacement: All entities of the same type within a document use the same fake address pool, maintaining geographical coherence.

    Parameter Interactions

    IMPORTANT: Enabling this parameter automatically disables:

    • keepTextSizeForObfuscation: text size preservation is not maintained
    • consistentObfuscation: standard consistency rules are overridden
    • file-based fakers

    This is necessary because geographic consistency requires specific fake address selection that may not preserve original text lengths or follow standard obfuscation patterns.

    Default: false

    Definition Classes
    BaseDeidParams
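
    The priority-based mapping and hash-based selection described for geoConsistency can be sketched as follows. This is not the library's code: the FakeAddress record, the pool, and the use of a document key's hash code are assumptions made for illustration, and only the priority values come from the description.

```scala
object GeoConsistencySketch {
  // Priority order as listed in the description of geoConsistency.
  val priority: Map[String, Int] =
    Map("state" -> 0, "city" -> 1, "zip" -> 2, "street" -> 3, "phone" -> 4)

  // Hypothetical record for one coherent fake address.
  final case class FakeAddress(state: String, city: String, zip: String,
                               street: String, phone: String) {
    def field(entityType: String): Option[String] = entityType match {
      case "state"  => Some(state)
      case "city"   => Some(city)
      case "zip"    => Some(zip)
      case "street" => Some(street)
      case "phone"  => Some(phone)
      case _        => None
    }
  }

  // Deterministic selection: the same key always yields the same address.
  def pick(pool: Vector[FakeAddress], documentKey: String): FakeAddress = {
    require(pool.nonEmpty, "pool must not be empty")
    pool(java.lang.Math.floorMod(documentKey.hashCode, pool.size))
  }

  // Maps (entityType, originalText) pairs to fields of one address,
  // dropping non-geographic types and processing in priority order.
  def replaceAll(entities: Seq[(String, String)],
                 address: FakeAddress): Seq[(String, String)] =
    entities
      .filter { case (entityType, _) => priority.contains(entityType) }
      .sortBy { case (entityType, _) => priority(entityType) }
      .flatMap { case (entityType, _) =>
        address.field(entityType).map(value => (entityType, value))
      }
}
```

    Drawing every replacement from a single address record is what keeps the state, city, and zip mutually plausible in this sketch.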
  52. def get[T](feature: StructFeature[T]): Option[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  53. def get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  54. def get[T](feature: SetFeature[T]): Option[Set[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  55. def get[T](feature: ArrayFeature[T]): Option[Array[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  56. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  57. def getAdditionalDateFormats: Array[String]

    Gets the value of additionalDateFormats

    Definition Classes
    BaseDeidParams
  58. def getAgeRanges: Array[Int]

    Gets ageRanges param.

    Definition Classes
    BaseDeidParams
  59. def getAgeRangesByHipaa: Boolean

    Gets the value of ageRangesByHipaa.

    Definition Classes
    BaseDeidParams
  60. def getBlackListEntities: Array[String]

    Gets blackListEntities param

    Definition Classes
    BaseDeidParams
  61. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  62. def getConsistentAcrossNameParts: Boolean

    Gets the value of consistentAcrossNameParts.

    Definition Classes
    BaseDeidParams
  63. def getCountryObfuscation: Boolean

    Gets the value of countryObfuscation.

    Definition Classes
    BaseDeidParams
  64. def getCustomFakers: Map[String, List[String]]

    Gets customFakers param.

    Attributes
    protected
    Definition Classes
    LightDeIdentificationParams
  65. def getDateEntities: Array[String]

    Gets dateEntities param.

    Definition Classes
    BaseDeidParams
  66. def getDateFormats: Array[String]

    Gets the value of dateFormats

    Definition Classes
    BaseDeidParams
  67. def getDays: Int

    Gets days param

    Definition Classes
    BaseDeidParams
  68. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  69. def getDefaultObfuscationEquivalents: Array[StaticObfuscationEntity]
    Definition Classes
    BaseDeidParams
  70. def getDefaultObfuscationEquivalentsAsJava: Array[ArrayList[String]]
    Definition Classes
    BaseDeidParams
  71. def getDocumentIDFromSentences(sentences: Seq[Annotation]): Option[String]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  72. def getEnableDefaultObfuscationEquivalents: Boolean

    Gets the value of enableDefaultObfuscationEquivalents.

    Definition Classes
    BaseDeidParams
  73. def getEntitiesBySentence(chunks: Seq[Annotation], sentenceCount: Int): Seq[Seq[Annotation]]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  74. def getEntityBasedObfuscationRefSource(entityClass: String): String
    Attributes
    protected
    Definition Classes
    BaseDeidParams
  75. def getEntityField(annotation: Annotation): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  76. def getExternalFakers(entityClass: String, customFakers: Map[String, List[String]], wordToReplace: String): List[String]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  77. def getFakeByHashcode(fakes: Seq[String], wordToReplace: String, entity: String, seed: Int): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  78. def getFakeWithSameSize(fakes: Seq[String], wordToReplace: String, entity: String, lengthDeviation: Int, seed: Int): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  79. def getFakerLengthOffset: Int

    Gets fakerLengthOffset param

    Definition Classes
    BaseDeidParams
  80. def getFakersEntity(entity: String, result: String): Seq[String]
    Definition Classes
    DeidModelParams
  81. def getFixedMaskLength: Int

    Gets fixedMaskLength param.

    Definition Classes
    MaskingParams
  82. def getGenderAwareness: Boolean

    Gets genderAwareness param.

    Definition Classes
    BaseDeidParams
  83. def getGeoConsistency: Boolean

    Gets the value of geoConsistency.

    Definition Classes
    BaseDeidParams
  84. final def getInputCol: String
    Definition Classes
    HasInputCol
  85. def getIsRandomDateDisplacement: Boolean

    Gets isRandomDateDisplacement param

    Definition Classes
    BaseDeidParams
  86. def getKeepMonth: Boolean

    Gets keepMonth param

    Definition Classes
    BaseDeidParams
  87. def getKeepTextSizeForObfuscation: Boolean

    Gets keepTextSizeForObfuscation param

    Definition Classes
    BaseDeidParams
  88. def getKeepYear: Boolean

    Gets keepYear param

    Definition Classes
    BaseDeidParams
  89. def getLanguage: String

    Gets language param.

    Definition Classes
    BaseDeidParams
  90. def getMappingRules: Map[String, String]
  91. def getMappingRulesAsStr: String
  92. def getMaskStatus(entityClass: String): String
    Attributes
    protected
    Definition Classes
    MaskingParams
  93. def getMaskingPolicy: String

    Gets maskingPolicy param.

    Definition Classes
    MaskingParams
  94. def getMaxRandomDisplacementDays: Int

    Gets maxRandomDisplacementDays param

    Definition Classes
    BaseDeidParams
  95. def getMaxSentence(annotations: Seq[Annotation]): Int
    Attributes
    protected
    Definition Classes
    DeidModelParams
  96. def getMode: String

    Gets mode param.

    Definition Classes
    BaseDeidParams
  97. def getObfuscateDate: Boolean

    Gets obfuscateDate param

    Definition Classes
    BaseDeidParams
  98. def getObfuscateRefSource: String

    Gets obfuscateRefSource param.

    Definition Classes
    BaseDeidParams
  99. def getObfuscateZipByHipaa: Boolean

    Gets the value of obfuscateZipByHipaa.

    Definition Classes
    BaseDeidParams
  100. def getObfuscateZipKeepDigits: Int

    Gets the value of obfuscateZipKeepDigits.

    Definition Classes
    BaseDeidParams
  101. def getObfuscationEquivalents: Option[Array[StaticObfuscationEntity]]

    Gets the value of obfuscationEquivalents.

    Definition Classes
    BaseDeidParams
  102. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  103. final def getOutputCol: String
    Definition Classes
    HasOutputAnnotationCol
  104. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  105. def getRegion: String

    Gets region param.

    Definition Classes
    BaseDeidParams
  106. def getSameLengthFormattedEntities(): Array[String]
    Definition Classes
    BaseDeidParams
  107. final def getScopes: Seq[String]
    Attributes
    protected
  108. def getSeed(): Int
    Definition Classes
    BaseDeidParams
  109. def getSelectiveObfuscateRefSource: Map[String, String]

    Gets selectiveObfuscateRefSource param.

    Definition Classes
    BaseDeidParams
  110. def getSelectiveObfuscateRefSourceAsStr: String
    Definition Classes
    BaseDeidParams
  111. def getSelectiveObfuscationModes: Option[Map[String, Array[String]]]

    Gets selectiveObfuscationModes param.

    Definition Classes
    BaseDeidParams
  112. def getShiftDaysFromSentences(sentences: Seq[Annotation]): Option[Int]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  113. def getStaticObfuscationFakes(entityClass: String, wordToReplace: String): Option[Seq[String]]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  114. def getStaticObfuscationPairs: Option[Array[StaticObfuscationEntity]]
    Definition Classes
    BaseDeidParams
  115. def getUnnormalizedDateMode: String

    Gets unnormalizedDateMode param.

    Definition Classes
    BaseDeidParams
  116. def getUseShiftDays: Boolean

    Gets useShiftDays param.

    Definition Classes
    BaseDeidParams
  117. def getValidAgeRanges: Array[Int]

    Gets valid ageRanges whether ageRangesByHipaa is true or not.

    Attributes
    protected
    Definition Classes
    BaseDeidParams
  118. def handleCasing(originalFake: String, wordToReplace: String): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  119. def handleGeographicConsistency(protectedEntities: Seq[Seq[Annotation]]): Seq[Seq[Annotation]]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  120. def handleObfuscationEquivalents(sentenceBaseAnnotations: Seq[Seq[Annotation]]): Seq[Seq[Annotation]]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  121. def handleSelectiveObfuscationModes(wordToReplace: String, entityClass: String, namePartsMemory: Map[String, String]): String
    Attributes
    protected
  122. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  123. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  124. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  125. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  126. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  127. final val inputCol: Param[String]
    Definition Classes
    HasInputCol
  128. def isArabic: Boolean
    Attributes
    protected
    Definition Classes
    MaskingParams
  129. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  130. def isEmptyString(value: String): Boolean
    Attributes
    protected
    Definition Classes
    DeidModelParams
  131. def isGeoEntity(annotation: Annotation): Boolean
    Attributes
    protected
    Definition Classes
    DeidModelParams
  132. def isGeoObfuscationEnabled: Boolean
    Attributes
    protected
    Definition Classes
    DeidModelParams
  133. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  134. def isObfuscateDate(entityClass: String): Boolean
    Attributes
    protected
    Definition Classes
    DeidModelParams
  135. val isRandomDateDisplacement: BooleanParam

    Whether to use a random number of displacement days for date entities. The random number is based on DeIdentificationParams.seed. If true, a random displacement is used; if false, DeIdentificationParams.days is used. Default: false.

    Definition Classes
    BaseDeidParams
  136. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  137. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  138. val keepMonth: BooleanParam

    Whether to keep the month intact when obfuscating date entities. If true, the month will remain unchanged during the obfuscation process. If false, the month will be modified along with the year and day. Default: false.

    Definition Classes
    BaseDeidParams
  139. val keepTextSizeForObfuscation: BooleanParam

    Specifies whether the output should maintain the same character length as the input text. The output length stays the same when a replacement of the same length is available; otherwise the length might vary.

    Definition Classes
    BaseDeidParams
  140. val keepYear: BooleanParam

    Whether to keep the year intact when obfuscating date entities. If true, the year will remain unchanged during the obfuscation process. If false, the year will be modified along with the month and day. Default: false.

    Definition Classes
    BaseDeidParams
  141. val language: Param[String]

    The language used to select the regex file and some faker entities: 'en' (English), 'de' (German), 'es' (Spanish), 'fr' (French), 'ar' (Arabic) or 'ro' (Romanian). Default: 'en'

    Definition Classes
    BaseDeidParams
  142. implicit lazy val locale: Locale
    Attributes
    protected
    Definition Classes
    DeidModelParams
  143. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  144. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  145. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  146. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  147. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  148. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  149. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  150. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  151. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  152. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  153. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  154. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  155. val mappingRules: MapFeature[String, String]

    FHIR field de-identification rules for primitive type obfuscation.

    Overview

    Defines how specific FHIR elements should be de-identified using FHIR Path syntax. Supports all FHIR primitive types with built-in obfuscation strategies.

  156. def maskEntity(wordToReplace: String, entityClass: String): String
    Attributes
    protected
    Definition Classes
    MaskingParams
  157. def maskEntity(annotation: Annotation, entityClass: String): String
    Attributes
    protected
    Definition Classes
    MaskingParams
  158. def maskEntityWithPolicy(wordToReplace: String, maskingPolicy: String, entityClass: String): String
    Attributes
    protected
    Definition Classes
    MaskingParams
  159. def maskEntityWithPolicy(annotation: Annotation, maskingPolicy: String, entityClass: String): String
    Attributes
    protected
    Definition Classes
    MaskingParams
  160. val maskingPolicy: Param[String]

    Select the masking policy:

    • 'entity_labels': Replace the values with the corresponding entity labels.
    • 'same_length_chars': Replace the value with asterisks of the same length minus two, plus a bracket on each end. If the entity is shorter than 3 characters (like "Jo" or "5"), asterisks are used without brackets.
    • 'fixed_length_chars': Replace the entity with a masking sequence composed of a fixed number of asterisks.
    • 'entity_labels_without_brackets': Replace the values with the corresponding entity labels, without brackets.
    • 'same_length_chars_without_brackets': Replace the value with asterisks of the same length, without brackets.
    • Default: 'entity_labels'
    Definition Classes
    MaskingParams
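
    The five masking policies can be illustrated with a small pure function. This is a sketch rather than the library's implementation: it assumes angle brackets around entity labels (as in the "<PERSON>" example given for mode) and square brackets for 'same_length_chars', and MaskingPolicySketch, mask, and the explicit fixedLength argument are hypothetical.

```scala
object MaskingPolicySketch {
  // Returns the masked form of `text` for an entity with the given `label`.
  // `fixedLength` is only used by the 'fixed_length_chars' policy.
  def mask(text: String, label: String, policy: String, fixedLength: Int): String =
    policy match {
      case "entity_labels" =>
        s"<$label>"
      case "entity_labels_without_brackets" =>
        label
      case "same_length_chars" =>
        // Entities shorter than 3 characters get asterisks without brackets.
        if (text.length < 3) "*" * text.length
        else "[" + "*" * (text.length - 2) + "]"
      case "same_length_chars_without_brackets" =>
        "*" * text.length
      case "fixed_length_chars" =>
        "*" * fixedLength
      case other =>
        throw new IllegalArgumentException(s"Unknown masking policy: $other")
    }
}
```

    Note that under 'same_length_chars' the two brackets count toward the total, so the masked text has the same length as the original.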
  161. val maxRandomDisplacementDays: IntParam

    Maximum number of days for random date displacement. Default is 1825 (5 years). If isRandomDateDisplacement is true, a random number of days between 1 and maxRandomDisplacementDays will be used for date displacement.

    Definition Classes
    BaseDeidParams
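
    A seeded draw of a displacement between 1 and maxRandomDisplacementDays might look like the sketch below. How the library derives the value from its seed is not stated here, so scala.util.Random is an assumption, and RandomDisplacementSketch and randomDays are hypothetical names.

```scala
import scala.util.Random

object RandomDisplacementSketch {
  // Returns a pseudo-random number of days in [1, maxDays].
  // The same seed always produces the same value.
  def randomDays(seed: Int, maxDays: Int = 1825): Int = {
    require(maxDays >= 1, "maxDays must be at least 1")
    new Random(seed.toLong).nextInt(maxDays) + 1
  }
}
```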
  162. val mode: Param[String]

    Mode for Anonymizer ['mask' or 'obfuscate']. Default: 'mask'

    • Mask mode: The entities will be replaced by their entity types.
    • Obfuscate mode: The entity is replaced by an obfuscator's term.
    Definition Classes
    BaseDeidParams
    Example:
    1. Given the following text: "David Hale visited EEUU a couple of years ago"

      • Mask mode: "<PERSON> visited <COUNTRY> a couple of years ago"
      • Obfuscate mode: "Bryan Johnson visited Japan a couple of years ago"
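
    The difference between the two modes can be reproduced on the example sentence with a few lines of Scala. This sketch assumes the entities have already been detected and that a lookup of fake terms is supplied by the caller; ModeSketch, Entity, and the fakes map are hypothetical and stand in for the NER and faker components.

```scala
object ModeSketch {
  // A detected entity: its surface text and its entity label.
  final case class Entity(text: String, label: String)

  // Replaces each entity either with its label ('mask') or with a
  // caller-supplied fake term ('obfuscate').
  def deidentify(input: String, entities: Seq[Entity], mode: String,
                 fakes: Map[String, String]): String =
    entities.foldLeft(input) { (acc, entity) =>
      val replacement = mode match {
        case "mask"      => s"<${entity.label}>"
        case "obfuscate" => fakes.getOrElse(entity.text, s"<${entity.label}>")
        case other       => throw new IllegalArgumentException(s"Unknown mode: $other")
      }
      acc.replace(entity.text, replacement)
    }
}
```

    In this sketch an entity with no fake term available falls back to its label; whether the library does the same is not stated in this page.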
  163. val nameEntities: Seq[String]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  164. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  165. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  166. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  167. val obfuscateDate: BooleanParam

    When mode=="obfuscate", whether to obfuscate dates or not. This param helps in consistency to make dateFormats more visible. When set to true, make sure the dateFormats param fits your needs. If the value is true and obfuscation fails, unnormalizedDateMode is activated. When set to false, the date is masked to <DATE>. Default: false

    Definition Classes
    BaseDeidParams
  168. def obfuscateDateEntity(wordToReplace: String): String
    Attributes
    protected
  169. def obfuscateEntity(wordToReplace: String, entityClass: String, namePartsMemory: Map[String, String]): String
    Attributes
    protected
  170. def obfuscateNameEntity(originalName: String, keepTextSize: Boolean, lengthDeviation: Int, namePartsMemory: Map[String, String]): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  171. val obfuscateRefSource: Param[String]

    The source of the replacement values used to obfuscate the entities. The values are the following:

    • 'file': Takes the entities from the obfuscatorRefFile.
    • 'faker': Takes the entities from the Faker module.
    • 'both': Takes the entities from the obfuscatorRefFile and the Faker module randomly.

    Definition Classes
    BaseDeidParams
  172. def obfuscateZIP(wordToReplace: String): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  173. val obfuscateZipByHipaa: BooleanParam

    Whether to apply HIPAA Safe Harbor ZIP code obfuscation rules.

    When enabled (true), ZIP/ZIP+4 obfuscation follows the HIPAA Safe Harbor guidance:

    1. The algorithm extracts the first five digits from the input (accepting formats like "12345", "12345-6789", "123456789" and tolerant forms).
    2. If the first three-digit ZIP prefix is in the HIPAA restricted list (the 17 prefixes derived from 2000 Census data), the ZIP MUST be suppressed to the canonical value "000**".
    3. Otherwise, the ZIP is generalized to the first three digits followed by "**" (i.e. XXX**). The +4 portion will be masked with asterisks if present.

    When disabled (false), HIPAA-specific ZIP obfuscation is not applied and the component's default/custom ZIP obfuscation is used instead.

    Implementation notes and cautions:

    Definition Classes
    BaseDeidParams
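    Example:
    The three steps above can be sketched in plain Scala. This is only an illustration, not the component's implementation: the object and method names are hypothetical, the prefix list is assumed to be the 17 prefixes published in the HHS Safe Harbor guidance, and the ZIP+4 portion is simply dropped because its exact output layout is not specified here.

```scala
object HipaaZipSketch {
  // Assumed list: the 17 restricted three-digit prefixes from the HHS
  // Safe Harbor guidance (2000 Census data). The component's own list may differ.
  val restrictedPrefixes: Set[String] = Set(
    "036", "059", "063", "102", "203", "556", "692", "790", "821",
    "823", "830", "831", "878", "879", "884", "890", "893")

  // Returns the generalized five-character ZIP, or None when fewer than
  // five digits can be extracted from the input.
  def generalize(zip: String): Option[String] = {
    val digits = zip.filter(_.isDigit) // tolerate "12345-6789", "123456789", ...
    if (digits.length < 5) None
    else {
      val prefix = digits.take(3)
      if (restrictedPrefixes.contains(prefix)) Some("000**") // suppressed
      else Some(prefix + "**")                               // generalized to XXX**
    }
  }
}
```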
  174. val obfuscateZipKeepDigits: IntParam

    Number of leading ZIP code digits to preserve when applying HIPAA-based ZIP obfuscation. This parameter is only effective when obfuscateZipByHipaa is enabled.

    Behavior:

    • Preserves the first N digits of the ZIP code, where N is the value of this parameter.
    • Masks all remaining digits (including the ZIP+4 segment, if present) with asterisks (*).
    • Default: 3

    Examples:

    • 12345 → 123**
    • If the preserved digit count is set to 2: 12345 → 12***

    This setting overrides the default HIPAA Safe Harbor ZIP generalization pattern (XXX**) by allowing clients to customize how many digits remain unmasked under expert-determination requirements.

    Definition Classes
    BaseDeidParams
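    Example:
    A minimal sketch of the digit-preserving behavior described above, written in plain Scala. The names are hypothetical, and keeping the hyphen of a ZIP+4 value in place is an assumption about the output layout rather than documented behavior.

```scala
object ZipKeepDigitsSketch {
  // Keeps the first `keep` digits and replaces every later digit with '*'.
  // Non-digit characters (such as the hyphen in ZIP+4) are left in place;
  // that layout is an assumption, not documented behavior.
  def maskZip(zip: String, keep: Int = 3): String = {
    var seen = 0 // number of digits encountered so far
    zip.map { c =>
      if (c.isDigit) {
        seen += 1
        if (seen <= keep) c else '*'
      } else c
    }
  }
}
```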
  175. val obfuscationEquivalents: StructFeature[Array[StaticObfuscationEntity]]

    Variant-to-canonical entity mappings to ensure consistent obfuscation.

    This parameter allows you to define equivalence rules for entity variants that should be obfuscated the same way. For example, the names "Alex" and "Alexander" will always be mapped to the same obfuscated value if they are linked to the same canonical form.

    It accepts an array of string triplets, where each triplet defines:

    • variant: A non-standard, short, or alternative form of a value (e.g., "Alex")
    • entityType: The type of the entity (e.g., "NAME", "STATE", "COUNTRY")
    • canonical: The standardized form all variants map to (e.g., "Alexander")

    variant and entityType comparisons are case-insensitive during processing.

    This is especially useful in de-identification tasks to ensure consistent replacement of semantically identical values. It also allows cross-variant normalization across different occurrences of sensitive data.

    Definition Classes
    BaseDeidParams
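    Example:
    The lookup that such triplets imply can be sketched as follows. This is a plain-Scala illustration with hypothetical names, assuming that a value is normalized to its canonical form before a replacement is chosen; the component's internal logic may differ.

```scala
object EquivalentsSketch {
  // (variant, entityType, canonical) triplets
  val equivalents: Seq[(String, String, String)] = Seq(
    ("Alex", "NAME", "Alexander"),
    ("CA", "STATE", "California"))

  // Keys are lower-cased so that variant and entityType match case-insensitively.
  private val lookup: Map[(String, String), String] =
    equivalents.map { case (v, t, c) => (v.toLowerCase, t.toLowerCase) -> c }.toMap

  // Returns the canonical form, or the input itself when no rule applies.
  def canonical(text: String, entityType: String): String =
    lookup.getOrElse((text.toLowerCase, entityType.toLowerCase), text)
}
```

    With this lookup, "Alex" and "Alexander" resolve to the same canonical key, so any replacement chosen per canonical key is the same for both.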
  176. def onWrite(path: String, spark: SparkSession): Unit
    Attributes
    protected
    Definition Classes
    ParamsAndFeaturesWritable
  177. final val outputCol: Param[String]
    Attributes
    protected
    Definition Classes
    HasOutputAnnotationCol
  178. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  179. def preventDateToMask(entityClass: String): Unit
    Attributes
    protected
  180. lazy val randomDateFormat: String
    Attributes
    protected
    Definition Classes
    BaseDeidParams
  181. val region: Param[String]

    Selects the region-specific dateFormats. This property is mainly used when obfuscating dates: it decides whether the first or the second part of a date such as 11/11/2023 is read as the day.

    The values are the following:

    • 'eu' for European Union
    • 'us' for USA
    Definition Classes
    BaseDeidParams
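    Example:
    The day/month ambiguity can be illustrated with java.time. This is a sketch only: the object name and the two patterns are assumptions chosen for illustration, not the formats the component actually ships with.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object RegionDateSketch {
  // Assumed patterns: day-first for 'eu', month-first for 'us'.
  private val patterns = Map(
    "eu" -> DateTimeFormatter.ofPattern("dd/MM/yyyy"),
    "us" -> DateTimeFormatter.ofPattern("MM/dd/yyyy"))

  def parse(text: String, region: String): LocalDate =
    LocalDate.parse(text, patterns(region))
}
```

    Under these assumptions, "03/04/2023" is read as 3 April 2023 with 'eu' and as 4 March 2023 with 'us'.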
  182. val sameLengthFormattedEntities: StringArrayParam

    List of formatted entities to generate the same length outputs as original ones during obfuscation. The supported and default formatted entities are: "phone", "fax", "contact", "id", "idnum", "bioid", "medicalrecord", "zip", "vin", "ssn", "dln", "plate", "license", "IRS", "CFN", "account".

    Definition Classes
    BaseDeidParams
  183. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  184. val seed: IntParam

    The seed used to select the replacement entities in obfuscate mode. With a fixed seed, you can repeat an execution several times and get the same output.

    Definition Classes
    BaseDeidParams
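    Example:
    Why a fixed seed gives repeatable output can be shown with scala.util.Random. The sketch below is illustrative only; the object name and the list of fake names are made up, and the component's own selection logic is not shown here.

```scala
import scala.util.Random

object SeedSketch {
  // Hypothetical pool of replacement values.
  val fakes: Vector[String] =
    Vector("Bryan Johnson", "Maria Lopez", "Chen Wei", "Amara Okafor")

  // Draws n replacements from the pool using a generator created from `seed`.
  def pick(seed: Int, n: Int): Seq[String] = {
    val rng = new Random(seed)
    Seq.fill(n)(fakes(rng.nextInt(fakes.length)))
  }
}
```

    Two calls with the same seed return the same sequence, which is what makes a seeded run reproducible.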
  185. def selectFakeFromAllFakes(wordToReplace: String, entityClass: String, maskedEntity: String, allFakes: Seq[String]): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  186. val selectiveObfuscateRefSource: MapFeature[String, String]

    A map of entity names to their obfuscation modes. This is used to selectively apply different obfuscation methods to specific entities. The keys are entity names and the values are the obfuscation sources. If an entity is not specified in this map, the obfuscateRefSource param is used to determine the obfuscation source.

    Definition Classes
    BaseDeidParams
    Example:
    1. val selectiveSources = Map(
       "PHONE" -> "file",
       "EMAIL" -> "faker",
       "NAME" -> "faker",
       "ADDRESS" -> "both"
       )
  187. val selectiveObfuscationModes: StructFeature[Map[String, Array[String]]]

    The dictionary of modes to enable multi-mode deidentification.

    • 'obfuscate': Replace the values with random values.
    • 'mask_same_length_chars': Replace the name with asterisks of the same length minus two, plus brackets on both ends.
    • 'mask_entity_labels': Replace the values with the entity label.
    • 'mask_fixed_length_chars': Replace the name with a fixed number of asterisks. You can also invoke "setFixedMaskLength()"
    • 'mask_entity_labels_without_brackets': Replace the values with the entity label without brackets.
    • 'mask_same_length_chars_without_brackets': Replace the name with asterisks of the same length, without brackets.
    • 'skip': Skip the entities (leave them intact).

    Entities that are not listed in the dictionary are de-identified according to setMode().

    Definition Classes
    BaseDeidParams
  188. def set[T](feature: StructFeature[T], value: T): BaseFhirDeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  189. def set[K, V](feature: MapFeature[K, V], value: Map[K, V]): BaseFhirDeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  190. def set[T](feature: SetFeature[T], value: Set[T]): BaseFhirDeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  191. def set[T](feature: ArrayFeature[T], value: Array[T]): BaseFhirDeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  192. final def set(paramPair: ParamPair[_]): BaseFhirDeIdentification.this.type
    Attributes
    protected
    Definition Classes
    Params
  193. final def set(param: String, value: Any): BaseFhirDeIdentification.this.type
    Attributes
    protected
    Definition Classes
    Params
  194. final def set[T](param: Param[T], value: T): BaseFhirDeIdentification.this.type
    Definition Classes
    Params
  195. def setAdditionalDateFormats(formats: Array[String]): BaseFhirDeIdentification.this.type

    Sets additionalDateFormats param

    Definition Classes
    BaseDeidParams
  196. def setAgeRanges(mode: Array[Int]): BaseFhirDeIdentification.this.type

    List of integers specifying limits of the age groups to preserve during obfuscation

    Definition Classes
    BaseDeidParams
  197. def setAgeRangesByHipaa(value: Boolean): BaseFhirDeIdentification.this.type

    Sets whether to obfuscate ages based on HIPAA (Health Insurance Portability and Accountability Act) Privacy Rule.

    The HIPAA Privacy Rule mandates that ages of patients older than 90 years must be obfuscated, while ages of patients 90 years or younger can remain unchanged.

    value

    If true, age entities larger than 90 will be obfuscated as per HIPAA Privacy Rule, and the others will remain unchanged. If false, the ageRanges parameter is used instead. Default: false.

    Definition Classes
    BaseDeidParams
  198. def setBlackListEntities(value: Array[String]): BaseFhirDeIdentification.this.type

    blackListEntities param is not supported in FhirDeIdentification. Please use mappingRules instead.

    Definition Classes
    BaseFhirDeIdentification → BaseDeidParams
    Exceptions thrown
  199. def setConsistentAcrossNameParts(value: Boolean): BaseFhirDeIdentification.this.type

    Sets the value of consistentAcrossNameParts.

    value

    Boolean flag to enforce consistency across name parts

    returns

    this instance

    Definition Classes
    BaseDeidParams
  200. def setCountryObfuscation(value: Boolean): BaseFhirDeIdentification.this.type

    countryObfuscation param is not supported in FhirDeIdentification. Please use mappingRules instead.

    Definition Classes
    BaseFhirDeIdentification → BaseDeidParams
    Exceptions thrown
  201. def setCustomFakers(value: HashMap[String, List[String]]): BaseFhirDeIdentification.this.type
    Definition Classes
    LightDeIdentificationParams
  202. def setCustomFakers(value: Map[String, Array[String]]): BaseFhirDeIdentification.this.type

    Sets the value of customFakers. The dictionary of custom fakers to specify the obfuscation terms for the entities. You can specify the entity and the terms to be used for obfuscation.

    Example:

    new LightDeIdentification()
     .setInputCols(Array("ner_chunk", "sentence")).setOutputCol("dei")
     .setMode("obfuscate")
     .setObfuscateRefSource("custom")
     .setCustomFakers(Map(
         "NAME" -> Array("George", "Taylor"),
         "SCHOOL" -> Array("Oxford", "Harvard"),
         "city" -> Array("ROMA")
     ))
    Definition Classes
    LightDeIdentificationParams
  203. def setDateEntities(value: Array[String]): BaseFhirDeIdentification.this.type

    Sets the value of dateEntities. Default: Array("DATE", "DOB", "DOD", "EFFDATE", "FISCAL_YEAR")

    Definition Classes
    BaseDeidParams
  204. def setDateFormats(s: Array[String]): BaseFhirDeIdentification.this.type

    Format of dates to displace

    Definition Classes
    BaseDeidParams
  205. def setDays(k: Int): BaseFhirDeIdentification.this.type

    Number of days to obfuscate the dates by displacement. If not provided, a random integer between 1 and 60 will be used.

    Definition Classes
    BaseDeidParams
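    Example:
    Date displacement by a fixed number of days can be sketched with java.time. The object and method names below are hypothetical, and the sketch assumes the output keeps the input's format; it does not show how the component itself parses or rewrites dates.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object DateShiftSketch {
  // Parses `text` with `pattern`, shifts it by `days`, and formats it back
  // with the same pattern so the shifted date keeps the original layout.
  def shift(text: String, pattern: String, days: Int): String = {
    val fmt = DateTimeFormatter.ofPattern(pattern)
    LocalDate.parse(text, fmt).plusDays(days.toLong).format(fmt)
  }
}
```

    For instance, shifting "2023-01-25" by 10 days with the pattern "yyyy-MM-dd" gives "2023-02-04".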
  206. def setDefault[T](feature: StructFeature[T], value: () ⇒ T): BaseFhirDeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  207. def setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): BaseFhirDeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  208. def setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): BaseFhirDeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  209. def setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): BaseFhirDeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  210. final def setDefault(paramPairs: ParamPair[_]*): BaseFhirDeIdentification.this.type
    Attributes
    protected
    Definition Classes
    Params
  211. final def setDefault[T](param: Param[T], value: T): BaseFhirDeIdentification.this.type
    Attributes
    protected[org.apache.spark.ml]
    Definition Classes
    Params
  212. def setEnableDefaultObfuscationEquivalents(value: Boolean): BaseFhirDeIdentification.this.type

    Sets whether to enable default obfuscation equivalents for common entities. This parameter allows the system to automatically include a set of predefined common English name equivalents. Default: false

    Definition Classes
    BaseDeidParams
  213. def setFakerLengthOffset(value: Int): BaseFhirDeIdentification.this.type

    Sets fakerLengthOffset param

    Definition Classes
    BaseDeidParams
  214. def setFixedMaskLength(value: Int): BaseFhirDeIdentification.this.type

    Sets the value of fixedMaskLength. This is the length of the masking sequence that will be used when the 'fixed_length_chars' masking policy is selected.

    Definition Classes
    MaskingParams
  215. def setGenderAwareness(value: Boolean): BaseFhirDeIdentification.this.type

    Whether to use gender-aware names or not during obfuscation. This param affects only names. If the value is true, it might decrease performance. Default: false

    Definition Classes
    BaseDeidParams
  216. def setGeoConsistency(value: Boolean): BaseFhirDeIdentification.this.type

    geoConsistency param is not supported in FhirDeIdentification.

    Definition Classes
    BaseFhirDeIdentification → BaseDeidParams
    Exceptions thrown
  217. def setInputCol(value: String): BaseFhirDeIdentification.this.type

    Set the input column name. The input column should contain the FHIR string.

  218. def setIsRandomDateDisplacement(s: Boolean): BaseFhirDeIdentification.this.type

    Whether to use a random number of displacement days for date entities. If true, the displacement is a random number based on the DeIdentificationParams.seed; if false, the DeIdentificationParams.days value is used. Default: false.

    Definition Classes
    BaseDeidParams
  219. def setKeepMonth(value: Boolean): BaseFhirDeIdentification.this.type

    Sets whether to keep the month intact when obfuscating date entities. If true, the month will remain unchanged during the obfuscation process. If false, the month will be modified along with the year and day. Default: false.

    Definition Classes
    BaseDeidParams
  220. def setKeepTextSizeForObfuscation(value: Boolean): BaseFhirDeIdentification.this.type

    Sets keepTextSizeForObfuscation param

    Definition Classes
    BaseDeidParams
  221. def setKeepYear(value: Boolean): BaseFhirDeIdentification.this.type

    Sets whether to keep the year intact when obfuscating date entities. If true, the year will remain unchanged during the obfuscation process. If false, the year will be modified along with the month and day. Default: false.

    Definition Classes
    BaseDeidParams
  222. def setLanguage(s: String): BaseFhirDeIdentification.this.type

    The language used to select the regex file and some faker entities: 'en' (English), 'de' (German), 'es' (Spanish), 'fr' (French), 'ar' (Arabic) or 'ro' (Romanian). Default: 'en'

    Definition Classes
    BaseDeidParams
  223. def setMappingRules(value: HashMap[String, String]): BaseFhirDeIdentification.this.type
  224. def setMappingRules(value: Map[String, String]): BaseFhirDeIdentification.this.type

    Sets FHIR field de-identification rules for primitive type obfuscation.

    Overview

    Defines how specific FHIR elements should be de-identified using FHIR Path syntax. Supports all FHIR primitive types with built-in obfuscation strategies.

    Rule Format
    Map(
      "ResourceType.field.path" -> "SupportedEntityClass",
    )
    value

    A mapping between FHIR paths and target primitive types. Keys must use standard FHIR Path notation (dot-delimited). Values must be one of the supported de-identification entity classes or given as a custom list.

    Example:
    1. Basic Usage

      new FhirDeIdentification()
        .setMappingRules(Map(
           "Patient.birthDate" -> "Date",
           "Patient.name.given" -> "Name",
           "Patient.telecom.value" -> "Email",
           "Patient.address.city" -> "City",
        ))
    Exceptions thrown

    If:

    • Unsupported primitive type provided
    • Malformed FHIR path detected
    • Non-primitive field targeted
    Note

    Important Constraints:

    1. Paths are case-sensitive and must match FHIR element names exactly
    2. Array elements should use standard FHIR Path syntax (e.g., Patient.name.given)
    3. Only primitive types are supported for de-identification

    See also

    FHIR Path Specification

  225. def setMaskingPolicy(value: String): BaseFhirDeIdentification.this.type

    Select the masking policy:

    • 'entity_labels': Replace the values with the entity label.
    • 'same_length_chars': Replace the name with asterisks of the same length minus two, plus brackets on both ends. If the entity is shorter than 3 characters (like Jo, or 5), asterisks are used without brackets.
    • 'fixed_length_chars': Replace the entity with a masking sequence composed of a fixed number of asterisks.
    • 'entity_labels_without_brackets': Replace the values with the entity label without brackets.
    • 'same_length_chars_without_brackets': Replace the name with asterisks of the same length, without brackets.
    • Default: 'entity_labels'
    Definition Classes
    MaskingParams
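    Example:
    The policies listed above can be sketched as a single plain-Scala function. This is an illustration, not the component's code: the names are hypothetical, angle brackets around labels are assumed from the "<PERSON>" mask example, and square brackets for 'same_length_chars' are an assumption because the bracket character is not stated.

```scala
object MaskingPolicySketch {
  // text: the detected entity text; label: its entity label (e.g. "NAME").
  def mask(text: String, label: String, policy: String, fixedLength: Int = 4): String =
    policy match {
      case "entity_labels" => s"<$label>"
      case "entity_labels_without_brackets" => label
      case "same_length_chars" =>
        // Entities shorter than 3 characters get asterisks only.
        if (text.length < 3) "*" * text.length
        else "[" + "*" * (text.length - 2) + "]"
      case "same_length_chars_without_brackets" => "*" * text.length
      case "fixed_length_chars" => "*" * fixedLength
      case other => throw new IllegalArgumentException(s"Unknown policy: $other")
    }
}
```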
  226. def setMaxRandomDisplacementDays(value: Int): BaseFhirDeIdentification.this.type

    Sets maxRandomDisplacementDays param

    Definition Classes
    BaseDeidParams
  227. def setMode(m: String): BaseFhirDeIdentification.this.type

    Mode for Anonymizer ['mask'|'obfuscate']. Default: 'mask'

    • Mask mode: The entities will be replaced by their entity types.
    • Obfuscate mode: The entity is replaced by an obfuscator's term.
    Definition Classes
    BaseDeidParams
    Example:
    1. Given the following text: "David Hale visited EEUU a couple of years ago"

      • Mask mode: "<PERSON> visited <COUNTRY> a couple of years ago"
      • Obfuscate mode: "Bryan Johnson visited Japan a couple of years ago"
  228. def setObfuscateDate(s: Boolean): BaseFhirDeIdentification.this.type

    obfuscateDate param is not supported in FhirDeIdentification. It is always true.

    Definition Classes
    BaseFhirDeIdentification → BaseDeidParams
    Exceptions thrown
  229. def setObfuscateRefSource(s: String): BaseFhirDeIdentification.this.type

    The source of obfuscation to obfuscate the entities. The values are the following:

    • 'file': Takes the fakes from the obfuscatorRefFile.
    • 'faker': Takes the fakes from the Faker module.
    • 'both': Takes the fakes from the obfuscatorRefFile and the faker module randomly.

    Definition Classes
    BaseDeidParams
  230. def setObfuscateZipByHipaa(value: Boolean): BaseFhirDeIdentification.this.type

    Sets whether HIPAA Safe Harbor ZIP obfuscation rules should be applied.

    Behavior:

    • true: Apply HIPAA rules as described in obfuscateZipByHipaa: extract five digits, map restricted 3-digit prefixes to "000**", otherwise generalize to XXX**. The +4 portion will be masked with asterisks if present.
    • false: Do not apply HIPAA Safe Harbor behavior; use the component's default/custom ZIP obfuscation instead.

    Implementation & defaults:

    • Default: false (HIPAA behavior is opt-in). If you want HIPAA Safe Harbor behavior by default, change the default value where the parameter is declared.
    Definition Classes
    BaseDeidParams
  231. def setObfuscateZipKeepDigits(value: Int): BaseFhirDeIdentification.this.type

    Sets the number of leading ZIP code digits to preserve when applying HIPAA-based ZIP obfuscation. This parameter is only effective when obfuscateZipByHipaa is enabled.

    Behavior:

    • Preserves the first value digits of the ZIP code.
    • Masks all remaining digits (including the ZIP+4 segment, if present) with asterisks (*).
    • Default: 3

    Examples:

    • 12345 → 123**
    • If value = 2: 12345 → 12***

    This setting overrides the default HIPAA Safe Harbor ZIP generalization pattern (XXX**) by allowing clients to customize how many digits remain unmasked under expert-determination requirements.

    Definition Classes
    BaseDeidParams
  232. def setObfuscationEquivalents(equivalents: ArrayList[ArrayList[String]]): BaseFhirDeIdentification.this.type
    Definition Classes
    BaseDeidParams
  233. def setObfuscationEquivalents(equivalents: Array[Array[String]]): BaseFhirDeIdentification.this.type

    Sets variant-to-canonical entity mappings to ensure consistent obfuscation.

    This method allows you to define equivalence rules for entity variants that should be obfuscated the same way. For example, the names "Alex" and "Alexander" will always be mapped to the same obfuscated value if they are linked to the same canonical form.

    It accepts an array of string triplets, where each triplet defines:

    • variant: A non-standard, short, or alternative form of a value (e.g., "Alex")
    • entityType: The type of the entity (e.g., "NAME", "STATE", "COUNTRY")
    • canonical: The standardized form all variants map to (e.g., "Alexander")

    variant and entityType comparisons are case-insensitive during processing.

    This is especially useful in de-identification tasks to ensure consistent replacement of semantically identical values. It also allows cross-variant normalization across different occurrences of sensitive data.

    Example
    val equivalents = Array(
      Array("Alex", "NAME", "Alexander"),
      Array("Rob", "NAME", "Robert"),
      Array("CA", "STATE", "California"),
      Array("Calif.", "STATE", "California")
    )
    
    myDeidTransformer.setObfuscationEquivalents(equivalents)
    equivalents

    Array of [variant, entityType, canonical] entries.

    Definition Classes
    BaseDeidParams
    Exceptions thrown

    IllegalArgumentException if any entry does not have exactly 3 elements.

  234. def setObfuscationEquivalents(equivalents: Array[StaticObfuscationEntity]): BaseFhirDeIdentification.this.type

    Sets obfuscationEquivalents param.

    Definition Classes
    BaseDeidParams
  235. final def setOutputCol(value: String): BaseFhirDeIdentification.this.type
    Definition Classes
    HasOutputAnnotationCol
  236. def setRegion(value: String): BaseFhirDeIdentification.this.type

    region param is not supported in FhirDeIdentification. Please use dateFormats instead.

    Definition Classes
    BaseFhirDeIdentification → BaseDeidParams
    Exceptions thrown
  237. def setSameLengthFormattedEntities(entities: Array[String]): BaseFhirDeIdentification.this.type

    List of formatted entities to generate the same length outputs as original ones during obfuscation. The supported and default formatted entities are: PHONE, FAX, CONTACT, ID, IDNUM, BIOID, MEDICALRECORD, ZIP, VIN, SSN, DLN, LICENSE, PLATE, IRS, CFN, ACCOUNT.

    Definition Classes
    BaseDeidParams
  238. def setSeed(s: Int): BaseFhirDeIdentification.this.type

    Sets the seed used to select the replacement entities in obfuscate mode. With a fixed seed, you can repeat an execution several times and get the same output.

    Definition Classes
    BaseDeidParams
  239. def setSelectiveObfuscateRefSource(value: HashMap[String, String]): BaseFhirDeIdentification.this.type
    Definition Classes
    BaseDeidParams
  240. def setSelectiveObfuscateRefSource(value: Map[String, String]): BaseFhirDeIdentification.this.type

    Sets the value of selectiveObfuscateRefSource. This is used to selectively apply different obfuscation sources to specific entities. The keys are entity names and the values are the obfuscation sources. If an entity is not specified in this map, the obfuscateRefSource param is used to determine the obfuscation source. The values can be:

    • 'file': Takes the fakes from the file.
    • 'faker': Takes the fakes from the embedded faker module.
    • 'both': Takes the fakes from the file and the faker module.

    Definition Classes
    BaseDeidParams
    Example:
    1. val modes = Map(
       "PHONE" -> "file",
       "EMAIL" -> "faker",
       "NAME" -> "faker",
       "ADDRESS" -> "both"
       )
  241. def setSelectiveObfuscationModes(value: HashMap[String, List[String]]): BaseFhirDeIdentification.this.type
    Definition Classes
    BaseDeidParams
  242. def setSelectiveObfuscationModes(value: Map[String, Array[String]]): BaseFhirDeIdentification.this.type

    Sets the value of selectiveObfuscationModes. The dictionary of modes to enable multi-mode deidentification.

    • 'obfuscate': Replace the values with random values.
    • 'mask_same_length_chars': Replace the name with asterisks of the same length minus two, plus brackets on both ends.
    • 'mask_entity_labels': Replace the values with the entity label.
    • 'mask_fixed_length_chars': Replace the name with a fixed number of asterisks. You should also invoke "setFixedMaskLength()"
    • 'mask_entity_labels_without_brackets': Replace the values with the entity label without brackets.
    • 'mask_same_length_chars_without_brackets': Replace the name with asterisks of the same length, without brackets.
    • 'skip': Skip the entities (leave them intact).

    Entities that are not listed in the dictionary are de-identified according to setMode().

    Example:

    deidAnnotator
    .setMode("mask")
    .setSelectiveObfuscationModes(Map(
        "OBFUSCATE" -> Array("PHONE", "email"),
        "mask_entity_labels" -> Array("NAME", "CITY"),
        "skip" -> Array("id", "idnum"),
        "mask_same_length_chars" -> Array("fax"),
        "mask_fixed_length_chars" -> Array("zip")
    ))
    .setFixedMaskLength(4)
    Definition Classes
    BaseDeidParams
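    Example:
    How a mode-to-entities dictionary resolves to one mode per entity, with a fallback to the global mode, can be sketched as below. The names are hypothetical, and case-insensitive matching is an assumption suggested by the mixed-case keys in the example above, not documented behavior.

```scala
object SelectiveModesSketch {
  // Builds an entity -> mode resolver from a mode -> entities dictionary.
  // Entities missing from the dictionary fall back to `defaultMode`.
  def resolver(modes: Map[String, Array[String]], defaultMode: String): String => String = {
    val byEntity: Map[String, String] = modes.toSeq.flatMap { case (mode, entities) =>
      entities.toSeq.map(e => e.toLowerCase -> mode.toLowerCase)
    }.toMap
    entity => byEntity.getOrElse(entity.toLowerCase, defaultMode)
  }
}
```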
  243. def setStaticObfuscationPairs(pairs: ArrayList[ArrayList[String]]): BaseFhirDeIdentification.this.type
    Definition Classes
    BaseDeidParams
  244. def setStaticObfuscationPairs(pairs: Array[StaticObfuscationEntity]): BaseFhirDeIdentification.this.type
    Definition Classes
    BaseDeidParams
  245. def setStaticObfuscationPairs(pairs: Array[Array[String]]): BaseFhirDeIdentification.this.type

    Sets the static obfuscation pairs. Each entry must contain exactly three elements: [original, entityType, fake].

    pairs

    An array of arrays containing the static obfuscation pairs.

    Definition Classes
    BaseDeidParams
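    Example:
    The three-element shape of each entry can be checked and turned into a lookup as sketched below. This is plain Scala with hypothetical names; whether the component validates its input in the same way, and which exception it raises, is an assumption.

```scala
object StaticPairsSketch {
  // Validates [original, entityType, fake] entries and builds a lookup
  // from (original, entityType) to the fixed replacement.
  def toLookup(pairs: Array[Array[String]]): Map[(String, String), String] = {
    pairs.foreach { p =>
      require(p.length == 3,
        s"Expected [original, entityType, fake], got ${p.length} element(s)")
    }
    pairs.map { case Array(original, entityType, fake) =>
      (original, entityType) -> fake
    }.toMap
  }
}
```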
  246. def setUnnormalizedDateMode(mode: String): BaseFhirDeIdentification.this.type

    The mode to apply when a date cannot be normalized (that is, it does not match a known date format). Options: [mask, obfuscate, skip]. Default: obfuscate

    Definition Classes
    BaseDeidParams
  247. def setUseShiftDays(s: Boolean): BaseFhirDeIdentification.this.type

    useShiftDays param is not supported in FhirDeIdentification. Please use days instead.

    Definition Classes
    BaseFhirDeIdentification → BaseDeidParams
    Exceptions thrown
  248. def shouldUseConsistentNameParts(entityClass: String): Boolean
    Attributes
    protected
    Definition Classes
    DeidModelParams
  249. val staticObfuscationPairs: StructFeature[Array[StaticObfuscationEntity]]

    A resource containing static obfuscation pairs. Each pair should contain three elements: original, entity type, and fake.

    Definition Classes
    BaseDeidParams
  250. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  251. final def throwUnSupportedError(): Nothing
    Attributes
    protected
  252. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  253. def transform(dataset: Dataset[_]): DataFrame
    Definition Classes
    BaseFhirDeIdentification → Transformer
  254. def transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" )
  255. def transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" ) @varargs()
  256. def transformSchema(schema: StructType): StructType
    Definition Classes
    BaseFhirDeIdentification → PipelineStage
  257. def transformSchema(schema: StructType, logging: Boolean): StructType
    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  258. val unnormalizedDateMode: Param[String]

    The mode to apply when a date cannot be normalized (that is, it does not match a known date format). Options: [mask, obfuscate, skip]. Default: obfuscate

    Definition Classes
    BaseDeidParams
  259. val useShiftDays: BooleanParam

    Whether to use the random shift day when the document has this in its metadata. DocumentHashCoder can create 'dateshift' based on the document. Default: false

    Definition Classes
    BaseDeidParams
  260. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  261. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  262. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  263. def write: MLWriter
    Definition Classes
    ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable
