com.johnsnowlabs.nlp.annotators.deid

ObfuscatorAnnotatorModel

class ObfuscatorAnnotatorModel extends AnnotatorModel[ObfuscatorAnnotatorModel] with ObfuscatorParams with DeidModelParams with HasSimpleAnnotate[ObfuscatorAnnotatorModel]

Linear Supertypes
HasSimpleAnnotate[ObfuscatorAnnotatorModel], DeidModelParams, ObfuscatorParams, BaseDeidParams, AnnotatorModel[ObfuscatorAnnotatorModel], CanBeLazy, RawAnnotator[ObfuscatorAnnotatorModel], HasOutputAnnotationCol, HasInputAnnotationCols, HasOutputAnnotatorType, ParamsAndFeaturesWritable, HasFeatures, DefaultParamsWritable, MLWritable, Model[ObfuscatorAnnotatorModel], Transformer, PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new ObfuscatorAnnotatorModel()
  2. new ObfuscatorAnnotatorModel(uid: String)

Type Members

  1. type AnnotationContent = Seq[Row]
    Attributes
    protected
    Definition Classes
    AnnotatorModel
  2. type AnnotatorType = String
    Definition Classes
    HasOutputAnnotatorType

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T
    Attributes
    protected
    Definition Classes
    Params
  4. def $$[T](feature: StructFeature[T]): T
    Attributes
    protected
    Definition Classes
    HasFeatures
  5. def $$[K, V](feature: MapFeature[K, V]): Map[K, V]
    Attributes
    protected
    Definition Classes
    HasFeatures
  6. def $$[T](feature: SetFeature[T]): Set[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  7. def $$[T](feature: ArrayFeature[T]): Array[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  8. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  9. val GEOGRAPHIC_ENTITIES_PRIORITY: Map[String, Int]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  10. val GEO_METADATA_KEY: String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  11. def _transform(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): DataFrame
    Attributes
    protected
    Definition Classes
    AnnotatorModel
  12. val additionalDateFormats: StringArrayParam

    Additional date formats to be considered during date obfuscation. This allows users to specify custom date formats in addition to the default dateFormats.

    Definition Classes
    BaseDeidParams
  13. def afterAnnotate(dataset: DataFrame): DataFrame
    Attributes
    protected
    Definition Classes
    AnnotatorModel
  14. val ageRanges: IntArrayParam

    List of integers specifying limits of the age groups to preserve during obfuscation

    Definition Classes
    BaseDeidParams
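
    As an illustration of how a list of limits can define age groups, here is a minimal sketch. It assumes each limit is the inclusive lower bound of a group and that the last group is open-ended; the object AgeGroups, the helper groupOf and the sample limits are hypothetical and are not the library's implementation.

```scala
object AgeGroups {
  // Returns (inclusive lower bound, optional exclusive upper bound) of the group containing `age`.
  // Assumption: ages below the smallest limit fall into a group starting at 0.
  def groupOf(age: Int, limits: Seq[Int]): (Int, Option[Int]) = {
    val sorted = limits.sorted
    val lower = sorted.takeWhile(_ <= age).lastOption.getOrElse(0)
    val upper = sorted.find(_ > age)
    (lower, upper)
  }
}
```

    With the hypothetical limits Seq(1, 4, 12, 20, 40, 60, 80), an age of 35 falls into the group from 20 up to (but not including) 40, so a replacement age could be drawn from that group.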
  15. val ageRangesByHipaa: BooleanParam

    A Boolean variable indicating whether to obfuscate ages based on the HIPAA (Health Insurance Portability and Accountability Act) Privacy Rule.

    The HIPAA Privacy Rule mandates that ages of patients older than 90 years must be obfuscated, while ages of patients 90 years or younger can remain unchanged.

    When true, age entities larger than 90 will be obfuscated as per the HIPAA Privacy Rule, and the others will remain unchanged. When false, the ageRanges parameter is used instead.

    Definition Classes
    BaseDeidParams
  16. val allTerms: SetFeature[String]
  17. def annotate(annotations: Seq[Annotation]): Seq[Annotation]
    Definition Classes
    ObfuscatorAnnotatorModel → HasSimpleAnnotate
  18. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  19. def beforeAnnotate(dataset: Dataset[_]): Dataset[_]
    Attributes
    protected
    Definition Classes
    AnnotatorModel
  20. final def checkSchema(schema: StructType, inputAnnotatorType: String): Boolean
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  21. final def clear(param: Param[_]): ObfuscatorAnnotatorModel.this.type
    Definition Classes
    Params
  22. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  23. lazy val combinedDateFormats: Array[String]
    Attributes
    protected
    Definition Classes
    BaseDeidParams
  24. val consistentAcrossNameParts: BooleanParam

    Param that indicates whether consistency should be enforced across different parts of a name (e.g., first name, middle name, last name). When set to true, the same transformation or obfuscation will be applied consistently to all parts of the same name entity, even if those parts appear separately.

    For example, if "John Smith" is obfuscated as "Liam Brown", then:

    • When the full name "John Smith" appears, it will be replaced with "Liam Brown"
    • When "John" or "Smith" appear individually, they will still be obfuscated as "Liam" and "Brown" respectively, ensuring consistency in name transformation.

    Default: true

    Definition Classes
    BaseDeidParams
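
    The behaviour described above can be sketched with a small memory of already replaced name parts. This is only an illustration: the object NamePartConsistency, the case-insensitive keying and the pickFake function are assumptions, not the library's implementation.

```scala
import scala.collection.mutable

object NamePartConsistency {
  // Replaces each whitespace-separated part of `name`, reusing earlier choices stored in `memory`.
  // `pickFake` is a hypothetical stand-in for whatever source supplies fake name parts.
  def obfuscate(name: String,
                memory: mutable.Map[String, String],
                pickFake: String => String): String =
    name.split("\\s+").filter(_.nonEmpty)
      .map(part => memory.getOrElseUpdate(part.toLowerCase, pickFake(part)))
      .mkString(" ")
}
```

    Once "John Smith" has been replaced, a later standalone "Smith" is looked up in the same memory, so it receives the replacement that was chosen the first time.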
  25. def copy(extra: ParamMap): ObfuscatorAnnotatorModel
    Definition Classes
    RawAnnotator → Model → Transformer → PipelineStage → Params
  26. def copyValues[T <: Params](to: T, extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  27. val countryObfuscation: BooleanParam

    Whether to obfuscate country entities or not. If true, country entities will be obfuscated using the Faker module. If false, country entities will be skipped during obfuscation. Default: false

    Definition Classes
    BaseDeidParams
  28. val dateEntities: StringArrayParam

    List of date entities. Default: Array("DATE", "DOB", "DOD", "EFFDATE", "FISCAL_YEAR")

    Definition Classes
    BaseDeidParams
  29. val dateFormats: StringArrayParam

    Format of dates to displace

    Definition Classes
    BaseDeidParams
  30. val days: IntParam

    Number of days by which dates are displaced during obfuscation. If not provided, a random integer between 1 and 60 will be used.

    Definition Classes
    BaseDeidParams
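
    Date displacement can be pictured with java.time. The sketch below assumes the shift is applied forward and that the date is rendered in the pattern it was parsed with; the object DateShift and its shift helper are hypothetical and not the library's implementation.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object DateShift {
  // Parses `text` with `pattern`, moves it forward by `days`, and renders it in the same pattern.
  def shift(text: String, pattern: String, days: Int): String = {
    val fmt = DateTimeFormatter.ofPattern(pattern)
    LocalDate.parse(text, fmt).plusDays(days.toLong).format(fmt)
  }
}
```

    Shifting every date of a document by the same number of days keeps the intervals between the dates intact.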
  31. final def defaultCopy[T <: Params](extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  32. def dfAnnotate: UserDefinedFunction
    Definition Classes
    HasSimpleAnnotate
  33. val entity: Param[String]
    Definition Classes
    ObfuscatorParams
  34. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  35. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  36. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  37. def explainParams(): String
    Definition Classes
    Params
  38. def extraValidate(structType: StructType): Boolean
    Attributes
    protected
    Definition Classes
    RawAnnotator
  39. def extraValidateMsg: String
    Attributes
    protected
    Definition Classes
    RawAnnotator
  40. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  41. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  42. val fakerLengthOffset: IntParam

    Specifies how much length deviation is accepted in obfuscation when keepTextSizeForObfuscation is enabled. The value must be greater than 0. Default: 3.

    Definition Classes
    BaseDeidParams
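
    One way to read the accepted length deviation is as a filter on candidate replacements. A minimal sketch under that assumption follows; the object LengthFilter and the helper withinOffset are hypothetical names.

```scala
object LengthFilter {
  // Keeps the candidates whose length differs from the original by at most `offset` characters.
  def withinOffset(original: String, candidates: Seq[String], offset: Int): Seq[String] =
    candidates.filter(c => math.abs(c.length - original.length) <= offset)
}
```

    With an offset of 3, a four-character original accepts candidates of one to seven characters.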
  43. val features: ArrayBuffer[Feature[_, _, _]]
    Definition Classes
    HasFeatures
  44. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  45. val genderAwareness: BooleanParam

    Whether to use gender-aware names during obfuscation. This param affects only names. Setting it to true might decrease performance. Default: false

    Definition Classes
    BaseDeidParams
  46. def generateFakeBySameLength(wordToReplace: String, entity: String): String

    Obfuscates digits to new digits and letters to new letters; all other characters remain the same.

    Definition Classes
    DeidModelParams
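
    The character-class-preserving idea can be sketched as follows. The seeded generator, the ASCII replacement alphabet and the object name SameShapeFake are assumptions made for the example; the actual method takes an entity argument and may work differently.

```scala
import scala.util.Random

object SameShapeFake {
  // Swaps every digit for a random digit and every letter for a random ASCII letter of the same case.
  // Punctuation, spaces and other characters are copied unchanged, so the length is preserved.
  def generate(text: String, seed: Long): String = {
    val rnd = new Random(seed)
    text.map {
      case c if c.isDigit => ('0' + rnd.nextInt(10)).toChar
      case c if c.isUpper => ('A' + rnd.nextInt(26)).toChar
      case c if c.isLower => ('a' + rnd.nextInt(26)).toChar
      case c              => c
    }
  }
}
```

    Applied to a phone number such as "(555) 123-4567", the parentheses, space and hyphen stay in place and only the digits change.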
  47. def generateFakeBySameLengthUsingHash(wordToReplace: String, entity: String): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  48. val geoConsistency: BooleanParam

    Whether to enforce consistent obfuscation across geographical entities: state, city, street, zip and phone.

    Functionality Overview

    This parameter enables intelligent geographical entity obfuscation that maintains realistic relationships between different geographic components. When enabled, the system ensures that obfuscated addresses form coherent, valid combinations rather than random replacements.

    Supported Entity Types

    The following geographical entities are processed in priority order:

    • state (Priority: 0) - US state names
    • city (Priority: 1) - City names
    • zip (Priority: 2) - Zip codes
    • street (Priority: 3) - Street addresses
    • phone (Priority: 4) - Phone numbers

    Language Requirement

    IMPORTANT: Geographic consistency is only applied when:

    • the geoConsistency parameter is set to true, AND
    • the language parameter is set to en

    For non-English configurations, this feature is automatically disabled regardless of the parameter setting.

    Consistency Algorithm

    When geographical entities come from the chunk columns:

    1. Entity Grouping: All geographic entities are identified and grouped by type
    2. Fake Address Selection: A consistent set of fake US addresses is selected using hash-based deterministic selection to ensure reproducibility
    3. Priority-Based Mapping: Entities are mapped to fake addresses following the priority order (state → city → zip → street → phone)
    4. Consistent Replacement: All entities of the same type within a document use the same fake address pool, maintaining geographical coherence

    Parameter Interactions

    IMPORTANT: Enabling this parameter automatically disables:

    • keepTextSizeForObfuscation - Text size preservation is not maintained
    • consistentObfuscation - Standard consistency rules are overridden
    • file-based fakers

    This is necessary because geographic consistency requires specific fake address selection that may not preserve original text lengths or follow standard obfuscation patterns.

    Default: false

    Definition Classes
    BaseDeidParams
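
    The hash-based selection and priority-based mapping described for geoConsistency can be sketched roughly as below. Everything here is an assumption for illustration: the FakeAddress record, the use of a document key as the hash input and the sample pool are hypothetical, and the library's internal algorithm may differ.

```scala
object GeoConsistencySketch {
  // One coherent fake address; the fields mirror the entity types listed in the documentation.
  final case class FakeAddress(state: String, city: String, zip: String, street: String, phone: String)

  // Priority order from the documentation: state, city, zip, street, phone.
  val priority: Map[String, Int] =
    Map("state" -> 0, "city" -> 1, "zip" -> 2, "street" -> 3, "phone" -> 4)

  // Deterministically picks one address for a key, so repeated runs agree (pool must be non-empty).
  def pick(documentKey: String, pool: IndexedSeq[FakeAddress]): FakeAddress =
    pool(Math.floorMod(documentKey.hashCode, pool.size))

  // Replaces each (entityType, text) pair with the matching address field, in priority order.
  // Pairs whose type is not a geographic entity are dropped from the result.
  def replaceAll(entities: Seq[(String, String)], address: FakeAddress): Seq[(String, String)] = {
    val fields = Map(
      "state" -> address.state, "city" -> address.city, "zip" -> address.zip,
      "street" -> address.street, "phone" -> address.phone)
    entities
      .filter { case (kind, _) => priority.contains(kind) }
      .sortBy { case (kind, _) => priority(kind) }
      .map { case (kind, _) => kind -> fields(kind) }
  }
}
```

    Because every geographic entity of a document is filled from the same selected record, the resulting state, city and zip belong together instead of being drawn independently.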
  49. def get[T](feature: StructFeature[T]): Option[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  50. def get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  51. def get[T](feature: SetFeature[T]): Option[Set[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  52. def get[T](feature: ArrayFeature[T]): Option[Array[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  53. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  54. def getAdditionalDateFormats: Array[String]

    Gets the value of additionalDateFormats

    Definition Classes
    BaseDeidParams
  55. def getAgeRanges: Array[Int]

    Gets ageRanges param.

    Definition Classes
    BaseDeidParams
  56. def getAgeRangesByHipaa: Boolean

    Gets the value of ageRangesByHipaa.

    Definition Classes
    BaseDeidParams
  57. def getAllTerms: Set[String]
  58. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  59. def getConsistentAcrossNameParts: Boolean

    Gets the value of consistentAcrossNameParts.

    Definition Classes
    BaseDeidParams
  60. def getCountryObfuscation: Boolean

    Gets the value of countryObfuscation.

    Definition Classes
    BaseDeidParams
  61. def getDateEntities: Array[String]

    Gets dateEntities param.

    Definition Classes
    BaseDeidParams
  62. def getDateFormats: Array[String]

    Gets the value of dateFormats

    Definition Classes
    BaseDeidParams
  63. def getDays: Int

    Gets days param

    Definition Classes
    BaseDeidParams
  64. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  65. def getEntitiesBySentence(chunks: Seq[Annotation], sentenceCount: Int): Seq[Seq[Annotation]]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  66. def getEntity: String
    Definition Classes
    ObfuscatorParams
  67. def getEntityBasedObfuscationRefSource(entityClass: String): String
    Attributes
    protected
    Definition Classes
    BaseDeidParams
  68. def getEntityField(annotation: Annotation): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  69. def getExternalFakers(entityClass: String, customFakers: Map[String, List[String]], wordToReplace: String): List[String]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  70. def getFakeByHashcode(fakes: Seq[String], wordToReplace: String, entity: String, seed: Int): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  71. def getFakeWithSameSize(fakes: Seq[String], wordToReplace: String, entity: String, lengthDeviation: Int, seed: Int): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  72. def getFakerLengthOffset: Int

    Gets fakerLengthOffset param

    Definition Classes
    BaseDeidParams
  73. def getFakersEntity(entity: String, result: String): Seq[String]
    Definition Classes
    DeidModelParams
  74. def getGenderAwareness: Boolean

    Gets genderAwareness param.

    Definition Classes
    BaseDeidParams
  75. def getGeoConsistency: Boolean

    Gets the value of geoConsistency.

    Definition Classes
    BaseDeidParams
  76. def getInputCols: Array[String]
    Definition Classes
    HasInputAnnotationCols
  77. def getKeepMonth: Boolean

    Gets keepMonth param

    Definition Classes
    BaseDeidParams
  78. def getKeepTextSizeForObfuscation: Boolean

    Gets keepTextSizeForObfuscation param

    Definition Classes
    BaseDeidParams
  79. def getKeepYear: Boolean

    Gets keepYear param

    Definition Classes
    BaseDeidParams
  80. def getLanguage: String

    Gets language param.

    Definition Classes
    BaseDeidParams
  81. def getLazyAnnotator: Boolean
    Definition Classes
    CanBeLazy
  82. def getMaxSentence(annotations: Seq[Annotation]): Int
    Attributes
    protected
    Definition Classes
    DeidModelParams
  83. def getMode: String

    Gets mode param.

    Definition Classes
    BaseDeidParams
  84. def getObfuscateDate: Boolean

    Gets obfuscateDate param

    Definition Classes
    BaseDeidParams
  85. def getObfuscateRefSource: String

    Gets obfuscateRefSource param.

    Definition Classes
    BaseDeidParams
  86. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  87. final def getOutputCol: String
    Definition Classes
    HasOutputAnnotationCol
  88. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  89. def getRegion: String

    Gets region param.

    Definition Classes
    BaseDeidParams
  90. def getSameLengthFormattedEntities(): Array[String]
    Definition Classes
    BaseDeidParams
  91. def getSeed(): Int
    Definition Classes
    BaseDeidParams
  92. def getSelectiveObfuscateRefSource: Map[String, String]

    Gets selectiveObfuscateRefSource param.

    Definition Classes
    BaseDeidParams
  93. def getSelectiveObfuscateRefSourceAsStr: String
    Definition Classes
    BaseDeidParams
  94. def getShiftDaysFromSentences(sentences: Seq[Annotation]): Option[Int]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  95. def getStaticObfuscationFakes(entityClass: String, wordToReplace: String): Option[Seq[String]]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  96. def getStaticObfuscationPairs: Option[Array[StaticObfuscationEntity]]
    Definition Classes
    BaseDeidParams
  97. def getUnnormalizedDateMode: String

    Gets unnormalizedDateMode param.

    Definition Classes
    BaseDeidParams
  98. def getUseShiftDays: Boolean

    Gets useShiftDays param.

    Definition Classes
    BaseDeidParams
  99. def getValidAgeRanges: Array[Int]

    Gets valid ageRanges whether ageRangesByHipaa is true or not.

    Attributes
    protected
    Definition Classes
    BaseDeidParams
  100. def handleCasing(originalFake: String, wordToReplace: String): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  101. def handleGeographicConsistency(protectedEntities: Seq[Seq[Annotation]]): Seq[Seq[Annotation]]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  102. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  103. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  104. def hasParent: Boolean
    Definition Classes
    Model
  105. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  106. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  107. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  108. val inputAnnotatorTypes: Array[AnnotatorType]
    Definition Classes
    ObfuscatorAnnotatorModel → HasInputAnnotationCols
  109. final val inputCols: StringArrayParam
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  110. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  111. def isEmptyString(value: String): Boolean
    Attributes
    protected
    Definition Classes
    DeidModelParams
  112. def isGeoEntity(annotation: Annotation): Boolean
    Attributes
    protected
    Definition Classes
    DeidModelParams
  113. def isGeoObfuscationEnabled: Boolean
    Attributes
    protected
    Definition Classes
    DeidModelParams
  114. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  115. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  116. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  117. val keepMonth: BooleanParam

    Whether to keep the month intact when obfuscating date entities. If true, the month will remain unchanged during the obfuscation process. If false, the month will be modified along with the year and day. Default: false.

    Definition Classes
    BaseDeidParams
  118. val keepTextSizeForObfuscation: BooleanParam

    Specifies whether the output should maintain the same character length as the input text. The output keeps the original length when a replacement of the same length is available; otherwise the length might vary.

    Definition Classes
    BaseDeidParams
  119. val keepYear: BooleanParam

    Whether to keep the year intact when obfuscating date entities. If true, the year will remain unchanged during the obfuscation process. If false, the year will be modified along with the month and day. Default: false.

    Definition Classes
    BaseDeidParams
  120. val language: Param[String]

    The language used to select the regex file and some faker entities: 'en' (English), 'de' (German), 'es' (Spanish), 'fr' (French), 'ar' (Arabic) or 'ro' (Romanian). Default: 'en'

    Definition Classes
    BaseDeidParams
  121. val lazyAnnotator: BooleanParam
    Definition Classes
    CanBeLazy
  122. implicit lazy val locale: Locale
    Attributes
    protected
    Definition Classes
    DeidModelParams
  123. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  124. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  125. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  126. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  127. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  128. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  129. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  130. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  131. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  132. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  133. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  134. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  135. val mode: Param[String]

    Mode for Anonymizer ['mask' or 'obfuscate']. Default: 'mask'

    • Mask mode: The entities will be replaced by their entity types.
    • Obfuscate mode: The entity is replaced by an obfuscator's term.
    Definition Classes
    BaseDeidParams
    Example:
    1. Given the following text: "David Hale visited EEUU a couple of years ago"

      • Mask mode: "<PERSON> visited <COUNTRY> a couple of years ago"
      • Obfuscate mode: "Bryan Johnson visited Japan a couple of years ago"
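
    The difference between the two modes can be sketched on plain character spans. The Entity record, the fakes lookup and the fallback to the mask label when no fake is available are assumptions made for this illustration, not the annotator's actual data model.

```scala
object ModeSketch {
  // A detected entity: character span [begin, end) in the text plus its label.
  final case class Entity(begin: Int, end: Int, label: String)

  // Rewrites the text from right to left so that earlier offsets stay valid.
  def deidentify(text: String, entities: Seq[Entity], mode: String, fakes: Map[String, String]): String =
    entities.sortBy(-_.begin).foldLeft(text) { (acc, e) =>
      val replacement = mode match {
        case "mask"      => s"<${e.label}>"
        case "obfuscate" => fakes.getOrElse(e.label, s"<${e.label}>")
        case other       => throw new IllegalArgumentException(s"Unknown mode: $other")
      }
      acc.substring(0, e.begin) + replacement + acc.substring(e.end)
    }
}
```

    Feeding the example sentence with a PERSON span over "David Hale" and a COUNTRY span over "EEUU" reproduces the two outputs listed above, given the fake terms "Bryan Johnson" and "Japan".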
  136. def msgHelper(schema: StructType): String
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  137. val nameEntities: Seq[String]
    Attributes
    protected
    Definition Classes
    DeidModelParams
  138. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  139. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  140. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  141. val obfuscateDate: BooleanParam

    When mode == "obfuscate", whether to obfuscate dates or not. This param helps in consistency to make dateFormats more visible. When setting it to true, make sure the dateFormats param fits your needs. If the value is true and obfuscation fails, unnormalizedDateMode will be activated. When set to false, the date will be masked to <DATE>. Default: false

    Definition Classes
    BaseDeidParams
  142. def obfuscateNameEntity(originalName: String, keepTextSize: Boolean, lengthDeviation: Int, namePartsMemory: Map[String, String]): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  143. val obfuscateRefSource: Param[String]

    The source used to obfuscate the entities. The values are the following:

    • 'file': Takes the entities from the obfuscatorRefFile
    • 'faker': Takes the entities from the Faker module
    • 'both': Takes the entities from the obfuscatorRefFile and the Faker module randomly.

    Definition Classes
    BaseDeidParams
  144. def onWrite(path: String, spark: SparkSession): Unit
    Attributes
    protected
    Definition Classes
    ParamsAndFeaturesWritable
  145. val optionalInputAnnotatorTypes: Array[String]
    Definition Classes
    HasInputAnnotationCols
  146. val outputAnnotatorType: AnnotatorType
    Definition Classes
    ObfuscatorAnnotatorModel → HasOutputAnnotatorType
  147. final val outputCol: Param[String]
    Attributes
    protected
    Definition Classes
    HasOutputAnnotationCol
  148. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  149. var parent: Estimator[ObfuscatorAnnotatorModel]
    Definition Classes
    Model
  150. lazy val randomDateFormat: String
    Attributes
    protected
    Definition Classes
    BaseDeidParams
  151. val region: Param[String]

    With this property, you can select particular dateFormats. It is mainly used when obfuscating dates: it decides whether the first or the second part of a date such as 11/11/2023 is read as the day.

    The values are the following:

    • 'eu' for European Union
    • 'us' for USA
    Definition Classes
    BaseDeidParams
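
    To see why the region matters, consider a date whose first two parts are both valid days and months. The sketch below assumes day-first patterns for 'eu' and month-first patterns for 'us'; the object RegionSketch and the two hard-coded patterns are hypothetical and do not reflect the library's actual format lists.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object RegionSketch {
  // Assumed patterns: day-first for 'eu', month-first for 'us'.
  private val patterns = Map("eu" -> "dd/MM/yyyy", "us" -> "MM/dd/yyyy")

  // Parses the same text differently depending on the region.
  def parse(text: String, region: String): LocalDate =
    LocalDate.parse(text, DateTimeFormatter.ofPattern(patterns(region)))
}
```

    The text "03/04/2023" is read as 3 April 2023 under 'eu' and as 4 March 2023 under 'us', so a day shift applied afterwards lands on different dates.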
  152. val sameLengthFormattedEntities: StringArrayParam

    List of formatted entities for which obfuscation generates outputs of the same length as the originals. The supported and default formatted entities are: "phone", "fax", "contact", "id", "idnum", "bioid", "medicalrecord", "zip", "vin", "ssn", "dln", "plate", "license", "IRS", "CFN", "account".

    Definition Classes
    BaseDeidParams
  153. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  154. val seed: IntParam

    The seed used to select the entities in obfuscate mode. With a fixed seed, you can replay an execution several times and get the same output.

    Definition Classes
    BaseDeidParams
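
    The effect of a fixed seed can be illustrated with scala.util.Random. This is a generic sketch of seeded selection, not the library's selection logic; the object SeededChoice and its pick helper are hypothetical.

```scala
import scala.util.Random

object SeededChoice {
  // Picks one candidate with a generator built from `seed`; equal seeds give equal picks.
  // Assumes `candidates` is non-empty.
  def pick(candidates: IndexedSeq[String], seed: Int): String =
    candidates(new Random(seed.toLong).nextInt(candidates.size))
}
```

    Calling pick twice with the same candidates and the same seed returns the same element, which is what makes a run repeatable.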
  155. def selectFakeFromAllFakes(wordToReplace: String, entityClass: String, maskedEntity: String, allFakes: Seq[String]): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  156. val selectiveObfuscateRefSource: MapFeature[String, String]

    A map of entity names to their obfuscation modes. This is used to selectively apply different obfuscation methods to specific entities. The keys are entity names and the values are the obfuscation sources. If an entity is not specified in this map, the obfuscateRefSource param is used to determine the obfuscation source.

    Definition Classes
    BaseDeidParams
    Example:
    1. val selectiveSources = Map(
       "PHONE" -> "file",
       "EMAIL" -> "faker",
       "NAME" -> "faker",
       "ADDRESS" -> "both"
       )
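
    The fallback rule described for selectiveObfuscateRefSource amounts to a map lookup with a default. A minimal sketch, where the object RefSourceLookup and the parameter names are hypothetical:

```scala
object RefSourceLookup {
  // Uses the per-entity source when one is configured, otherwise the global obfuscateRefSource value.
  def sourceFor(entity: String, selective: Map[String, String], globalSource: String): String =
    selective.getOrElse(entity, globalSource)
}
```

    With the map from the example and a global source of 'faker', "PHONE" resolves to 'file', while an entity that is missing from the map, such as "DATE", falls back to 'faker'.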
  157. def set[T](feature: StructFeature[T], value: T): ObfuscatorAnnotatorModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  158. def set[K, V](feature: MapFeature[K, V], value: Map[K, V]): ObfuscatorAnnotatorModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  159. def set[T](feature: SetFeature[T], value: Set[T]): ObfuscatorAnnotatorModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  160. def set[T](feature: ArrayFeature[T], value: Array[T]): ObfuscatorAnnotatorModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  161. final def set(paramPair: ParamPair[_]): ObfuscatorAnnotatorModel.this.type
    Attributes
    protected
    Definition Classes
    Params
  162. final def set(param: String, value: Any): ObfuscatorAnnotatorModel.this.type
    Attributes
    protected
    Definition Classes
    Params
  163. final def set[T](param: Param[T], value: T): ObfuscatorAnnotatorModel.this.type
    Definition Classes
    Params
  164. def setAdditionalDateFormats(formats: Array[String]): ObfuscatorAnnotatorModel.this.type

    Sets additionalDateFormats param

    Definition Classes
    BaseDeidParams
  165. def setAgeRanges(mode: Array[Int]): ObfuscatorAnnotatorModel.this.type

    List of integers specifying limits of the age groups to preserve during obfuscation

    Definition Classes
    BaseDeidParams
  166. def setAgeRangesByHipaa(value: Boolean): ObfuscatorAnnotatorModel.this.type

    Sets whether to obfuscate ages based on the HIPAA (Health Insurance Portability and Accountability Act) Privacy Rule.

    The HIPAA Privacy Rule mandates that ages of patients older than 90 years must be obfuscated, while ages of patients 90 years or younger can remain unchanged.

    value

    If true, age entities larger than 90 will be obfuscated as per the HIPAA Privacy Rule, and the others will remain unchanged. If false, the ageRanges parameter is used instead. Default: false.

    Definition Classes
    BaseDeidParams
  167. def setAllTerms(value: Set[String]): ObfuscatorAnnotatorModel.this.type
  168. def setConsistentAcrossNameParts(value: Boolean): ObfuscatorAnnotatorModel.this.type

    Sets the value of consistentAcrossNameParts.

    value

    Boolean flag to enforce consistency across name parts

    returns

    this instance

    Definition Classes
    BaseDeidParams
  169. def setCountryObfuscation(value: Boolean): ObfuscatorAnnotatorModel.this.type

    Sets whether to obfuscate country entities or not. If true, country entities will be obfuscated using the Faker module. If false, country entities will be skipped during obfuscation. Default: false

    Definition Classes
    BaseDeidParams
  170. def setDateEntities(value: Array[String]): ObfuscatorAnnotatorModel.this.type

    Sets the value of dateEntities. Default: Array("DATE", "DOB", "DOD", "EFFDATE", "FISCAL_YEAR")

    Definition Classes
    BaseDeidParams
  171. def setDateFormats(s: Array[String]): ObfuscatorAnnotatorModel.this.type

    Format of dates to displace

    Definition Classes
    BaseDeidParams
  172. def setDays(k: Int): ObfuscatorAnnotatorModel.this.type

    Number of days by which to displace dates during obfuscation. If not provided, a random integer between 1 and 60 is used.

    Definition Classes
    BaseDeidParams
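    Date displacement as described for setDays can be sketched in plain Scala with java.time. This is an illustration under stated assumptions, not the annotator's code: the DateShiftSketch object and displace helper are hypothetical, and shifting forward in time is an assumption the description does not confirm.

```scala
import java.time.LocalDate
import scala.util.Random

object DateShiftSketch {
  // Shift `date` by `days`. When no value is provided, draw a random
  // integer between 1 and 60 (inclusive), mirroring the documented default.
  def displace(date: LocalDate, days: Option[Int], rng: Random = new Random()): LocalDate = {
    val k = days.getOrElse(rng.nextInt(60) + 1) // nextInt(60) yields 0..59
    date.plusDays(k.toLong)                     // assumption: shift forward
  }
}
```

    For example, DateShiftSketch.displace(LocalDate.of(2023, 1, 25), Some(10)) yields 2023-02-04.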
  173. def setDefault[T](feature: StructFeature[T], value: () ⇒ T): ObfuscatorAnnotatorModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  174. def setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): ObfuscatorAnnotatorModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  175. def setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): ObfuscatorAnnotatorModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  176. def setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): ObfuscatorAnnotatorModel.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  177. final def setDefault(paramPairs: ParamPair[_]*): ObfuscatorAnnotatorModel.this.type
    Attributes
    protected
    Definition Classes
    Params
  178. final def setDefault[T](param: Param[T], value: T): ObfuscatorAnnotatorModel.this.type
    Attributes
    protected[org.apache.spark.ml]
    Definition Classes
    Params
  179. def setEntity(e: String): ObfuscatorAnnotatorModel.this.type
    Definition Classes
    ObfuscatorParams
  180. def setFakerLengthOffset(value: Int): ObfuscatorAnnotatorModel.this.type

    Sets fakerLengthOffset param

    Definition Classes
    BaseDeidParams
  181. def setGenderAwareness(value: Boolean): ObfuscatorAnnotatorModel.this.type

    Whether to use gender-aware names or not during obfuscation. This param affects only names. Setting it to true might decrease performance. Default: false

    Definition Classes
    BaseDeidParams
  182. def setGeoConsistency(value: Boolean): ObfuscatorAnnotatorModel.this.type

    Sets the value of geoConsistency. When set to true, it enables consistent obfuscation across geographical entities such as state, city, street, zip, and phone.

    Definition Classes
    BaseDeidParams
  183. final def setInputCols(value: String*): ObfuscatorAnnotatorModel.this.type
    Definition Classes
    HasInputAnnotationCols
  184. def setInputCols(value: Array[String]): ObfuscatorAnnotatorModel.this.type
    Definition Classes
    HasInputAnnotationCols
  185. def setKeepMonth(value: Boolean): ObfuscatorAnnotatorModel.this.type

    Sets whether to keep the month intact when obfuscating date entities. If true, the month will remain unchanged during the obfuscation process. If false, the month will be modified along with the year and day. Default: false.

    Definition Classes
    BaseDeidParams
  186. def setKeepTextSizeForObfuscation(value: Boolean): ObfuscatorAnnotatorModel.this.type

    Sets keepTextSizeForObfuscation param

    Definition Classes
    BaseDeidParams
  187. def setKeepYear(value: Boolean): ObfuscatorAnnotatorModel.this.type

    Sets whether to keep the year intact when obfuscating date entities. If true, the year will remain unchanged during the obfuscation process. If false, the year will be modified along with the month and day. Default: false.

    Definition Classes
    BaseDeidParams
  188. def setLanguage(s: String): ObfuscatorAnnotatorModel.this.type

    The language used to select the regex file and some faker entities: 'en' (English), 'de' (German), 'es' (Spanish), 'fr' (French), 'ar' (Arabic) or 'ro' (Romanian). Default: 'en'

    Definition Classes
    BaseDeidParams
  189. def setLazyAnnotator(value: Boolean): ObfuscatorAnnotatorModel.this.type
    Definition Classes
    CanBeLazy
  190. def setMode(m: String): ObfuscatorAnnotatorModel.this.type

    Mode for Anonymizer ['mask'|'obfuscate']. Default: 'mask'

    • Mask mode: The entities will be replaced by their entity types.
    • Obfuscate mode: The entity is replaced by an obfuscator's term.
    Definition Classes
    ObfuscatorParams → BaseDeidParams
    Example:
    1. Given the following text: "David Hale visited EEUU a couple of years ago"

      • Mask mode: "<PERSON> visited <COUNTRY> a couple of years ago"
      • Obfuscate mode: "Bryan Johnson visited Japan a couple of years ago"
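    The difference between the two modes can be reproduced with a plain-Scala sketch. It is only an illustration: the ModeSketch object, the Entity case class and the fakes lookup table are hypothetical, and the real annotator works on annotations rather than on raw string replacement.

```scala
object ModeSketch {
  // A detected entity: the matched text and its entity type (hypothetical shape).
  final case class Entity(text: String, label: String)

  // Replace each entity by its type ("mask") or by a supplied fake ("obfuscate").
  def deidentify(
      sentence: String,
      entities: Seq[Entity],
      mode: String,
      fakes: Map[String, String]): String =
    entities.foldLeft(sentence) { (acc, e) =>
      val replacement = mode match {
        case "mask"      => s"<${e.label}>"
        // Assumption: fall back to the mask when no fake is available.
        case "obfuscate" => fakes.getOrElse(e.text, s"<${e.label}>")
        case other       => throw new IllegalArgumentException(s"Unknown mode: $other")
      }
      acc.replace(e.text, replacement)
    }
}
```

    With the entities "David Hale" (PERSON) and "EEUU" (COUNTRY), mode "mask" reproduces the masked sentence from the example, and mode "obfuscate" with the fakes "Bryan Johnson" and "Japan" reproduces the obfuscated one.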
  191. def setObfuscateDate(s: Boolean): ObfuscatorAnnotatorModel.this.type

    When mode is 'obfuscate', whether to obfuscate dates or not. This param helps in consistency to make dateFormats more visible. When set to true, make sure the dateFormats param fits your needs. If the value is true and obfuscation fails, unnormalizedDateMode is applied. When set to false, the date is masked to <DATE>. Default: false

    Definition Classes
    BaseDeidParams
  192. def setObfuscateRefSource(s: String): ObfuscatorAnnotatorModel.this.type

    The source of obfuscation to obfuscate the entities. The values are the following:

    • 'file': Takes the fakes from the obfuscatorRefFile.
    • 'faker': Takes the fakes from the Faker module.
    • 'both': Takes the fakes from the obfuscatorRefFile and the Faker module randomly.

    Definition Classes
    BaseDeidParams
  193. final def setOutputCol(value: String): ObfuscatorAnnotatorModel.this.type
    Definition Classes
    HasOutputAnnotationCol
  194. def setParent(parent: Estimator[ObfuscatorAnnotatorModel]): ObfuscatorAnnotatorModel
    Definition Classes
    Model
  195. def setRegion(s: String): ObfuscatorAnnotatorModel.this.type

    With this property, you can select particular dateFormats. It is especially useful when obfuscating dates: it decides whether the first or the second part of a date such as 11/11/2023 is read as the day. The values are the following:

    • 'eu' for European Union
    • 'us' for USA
    Definition Classes
    BaseDeidParams
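    Why the region matters is easiest to see on an ambiguous date. The sketch below uses java.time and is only an illustration: the RegionSketch object and the mapping of 'eu' to day-first and 'us' to month-first patterns are assumptions, not the library's actual format tables.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object RegionSketch {
  // Assumed mapping: 'eu' reads the day first, 'us' reads the month first.
  def parse(date: String, region: String): LocalDate = {
    val pattern = region match {
      case "eu"  => "dd/MM/yyyy"
      case "us"  => "MM/dd/yyyy"
      case other => throw new IllegalArgumentException(s"Unknown region: $other")
    }
    LocalDate.parse(date, DateTimeFormatter.ofPattern(pattern))
  }
}
```

    Under these assumptions "03/04/2023" is 3 April 2023 for 'eu' and 4 March 2023 for 'us', so a day displacement applied afterwards lands on different dates.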
  196. def setSameLengthFormattedEntities(entities: Array[String]): ObfuscatorAnnotatorModel.this.type

    List of formatted entities to generate the same length outputs as original ones during obfuscation. The supported and default formatted entities are: PHONE, FAX, CONTACT, ID, IDNUM, BIOID, MEDICALRECORD, ZIP, VIN, SSN, DLN, LICENSE, PLATE, IRS, CFN, ACCOUNT.

    Definition Classes
    BaseDeidParams
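    One way to picture same-length output for a formatted entity such as PHONE is to replace characters one by one. The sketch below is an assumption about how such a fake could be built, not the annotator's algorithm; SameLengthSketch and fakeSameLength are hypothetical names, and keeping punctuation in place is an extra assumption.

```scala
import scala.util.Random

object SameLengthSketch {
  // Replace every digit with a random digit and every letter with a random
  // letter of the same case; all other characters are kept as they are.
  def fakeSameLength(original: String, rng: Random): String =
    original.map {
      case c if c.isDigit => ('0' + rng.nextInt(10)).toChar
      case c if c.isUpper => ('A' + rng.nextInt(26)).toChar
      case c if c.isLower => ('a' + rng.nextInt(26)).toChar
      case c              => c
    }
}
```

    Because the mapping is character by character, the result always has the same length as the input.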
  197. def setSeed(s: Int): ObfuscatorAnnotatorModel.this.type

    The seed used to select the entities in obfuscate mode. With a fixed seed, you can repeat an execution several times and get the same output.

    Definition Classes
    BaseDeidParams
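    The effect of a fixed seed can be shown with scala.util.Random. This is a minimal sketch of seeded selection in general; the SeedSketch object, the fakeNames pool and the pick helper are hypothetical and do not reflect how the annotator stores or draws its fakes.

```scala
import scala.util.Random

object SeedSketch {
  // Hypothetical pool of fake names to draw from.
  val fakeNames: Vector[String] =
    Vector("Bryan Johnson", "Ana Perez", "Li Wei", "Omar Haddad")

  // Draw `n` names using a generator initialised with `seed`.
  def pick(seed: Int, n: Int): Seq[String] = {
    val rng = new Random(seed.toLong)
    Seq.fill(n)(fakeNames(rng.nextInt(fakeNames.size)))
  }
}
```

    Two calls with the same seed, for example SeedSketch.pick(42, 5), return the same sequence, which is what makes a run repeatable.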
  198. def setSelectiveObfuscateRefSource(value: HashMap[String, String]): ObfuscatorAnnotatorModel.this.type
    Definition Classes
    BaseDeidParams
  199. def setSelectiveObfuscateRefSource(value: Map[String, String]): ObfuscatorAnnotatorModel.this.type

    Sets the value of selectiveObfuscateRefSource. This is used to selectively apply different obfuscation methods to specific entities. The keys are entity names and the values are the obfuscation sources. If an entity is not specified in this map, the obfuscateRefSource param is used to determine the obfuscation method. The values can be:

    • 'file': Takes the fakes from the file.
    • 'faker': Takes the fakes from the embedded faker module.
    • 'both': Takes the fakes from the file and the faker module.

    Definition Classes
    BaseDeidParams
    Example:
    1. val modes = Map(
       "PHONE" -> "file",
       "EMAIL" -> "faker",
       "NAME" -> "faker",
       "ADDRESS" -> "both"
       )
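    The fallback described above (use the map entry if present, otherwise obfuscateRefSource) amounts to a lookup with a default. The sketch below is plain Scala; the RefSourceSketch object and sourceFor helper are hypothetical names used only to illustrate that rule.

```scala
object RefSourceSketch {
  // Same map as in the example above.
  val modes: Map[String, String] = Map(
    "PHONE"   -> "file",
    "EMAIL"   -> "faker",
    "NAME"    -> "faker",
    "ADDRESS" -> "both"
  )

  // Entities missing from the selective map fall back to the global source.
  def sourceFor(
      entity: String,
      selective: Map[String, String],
      obfuscateRefSource: String): String =
    selective.getOrElse(entity, obfuscateRefSource)
}
```

    Here "PHONE" resolves to "file" regardless of the global setting, while an entity that is not in the map, such as "DATE", resolves to whatever obfuscateRefSource is.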
  200. def setStaticObfuscationPairs(pairs: ArrayList[ArrayList[String]]): ObfuscatorAnnotatorModel.this.type
    Definition Classes
    BaseDeidParams
  201. def setStaticObfuscationPairs(pairs: Array[StaticObfuscationEntity]): ObfuscatorAnnotatorModel.this.type
    Definition Classes
    BaseDeidParams
  202. def setStaticObfuscationPairs(pairs: Array[Array[String]]): ObfuscatorAnnotatorModel.this.type

    Sets the static obfuscation pairs. Each pair must contain exactly three elements: [original, entityType, fake].

    pairs

    An array of arrays containing the static obfuscation pairs.

    Definition Classes
    BaseDeidParams
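    The three-element requirement can be checked before the pairs are handed to the setter. The following plain-Scala sketch is only an illustration: StaticPairsSketch, StaticPair and parsePairs are hypothetical names, and how the library itself reports a malformed pair is not stated here.

```scala
object StaticPairsSketch {
  // Hypothetical holder for one [original, entityType, fake] triple.
  final case class StaticPair(original: String, entityType: String, fake: String)

  // Return the parsed pairs, or a message naming the first malformed entry.
  def parsePairs(pairs: Array[Array[String]]): Either[String, Array[StaticPair]] = {
    val bad = pairs.indexWhere(_.length != 3)
    if (bad >= 0)
      Left(s"Pair at index $bad has ${pairs(bad).length} elements, expected 3")
    else
      Right(pairs.map(p => StaticPair(p(0), p(1), p(2))))
  }
}
```

    An input such as Array(Array("John", "NAME", "Mike")) parses successfully, whereas Array(Array("John", "NAME")) is rejected.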
  203. def setUnnormalizedDateMode(mode: String): ObfuscatorAnnotatorModel.this.type

    The mode to use if the date is not formatted. Options: [mask, obfuscate, skip]. Default: obfuscate

    Definition Classes
    BaseDeidParams
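    The three options can be read as a fallback that applies when a date matches none of the known formats. The sketch below is a guess at that control flow in plain Scala, not the library's logic: UnnormalizedDateSketch and handle are hypothetical, and the meaning given to each mode (mask to <DATE>, obfuscate to a supplied fake, skip leaves the text as is) is an assumption.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

object UnnormalizedDateSketch {
  // Try each format in order; shift the date if one matches,
  // otherwise fall back to the chosen mode.
  def handle(raw: String, formats: Seq[String], days: Int, mode: String, fake: String): String = {
    val shifted = formats.flatMap { f =>
      val fmt = DateTimeFormatter.ofPattern(f)
      Try(LocalDate.parse(raw, fmt)).toOption
        .map(d => d.plusDays(days.toLong).format(fmt))
    }.headOption

    shifted.getOrElse(mode match {
      case "mask"      => "<DATE>" // assumption: replace with the entity tag
      case "obfuscate" => fake     // assumption: replace with a supplied fake
      case "skip"      => raw      // assumption: leave the text untouched
      case other       => throw new IllegalArgumentException(s"Unknown mode: $other")
    })
  }
}
```

    A well-formed value such as "2023-01-25" with the format "yyyy-MM-dd" and a displacement of 10 days becomes "2023-02-04", while free text such as "last spring" goes through the selected fallback.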
  204. def setUseRandomDateDisplacement(s: Boolean): ObfuscatorAnnotatorModel.this.type

    Whether to use a random number of displacement days for date entities. The random number is based on BaseDeidParams.seed. If true, a random displacement is used; if false, BaseDeidParams.days is used. Default: false.

    Definition Classes
    ObfuscatorParams
  205. def setUseShiftDays(s: Boolean): ObfuscatorAnnotatorModel.this.type

    Sets the value of useShiftDays. Whether to use the random shift day when the document has this in its metadata. DocumentHashCoder can create 'dateshift' based on the document. Default: false

    Definition Classes
    BaseDeidParams
  206. def shouldUseConsistentNameParts(entityClass: String): Boolean
    Attributes
    protected
    Definition Classes
    DeidModelParams
  207. val staticObfuscationPairs: StructFeature[Array[StaticObfuscationEntity]]

    A resource containing static obfuscation pairs. Each pair should contain three elements: original, entity type, and fake.

    Definition Classes
    BaseDeidParams
  208. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  209. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  210. final def transform(dataset: Dataset[_]): DataFrame
    Definition Classes
    AnnotatorModel → Transformer
  211. def transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" )
  212. def transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" ) @varargs()
  213. final def transformSchema(schema: StructType): StructType
    Definition Classes
    RawAnnotator → PipelineStage
  214. def transformSchema(schema: StructType, logging: Boolean): StructType
    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  215. val uid: String
    Definition Classes
    ObfuscatorAnnotatorModel → Identifiable
  216. val unnormalizedDateMode: Param[String]

    The mode to use if the date is not formatted. Options: [mask, obfuscate, skip]. Default: obfuscate

    Definition Classes
    BaseDeidParams
  217. val useRandomDateDisplacement: BooleanParam

    Whether to use a random number of displacement days for date entities. The random number is based on the seed. If true, a random displacement is used; if false, ObfuscatorParams.days is used. Default: false.

    Definition Classes
    ObfuscatorParams
  218. val useShiftDays: BooleanParam

    Whether to use the random shift day when the document has this in its metadata. DocumentHashCoder can create 'dateshift' based on the document. Default: false

    Definition Classes
    BaseDeidParams
  219. def validate(schema: StructType): Boolean
    Attributes
    protected
    Definition Classes
    RawAnnotator
  220. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  221. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  222. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  223. def wrapColumnMetadata(col: Column): Column
    Attributes
    protected
    Definition Classes
    RawAnnotator
  224. def write: MLWriter
    Definition Classes
    ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable
