class DeIdentification extends AnnotatorApproach[DeIdentificationModel] with DeIdentificationParams with DeidApproachParams with HandleExceptionParams with CheckLicense

Contains all the methods for training a DeIdentificationModel. This annotator can obfuscate or mask entities that contain personal information. Entities can be defined with a file of regex patterns through setRegexPatternsDictionary, where each line maps an entity to a regex.

DATE \d{4}
AID \d{6,7}
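
To make the file format concrete, the following is a minimal sketch of how such lines could be read. It is not the library's parser: the object name RegexDictionarySketch and its methods are hypothetical, and it assumes the first run of whitespace separates the entity from the pattern.

```scala
import scala.util.matching.Regex

object RegexDictionarySketch {
  // Parses lines such as "DATE \d{4}" into (entity, compiled regex) pairs.
  // Assumption: the first run of whitespace separates entity and pattern.
  def parse(lines: Seq[String]): Seq[(String, Regex)] =
    lines.map(_.trim).filter(_.nonEmpty).flatMap { line =>
      line.split("\\s+", 2) match {
        case Array(entity, pattern) => Some(entity -> pattern.r)
        case _                      => None // skip lines without a pattern
      }
    }

  // Returns (entity, matched text) for every pattern hit in the text.
  def findAll(text: String, rules: Seq[(String, Regex)]): Seq[(String, String)] =
    rules.flatMap { case (entity, regex) =>
      regex.findAllIn(text).map(hit => entity -> hit).toSeq
    }
}
```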

Additionally, obfuscation strings can be defined with DeidApproachParams.setObfuscateRefFile, where each line maps a string to an entity. The format and separator can be specified with DeidApproachParams.setRefFileFormat and DeidApproachParams.setRefSep.

Dr. Gregory House#DOCTOR
01010101#MEDICALRECORD
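
As an illustration of the reference file layout, here is a small sketch that groups replacement strings by entity. The object ObfuscateRefFileSketch is hypothetical and is not the library's loader; it assumes the last occurrence of the separator splits the value from the entity.

```scala
object ObfuscateRefFileSketch {
  // Groups replacement strings by entity, e.g. "Dr. Gregory House#DOCTOR".
  // Assumption: the last occurrence of the separator splits value and entity.
  def parse(lines: Seq[String], sep: String = "#"): Map[String, Seq[String]] =
    lines
      .map(_.trim)
      .filter(_.contains(sep))
      .map { line =>
        val idx = line.lastIndexOf(sep)
        line.substring(idx + sep.length) -> line.substring(0, idx)
      }
      .groupBy(_._1)
      .map { case (entity, pairs) => entity -> pairs.map(_._2) }
}
```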

The configuration parameters for this module are defined in the trait DeIdentificationParams.

Exceptions thrown

java.security.NoSuchAlgorithmException If no Provider supports a SecureRandom implementation for specified algorithm name.

Note

If the mode is set to obfuscate, DeIdentification uses java.security.SecureRandom to generate fake data. You can select a generation algorithm by setting the system environment variable SPARK_NLP_JSL_SEED_ALGORITHM. The chosen algorithm may affect the generated fake data, performance, and potential blocking behavior. For the standard RNG algorithm names, refer to the SecureRandom Number Generation Algorithms section of the Java Security Standard Algorithm Names documentation. The default algorithm is 'SHA1PRNG'.
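
The note above can be sketched with the standard JDK API alone. The object SeedAlgorithmSketch and its methods are hypothetical and only illustrate the lookup and the exception named earlier; how the annotator reads the variable internally is not shown here.

```scala
import java.security.{NoSuchAlgorithmException, SecureRandom}

object SeedAlgorithmSketch {
  // Reads the algorithm name from the environment, falling back to the documented default.
  def algorithmName(env: Map[String, String] = sys.env): String =
    env.getOrElse("SPARK_NLP_JSL_SEED_ALGORITHM", "SHA1PRNG")

  // Returns Right(generator), or Left(message) when no provider supports the algorithm.
  def create(name: String): Either[String, SecureRandom] =
    try Right(SecureRandom.getInstance(name))
    catch { case e: NoSuchAlgorithmException => Left(e.getMessage) }
}
```

Checking the name this way before building a pipeline surfaces a misconfigured environment variable early.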

See also

DeIdentificationModel

DeIdentificationParams

DeidApproachParams

Ideally, this annotator works in conjunction with demographic named entity recognizers, which can be trained using TextMatchers, RegexMatchers, DateMatchers, NerCRFs or NerDLs. An example pipeline for de-identification follows.

Example

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDetector = new SentenceDetector()
    .setInputCols(Array("document"))
    .setOutputCol("sentence")
    .setUseAbbreviations(true)

val tokenizer = new Tokenizer()
    .setInputCols(Array("sentence"))
    .setOutputCol("token")

val embeddings = WordEmbeddingsModel
    .pretrained("embeddings_clinical", "en", "clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")

Ner entities

val clinical_sensitive_entities = MedicalNerModel
    .pretrained("ner_deid_enriched", "en", "clinical/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")

val nerConverter = new NerConverter()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")

Deidentification

val deIdentification = new DeIdentification()
    .setInputCols(Array("ner_chunk", "token", "sentence"))
    .setOutputCol("dei")
    // file with custom regex patterns for custom entities
    .setRegexPatternsDictionary("path/to/dic_regex_patterns_main_categories.txt")
    // file with custom obfuscator names for the entities
    .setObfuscateRefFile("path/to/obfuscate_fixed_entities.txt")
    .setRefFileFormat("csv")
    .setRefSep("#")
    .setMode("obfuscate")
    .setDateFormats(Array("MM/dd/yy","yyyy-MM-dd"))
    .setObfuscateDate(true)
    .setDateTag("DATE")
    .setDays(5)
    .setObfuscateRefSource("file")

Pipeline

val data = Seq(
  "# 7194334 Date : 01/13/93 PCP : Oliveira , 25 years-old , Record date : 2079-11-09."
).toDF("text")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentenceDetector,
  tokenizer,
  embeddings,
  clinical_sensitive_entities,
  nerConverter,
  deIdentification
))
val result = pipeline.fit(data).transform(data)


Show Results

result.select("dei.result").show(truncate = false)
+--------------------------------------------------------------------------------------------------+
|result                                                                                            |
+--------------------------------------------------------------------------------------------------+
|[# 01010101 Date : 01/18/93 PCP : Dr. Gregory House , <AGE> years-old , Record date : 2079-11-14.]|
+--------------------------------------------------------------------------------------------------+
Linear Supertypes
CheckLicense, HandleExceptionParams, DeidApproachParams, DeIdentificationParams, HasFeatures, BaseDeidParams, AnnotatorApproach[DeIdentificationModel], CanBeLazy, DefaultParamsWritable, MLWritable, HasOutputAnnotatorType, HasOutputAnnotationCol, HasInputAnnotationCols, Estimator[DeIdentificationModel], PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new DeIdentification()
  2. new DeIdentification(uid: String)

    uid

    a unique identifier for the instanced Annotator

    Exceptions thrown

    java.security.NoSuchAlgorithmException If no Provider supports a SecureRandom implementation for specified algorithm name.

Type Members

  1. type AnnotatorType = String
    Definition Classes
    HasOutputAnnotatorType

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T
    Attributes
    protected
    Definition Classes
    Params
  4. def $$[T](feature: StructFeature[T]): T
    Attributes
    protected
    Definition Classes
    HasFeatures
  5. def $$[K, V](feature: MapFeature[K, V]): Map[K, V]
    Attributes
    protected
    Definition Classes
    HasFeatures
  6. def $$[T](feature: SetFeature[T]): Set[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  7. def $$[T](feature: ArrayFeature[T]): Array[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  8. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  9. def _fit(dataset: Dataset[_], recursiveStages: Option[PipelineModel]): DeIdentificationModel
    Attributes
    protected
    Definition Classes
    AnnotatorApproach
  10. val ageGroups: StructFeature[Map[String, Array[Int]]]

    A map of age groups to obfuscate ages. For this parameter to be active, the obfuscateByAgeGroups parameter must be true. If the given ageGroups do not fully contain the ages, the ages continue to be obfuscated according to the ageRanges. The map should contain the age group name as the key and an array of two integers as the value. The first integer is the lower bound of the age group, and the second integer is the upper bound of the age group. Default age groups are as follows in the English language:

    Map(
    "baby" -> Array(0, 1),
    "toddler" -> Array(1, 4),
    "child" -> Array(4, 13),
    "teenager" -> Array(13, 20),
    "adult" -> Array(20, 65),
    "senior" -> Array(65, 200)
    )
    Definition Classes
    DeIdentificationParams
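
The lookup described for ageGroups can be sketched as follows. AgeGroupSketch is a hypothetical helper, not library code. Because the default bounds touch (for example 1 ends "baby" and starts "toddler"), the sketch assumes the lower bound is inclusive and the upper bound is exclusive; the library's actual boundary handling is not stated here.

```scala
object AgeGroupSketch {
  // Default groups as listed above.
  val defaultGroups: Map[String, Array[Int]] = Map(
    "baby" -> Array(0, 1),
    "toddler" -> Array(1, 4),
    "child" -> Array(4, 13),
    "teenager" -> Array(13, 20),
    "adult" -> Array(20, 65),
    "senior" -> Array(65, 200)
  )

  // Assumption: lower bound inclusive, upper bound exclusive.
  // None means the age falls outside every group (the ageRanges fallback case).
  def groupFor(age: Int, groups: Map[String, Array[Int]] = defaultGroups): Option[String] =
    groups.collectFirst { case (name, Array(lo, hi)) if age >= lo && age < hi => name }
}
```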
  11. val ageRanges: IntArrayParam

    List of integers specifying limits of the age groups to preserve during obfuscation

    Definition Classes
    BaseDeidParams
  12. val ageRangesByHipaa: BooleanParam

    A Boolean variable indicating whether to obfuscate ages based on the HIPAA (Health Insurance Portability and Accountability Act) Privacy Rule.

    The HIPAA Privacy Rule mandates that ages of patients older than 90 years must be obfuscated, while ages of patients 90 years or younger can remain unchanged.

    When true, age entities larger than 90 are obfuscated as per the HIPAA Privacy Rule and the others remain unchanged. When false, the ageRanges parameter is used instead.

    Definition Classes
    DeIdentificationParams
  13. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  14. def beforeTraining(spark: SparkSession): Unit
    Definition Classes
    AnnotatorApproach
  15. val blackList: StringArrayParam

    List of entities that will be ignored in the regex file. The rest will be processed. The default values are "IBAN", "ZIP", "NPI", "DLN", "PASSPORT", "C_CARD", "DEA", "SSN", "IP".

    Definition Classes
    DeIdentificationParams
  16. val blackListEntities: StringArrayParam

    List of entities coming from NER or regex rules that will be ignored for masking or obfuscation. The remaining entities will be processed. Defaults to an empty array.

    Definition Classes
    DeIdentificationParams
  17. final def checkSchema(schema: StructType, inputAnnotatorType: String): Boolean
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  18. def checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String]): Unit
    Definition Classes
    CheckLicense
  19. def checkValidScope(scope: String): Unit
    Definition Classes
    CheckLicense
  20. def checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  21. def checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  22. val chunkMatching: MapFeature[String, Double]

    Performs entity chunk matching across rows or within groups in a DataFrame. Useful in de-identification pipelines where certain entity labels like "NAME" or "DATE" may be missing in some rows and need to be filled from other rows in the same group.

    Definition Classes
    DeIdentificationParams
    Note

    When applying the method across multiple rows, the usage of groupByCol parameter is required.

  23. final def clear(param: Param[_]): DeIdentification.this.type
    Definition Classes
    Params
  24. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  25. val combineRegexPatterns: BooleanParam

    If true, the loaded regex file and the default regex file are used together; if false, either the loaded regex file or the default regex file is used. The default value is false. If the value is true, the default regex file is used regardless of the value of regexOverride.

  26. val consistentAcrossNameParts: BooleanParam

    Param that indicates whether consistency should be enforced across different parts of a name (e.g., first name, middle name, last name). When set to true, the same transformation or obfuscation will be applied consistently to all parts of the same name entity, even if those parts appear separately.

    For example, if "John Smith" is obfuscated as "Liam Brown", then:

    • When the full name "John Smith" appears, it will be replaced with "Liam Brown"
    • When "John" or "Smith" appear individually, they will still be obfuscated as "Liam" and "Brown" respectively, ensuring consistency in name transformation.

    Default: true

    Definition Classes
    BaseDeidParams
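
The "John Smith" example above can be sketched as a position-by-position mapping. NamePartsSketch is hypothetical and only illustrates the idea; it assumes the original and the fake name split into the same number of whitespace-separated parts, which the library may not require.

```scala
object NamePartsSketch {
  // Derives part-level replacements from a full-name replacement, position by position.
  // Assumption: both names have the same number of whitespace-separated parts;
  // otherwise only the full-name mapping is kept.
  def partMappings(original: String, fake: String): Map[String, String] = {
    val originalParts = original.split("\\s+")
    val fakeParts = fake.split("\\s+")
    if (originalParts.length == fakeParts.length)
      originalParts.zip(fakeParts).toMap + (original -> fake)
    else
      Map(original -> fake)
  }
}
```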
  27. val consistentObfuscation: BooleanParam

    Whether to replace very similar entities in a document with the same randomized term (default: true). The similarity is based on the Levenshtein distance between the words.

    Definition Classes
    DeIdentificationParams
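
For reference, the Levenshtein distance mentioned above can be computed as in the sketch below. LevenshteinSketch is a hypothetical helper; the normalized similarity it derives is an assumption for illustration, since how the library normalizes the distance or applies its threshold is not stated here.

```scala
object LevenshteinSketch {
  // Classic dynamic-programming edit distance using two rows.
  def distance(a: String, b: String): Int = {
    // prev(j) is the distance between the processed prefix of a and b.take(j)
    var prev = Array.range(0, b.length + 1)
    for (i <- 1 to a.length) {
      val curr = new Array[Int](b.length + 1)
      curr(0) = i
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        curr(j) = math.min(math.min(curr(j - 1) + 1, prev(j) + 1), prev(j - 1) + cost)
      }
      prev = curr
    }
    prev(b.length)
  }

  // Assumed normalization: 1.0 means identical strings, 0.0 means nothing in common.
  def similarity(a: String, b: String): Double =
    if (a.isEmpty && b.isEmpty) 1.0
    else 1.0 - distance(a, b).toDouble / math.max(a.length, b.length)
}
```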
  28. final def copy(extra: ParamMap): Estimator[DeIdentificationModel]
    Definition Classes
    AnnotatorApproach → Estimator → PipelineStage → Params
  29. def copyValues[T <: Params](to: T, extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  30. val dateFormats: StringArrayParam

    Format of dates to displace

    Definition Classes
    BaseDeidParams
  31. val dateTag: Param[String]

    Tag representing the NER entity for dates (default: DATE)

    Definition Classes
    DeIdentificationParams
  32. val dateToYear: BooleanParam

    true if dates must be converted to years, false otherwise

    Definition Classes
    DeIdentificationParams
  33. val days: IntParam

    Number of days to obfuscate the dates by displacement. If not provided, a random integer between 1 and 60 will be used.

    Definition Classes
    BaseDeidParams
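
Date displacement with a fixed number of days can be sketched with java.time. DateShiftSketch is a hypothetical helper, not the library's implementation; it assumes the formats are java.time patterns, that they are tried in order, and that the shifted date is written back in the format that matched.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

object DateShiftSketch {
  // Tries each format in order and shifts the first successful parse by `days`,
  // writing the result back in the same format; None if no format matches.
  def shift(date: String, formats: Seq[String], days: Int): Option[String] =
    formats.iterator
      .map { pattern =>
        val fmt = DateTimeFormatter.ofPattern(pattern)
        Try(LocalDate.parse(date, fmt).plusDays(days.toLong).format(fmt)).toOption
      }
      .collectFirst { case Some(shifted) => shifted }
}
```

With the formats and the displacement of 5 days from the class-level example, this reproduces the shifted dates shown in its output.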
  34. final def defaultCopy[T <: Params](extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  35. val description: String
    Definition Classes
    DeIdentification → AnnotatorApproach
  36. val doExceptionHandling: BooleanParam

    If true, exceptions are handled. If data that causes an exception is passed to the model, an error annotation containing the exception message is emitted, and processing continues with the next item. This comes with a performance penalty.

    Definition Classes
    HandleExceptionParams
  37. val entityCasingModesPath: Param[String]

    Path to the JSON dictionary that contains the entity casing modes. The supported modes are:

    • 'lowercase': Converts all characters to lower case using the rules of the default locale.
    • 'uppercase': Converts all characters to upper case using the rules of the default locale.
    • 'capitalize': Converts the first character to upper case and converts others to lower case.
    • 'titlecase': Converts the first character in every token to upper case and converts others to lower case.
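
The four casing modes can be illustrated with plain string operations. CasingModeSketch is hypothetical and does not read the JSON file; it assumes tokens are separated by single spaces and leaves the text unchanged for an unknown mode.

```scala
object CasingModeSketch {
  // Applies one of the documented casing modes to a replacement string.
  // Assumption: tokens are separated by single spaces; unknown modes return the text as-is.
  def applyCasing(text: String, mode: String): String = mode match {
    case "lowercase"  => text.toLowerCase
    case "uppercase"  => text.toUpperCase
    case "capitalize" => text.toLowerCase.capitalize
    case "titlecase"  => text.split(" ").map(_.toLowerCase.capitalize).mkString(" ")
    case _            => text
  }
}
```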

  38. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  39. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  40. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  41. def explainParams(): String
    Definition Classes
    Params
  42. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  43. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  44. val fakerLengthOffset: IntParam

    Specifies how much length deviation is accepted in obfuscation when keepTextSizeForObfuscation is enabled. The value must be greater than 0. Default: 3.

    Definition Classes
    BaseDeidParams
  45. val features: ArrayBuffer[Feature[_, _, _]]
    Definition Classes
    HasFeatures
  46. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  47. final def fit(dataset: Dataset[_]): DeIdentificationModel
    Definition Classes
    AnnotatorApproach → Estimator
  48. def fit(dataset: Dataset[_], paramMaps: Seq[ParamMap]): Seq[DeIdentificationModel]
    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  49. def fit(dataset: Dataset[_], paramMap: ParamMap): DeIdentificationModel
    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  50. def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DeIdentificationModel
    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" ) @varargs()
  51. val fixedMaskLength: IntParam

    Select the fixed mask length: this is the length of the masking sequence that will be used when the 'fixed_length_chars' masking policy is selected.

    Definition Classes
    DeIdentificationParams
  52. val genderAwareness: BooleanParam

    Whether to use gender-aware names during obfuscation. This param affects only names. Setting it to true might decrease performance. Default: false

    Definition Classes
    BaseDeidParams
  53. def get[T](feature: StructFeature[T]): Option[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  54. def get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  55. def get[T](feature: SetFeature[T]): Option[Set[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  56. def get[T](feature: ArrayFeature[T]): Option[Array[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  57. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  58. def getBlackListEntities: Array[String]

    Gets blackListEntities param

    Definition Classes
    DeIdentificationParams
  59. def getChunkMatching: Map[String, Double]
    Definition Classes
    DeIdentificationParams
  60. def getChunkMatchingAsStr: String
    Definition Classes
    DeIdentificationParams
  61. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  62. def getCombineRegexPatterns: Boolean
  63. def getConsistentAcrossNameParts: Boolean

    Gets the value of consistentAcrossNameParts.

    returns

    Boolean value indicating if consistency is enforced across name parts

    Definition Classes
    BaseDeidParams
  64. def getConsistentObfuscation: Boolean
    Definition Classes
    DeIdentificationParams
  65. def getDateFormats: Array[String]
    Definition Classes
    BaseDeidParams
  66. def getDateTag: String
    Definition Classes
    DeIdentificationParams
  67. def getDateToYear: Boolean
    Definition Classes
    DeIdentificationParams
  68. def getDays: Int
    Definition Classes
    BaseDeidParams
  69. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  70. def getFakerLengthOffset: Int

    Gets fakerLengthOffset param

    Definition Classes
    BaseDeidParams
  71. def getFixedMaskLength: Int

    Get fixedMaskLength param

    Definition Classes
    DeIdentificationParams
  72. def getGroupByCol: String

    Gets groupByCol param

    Definition Classes
    DeIdentificationParams
  73. def getIgnoreRegex: Boolean
    Definition Classes
    DeIdentificationParams
  74. def getInputCols: Array[String]
    Definition Classes
    HasInputAnnotationCols
  75. def getKeepMonth: Boolean

    Gets keepMonth param

    Definition Classes
    DeIdentificationParams
  76. def getKeepTextSizeForObfuscation: Boolean

    Gets keepTextSizeForObfuscation param

    Definition Classes
    BaseDeidParams
  77. def getKeepYear: Boolean

    Gets keepYear param

    Definition Classes
    DeIdentificationParams
  78. def getLanguage: String
    Definition Classes
    BaseDeidParams
  79. def getLazyAnnotator: Boolean
    Definition Classes
    CanBeLazy
  80. def getMappingsColumn: String
    Definition Classes
    DeIdentificationParams
  81. def getMaskingPolicy: String
    Definition Classes
    DeIdentificationParams
  82. def getMetadataMaskingPolicy: String

    Gets metadataMaskingPolicy param

    Definition Classes
    DeIdentificationParams
  83. def getMinYear: Int
    Definition Classes
    DeIdentificationParams
  84. def getMode: String
    Definition Classes
    DeIdentificationParams
  85. def getObfuscateByAgeGroups: Boolean

    Gets obfuscateByAgeGroups param

    Definition Classes
    DeIdentificationParams
  86. def getObfuscateDate: Boolean
    Definition Classes
    DeIdentificationParams
  87. def getObfuscateRefSource: String
    Definition Classes
    BaseDeidParams
  88. def getObfuscationStrategyOnException: String
    Definition Classes
    DeIdentificationParams
  89. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  90. final def getOutputCol: String
    Definition Classes
    HasOutputAnnotationCol
  91. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  92. def getRegexOverride: Boolean
    Definition Classes
    DeIdentificationParams
  93. def getRegexPatternsDictionaryAsJsonString: String
  94. def getReturnEntityMappings: Boolean
    Definition Classes
    DeIdentificationParams
  95. def getSameEntityThreshold: Double
    Definition Classes
    DeIdentificationParams
  96. def getSameLengthFormattedEntities(): Array[String]
    Definition Classes
    BaseDeidParams
  97. def getSeed(): Int
    Definition Classes
    BaseDeidParams
  98. def getUseShiftDays: Boolean

    Getter method of useShiftDays

    Definition Classes
    DeIdentificationParams
  99. def getZipCodeTag: String
    Definition Classes
    DeIdentificationParams
  100. val groupByCol: Param[String]

    The column name used to group the dataset. This parameter is used in conjunction with consistentObfuscation to ensure consistent obfuscation within each group. When groupByCol is set, the dataset is partitioned into groups based on the values of the specified column.

    Default: "" (empty string, meaning no grouping)

    • The column name must be a valid string in the input dataset.
    • The column must be of StringType.
    Definition Classes
    DeIdentificationParams
    Note

    This functionality can change the order of the dataset, so use it with caution.

    This functionality is not supported by LightPipeline.

  101. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  102. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  103. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  104. val ignoreRegex: BooleanParam

    Select whether to use the regex file loaded in the model. If true, the default regex file will not be used. The default value is false.

    Definition Classes
    DeIdentificationParams
  105. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  106. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  107. val inputAnnotatorTypes: Array[AnnotatorType]

    Input annotator type: DOCUMENT, TOKEN, CHUNK

    Definition Classes
    DeIdentification → HasInputAnnotationCols
  108. final val inputCols: StringArrayParam
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  109. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  110. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  111. val isRandomDateDisplacement: BooleanParam

    Whether to use a random number of displacement days for date entities. If true, a random displacement based on DeIdentificationParams.seed is used; if false, DeIdentificationParams.days is used. The default value is false.

    Definition Classes
    DeIdentificationParams
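
A seed-based displacement can be sketched as below. RandomDisplacementSketch is hypothetical; the range of 1 to 60 days is borrowed from the description of the days parameter and is only an assumption for this setting.

```scala
import scala.util.Random

object RandomDisplacementSketch {
  // Picks a displacement in [1, 60]; the same seed always yields the same value.
  // Assumption: the 1 to 60 range documented for `days` also applies here.
  def displacementDays(seed: Int): Int = new Random(seed).nextInt(60) + 1
}
```

Deriving the value from the seed keeps the displacement reproducible across runs, which is what makes seeded random displacement usable for consistent date shifting.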
  112. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  113. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  114. val keepMonth: BooleanParam

    Whether to keep the month intact when obfuscating date entities. If true, the month will remain unchanged during the obfuscation process. If false, the month will be modified along with the year and day. Default: false.

    Definition Classes
    DeIdentificationParams
  115. val keepTextSizeForObfuscation: BooleanParam

    Specifies whether the output should maintain the same character length as the input text. The length stays the same if a replacement of the same length is available; otherwise the length might vary.

    Definition Classes
    BaseDeidParams
  116. val keepYear: BooleanParam

    Whether to keep the year intact when obfuscating date entities. If true, the year will remain unchanged during the obfuscation process. If false, the year will be modified along with the month and day. Default: false.

    Definition Classes
    DeIdentificationParams
  117. val language: Param[String]

    The language used to select the regex file and some faker entities: 'en' (English), 'de' (German), 'es' (Spanish), 'fr' (French), 'ar' (Arabic) or 'ro' (Romanian). Default: 'en'

    Definition Classes
    BaseDeidParams
  118. val lazyAnnotator: BooleanParam
    Definition Classes
    CanBeLazy
  119. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  120. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  121. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  122. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  123. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  124. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  125. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  126. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  127. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  128. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  129. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  130. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  131. val mappingsColumn: Param[String]

    The mapping column that returns the annotation chunks together with the fake entities.

    Definition Classes
    DeIdentificationParams
  132. val maskingPolicy: Param[String]

    Select the masking policy:

    • 'entity_labels': Replace the values with their entity label.
    • 'same_length_chars': Replace the entity with asterisks of the same length minus two, plus brackets on both ends. If the entity is shorter than 3 characters (like "Jo" or "5"), asterisks are used without brackets.
    • 'fixed_length_chars': Replace the entity with a masking sequence composed of a fixed number of asterisks.
    • Default: 'entity_labels'
    Definition Classes
    DeIdentificationParams
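
The three masking policies can be sketched as a single function. MaskingPolicySketch is hypothetical and not the library's code; the default of 4 for fixedMaskLength is an illustrative value, and the `<LABEL>` form for 'entity_labels' follows the mask-mode example given for the mode parameter.

```scala
object MaskingPolicySketch {
  // Returns the masked form of one chunk under the given policy.
  // The fixedMaskLength default of 4 is illustrative, not a documented default.
  def mask(chunk: String, label: String, policy: String, fixedMaskLength: Int = 4): String =
    policy match {
      case "entity_labels" => s"<$label>"
      case "same_length_chars" =>
        if (chunk.length < 3) "*" * chunk.length             // too short for brackets
        else "[" + "*" * (chunk.length - 2) + "]"            // brackets count toward the length
      case "fixed_length_chars" => "*" * fixedMaskLength
      case other => throw new IllegalArgumentException(s"Unknown masking policy: $other")
    }
}
```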
  133. val metadataMaskingPolicy: Param[String]

    If specified, the metadata includes the masked form of the document. Select one of the following masking policies to return the masked form in the metadata:

    • 'entity_labels': Replace the values with their entity label.
    • 'same_length_chars': Replace the entity with asterisks of the same length minus two, plus brackets on both ends. If the entity is shorter than 3 characters (like "Jo" or "5"), asterisks are used without brackets.
    • 'fixed_length_chars': Replace the entity with a masking sequence composed of a fixed number of asterisks.
    • Default: ""
    Definition Classes
    DeIdentificationParams
  134. val minYear: IntParam

    Minimum year to use when converting date to year

    Definition Classes
    DeIdentificationParams
  135. val mode: Param[String]

    Mode for Anonymizer ['mask'|'obfuscate']. Default: 'mask'

    • Mask mode: The entities will be replaced by their entity types.
    • Obfuscate mode: The entity is replaced by an obfuscator's term.
    Definition Classes
    DeIdentificationParams
    Example:
    1. Given the following text: "David Hale visited EEUU a couple of years ago"

      • Mask mode: "<PERSON> visited <COUNTRY> a couple of years ago"
      • Obfuscate mode: "Bryan Johnson visited Japan a couple of years ago"
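
The difference between the two modes can be sketched on the example sentence above. ModeSketch and its Chunk type are hypothetical; the sketch assumes chunk offsets with an exclusive end and a fixed label-to-replacement map standing in for the obfuscator.

```scala
object ModeSketch {
  // Hypothetical chunk type: [begin, end) offsets plus the entity label.
  final case class Chunk(begin: Int, end: Int, label: String)

  // Replaces every chunk, from right to left so earlier offsets stay valid.
  def deidentify(text: String, chunks: Seq[Chunk], mode: String, fakes: Map[String, String]): String =
    chunks.sortBy(c => -c.begin).foldLeft(text) { (acc, c) =>
      val replacement = mode match {
        case "obfuscate" => fakes.getOrElse(c.label, s"<${c.label}>")
        case _           => s"<${c.label}>" // mask mode
      }
      acc.substring(0, c.begin) + replacement + acc.substring(c.end)
    }
}
```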
  136. def msgHelper(schema: StructType): String
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  137. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  138. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  139. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  140. val obfuscateByAgeGroups: BooleanParam

    Whether to obfuscate ages based on age groups.

    When true, the age groups specified in the ageGroups parameter will be used to obfuscate ages. When false, the age ranges specified in the ageRanges parameter will be used to obfuscate ages. Default: false.

    Definition Classes
    DeIdentificationParams
  141. val obfuscateDate: BooleanParam

    Whether to obfuscate dates when mode == "obfuscate". This param helps in consistency to make dateFormats more visible. When setting it to true, make sure the dateFormats param fits your needs. If the value is true and obfuscation fails, DeIdentificationParams.unnormalizedDateMode is activated. When set to false, the date is masked as <DATE>. Default: false

    Definition Classes
    DeIdentificationParams
  142. val obfuscateRefFile: Param[String]

    File with the terms to be used for Obfuscation

    Definition Classes
    DeidApproachParams
  143. val obfuscateRefSource: Param[String]

    The source of the obfuscation terms used to obfuscate the entities. The values are the following:

    • 'file': Takes the entities from the obfuscateRefFile.
    • 'faker': Takes the entities from the Faker module.
    • 'both': Takes the entities from the obfuscateRefFile and the Faker module randomly.

    Definition Classes
    BaseDeidParams
  144. val obfuscationStrategyOnException: Param[String]

    The obfuscation strategy to be applied when an exception occurs.

    The obfuscation strategy determines how obfuscation is handled in case of an exception. Four possible values are supported:

    • "mask": The original chunk is replaced with a masking pattern.
    • "default": The original chunk is replaced with a default faker.
    • "skip": The original chunk is not replaced with any faker.
    • "exception": Throws the exception.

    The default obfuscation strategy is "default".

    Definition Classes
    DeIdentificationParams
  145. def onTrained(model: DeIdentificationModel, spark: SparkSession): Unit
    Definition Classes
    AnnotatorApproach
  146. val optionalInputAnnotatorTypes: Array[String]
    Definition Classes
    HasInputAnnotationCols
  147. val outputAnnotatorType: AnnotatorType

    Output annotator types: DOCUMENT

    Definition Classes
    DeIdentification → HasOutputAnnotatorType
  148. val outputAsDocument: BooleanParam

    Whether to return all sentences joined into a single document

    Definition Classes
    DeIdentificationParams
  149. final val outputCol: Param[String]
    Attributes
    protected
    Definition Classes
    HasOutputAnnotationCol
  150. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  151. val refFileFormat: Param[String]

    Format of the reference file used for obfuscation. Default: "csv".

    Definition Classes
    DeidApproachParams
  152. val refSep: Param[String]

    Separator character for the CSV reference file used for obfuscation. Default: "#".

    Definition Classes
    DeidApproachParams
  153. val regexOverride: BooleanParam

    If true, regex entities take priority; if false, NER entities take priority. Default: false. If DeIdentification.combineRegexPatterns is true, this value is ignored.

    Definition Classes
    DeIdentificationParams
  154. val regexPatternsDictionary: ExternalResourceParam

    Dictionary of regular expression patterns that match protected entities. If no dictionary is set, the default regex file is used.

  155. val regexPatternsDictionaryAsJsonString: Param[String]

    Dictionary of regular expression patterns, given as JSON, that match protected entities. If no dictionary is set, the default regex file is used.

  156. val region: Param[String]

    Selects the region-specific dateFormats. This matters mainly when obfuscating dates: it decides whether the first or the second part of a date such as 11/11/2023 is read as the day. Supported values: 'eu' for European Union, 'us' for USA. Default: 'eu'.

    Definition Classes
    DeIdentificationParams
  157. val returnEntityMappings: BooleanParam

    Whether to return the entity mappings column.

    Definition Classes
    DeIdentificationParams
  158. val sameEntityThreshold: DoubleParam

    Similarity threshold [0.0-1.0] to consider two appearances of an entity as the same (default: 0.9). This does not apply to date entities.

    Definition Classes
    DeIdentificationParams
  159. val sameLengthFormattedEntities: StringArrayParam

    List of formatted entities to generate the same length outputs as original ones during obfuscation. The supported and default formatted entities are: "phone", "fax", "contact", "id", "idnum", "bioid", "medicalrecord", "zip", "vin", "ssn", "dln", "plate", "license", "IRS", "CFN", "account".

    Definition Classes
    BaseDeidParams
  160. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  161. val seed: IntParam

    The seed used to select the replacement entities in obfuscate mode. With a fixed seed, you can replay an execution several times and get the same output.

    Definition Classes
    BaseDeidParams
  162. val selectiveObfuscationModesPath: Param[String]

    Path to the JSON dictionary file that contains the selective obfuscation modes.

  163. def set[T](feature: StructFeature[T], value: T): DeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  164. def set[K, V](feature: MapFeature[K, V], value: Map[K, V]): DeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  165. def set[T](feature: SetFeature[T], value: Set[T]): DeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  166. def set[T](feature: ArrayFeature[T], value: Array[T]): DeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  167. final def set(paramPair: ParamPair[_]): DeIdentification.this.type
    Attributes
    protected
    Definition Classes
    Params
  168. final def set(param: String, value: Any): DeIdentification.this.type
    Attributes
    protected
    Definition Classes
    Params
  169. final def set[T](param: Param[T], value: T): DeIdentification.this.type
    Definition Classes
    Params
  170. def setAgeGroups(value: Map[String, Array[Int]]): DeIdentification.this.type

    Sets the age groups to obfuscate ages. For this parameter to be active, the obfuscateByAgeGroups parameter must be true. If the given ageGroups do not fully contain the ages, the ages continue to be obfuscated according to the ageRanges. The map should contain the age group name as the key and an array of two integers as the value. The first integer is the lower bound of the age group, and the second integer is the upper bound of the age group. Default age groups are as follows in the English language:

    Map(
    "baby" -> Array(0, 1),
    "toddler" -> Array(1, 3),
    "child" -> Array(3, 12),
    "teenager" -> Array(12, 20),
    "adult" -> Array(20, 65),
    "senior" -> Array(65, 200)
    )
    Definition Classes
    DeIdentificationParams
    Exceptions thrown

    IllegalArgumentException if the value is empty, contains negative values, or is not a pair of integers
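
    The lookup described above can be modeled in plain Scala. This is a minimal sketch, not the library's implementation: the object and method names are hypothetical, and it assumes the lower bound is inclusive and the upper bound is exclusive so that the default groups do not overlap. A result of None stands for the documented fallback to ageRanges.

```scala
object AgeGroupSketch {
  // Default age groups as listed in the documentation above.
  val defaultAgeGroups: Map[String, Array[Int]] = Map(
    "baby" -> Array(0, 1),
    "toddler" -> Array(1, 3),
    "child" -> Array(3, 12),
    "teenager" -> Array(12, 20),
    "adult" -> Array(20, 65),
    "senior" -> Array(65, 200)
  )

  // Returns the name of the group containing `age`, or None when no group
  // covers it. Assumption: bounds are treated as [lower, upper).
  def groupFor(age: Int, groups: Map[String, Array[Int]] = defaultAgeGroups): Option[String] =
    groups.collectFirst { case (name, Array(lo, hi)) if age >= lo && age < hi => name }
}
```

    With the default groups, an age of 2 falls in "toddler", while an age outside every group yields None.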

  171. def setAgeGroups(value: HashMap[String, ArrayList[Int]]): DeIdentification.this.type
    Definition Classes
    DeIdentificationParams
  172. def setAgeRanges(mode: Array[Int]): DeIdentification.this.type

    List of integers specifying limits of the age groups to preserve during obfuscation

    Definition Classes
    BaseDeidParams
  173. def setAgeRangesByHipaa(value: Boolean): DeIdentification.this.type

    Sets whether to obfuscate ages based on HIPAA (Health Insurance Portability and Accountability Act) Privacy Rule.

    The HIPAA Privacy Rule mandates that ages from patients older than 90 years must be obfuscated, while age for patients 90 years or younger can remain unchanged.

    value

    If true, age entities greater than 90 are obfuscated as required by the HIPAA Privacy Rule and the others remain unchanged. If false, the ageRanges parameter is used instead. Default: false.

    Definition Classes
    DeIdentificationParams
  174. def setBlackList(list: Array[String]): DeIdentification.this.type

    List of entities in the regex file that will be ignored; the rest will be processed. The default values are "IBAN", "ZIP", "NPI", "DLN", "PASSPORT", "C_CARD", "DEA", "SSN", "IP".

    Definition Classes
    DeIdentificationParams
  175. def setBlackListEntities(value: Array[String]): DeIdentification.this.type

    Sets the list of entities coming from NER or regex rules that will be ignored for masking or obfuscation. The remaining entities will be processed. Defaults to an empty array.

    Definition Classes
    DeIdentificationParams
  176. def setChunkMatching(categories: HashMap[String, Double]): DeIdentification.this.type
    Definition Classes
    DeIdentificationParams
  177. def setChunkMatching(value: Map[String, Double]): DeIdentification.this.type

    Performs entity chunk matching across rows or within groups in a DataFrame. Useful in de-identification pipelines where certain entity labels like "NAME" or "DATE" may be missing in some rows and need to be filled from other rows in the same group.

    Notes:

    • When applying the method across multiple rows, the usage of groupByCol parameter is required.
    Definition Classes
    DeIdentificationParams
  178. def setCombineRegexPatterns(s: Boolean): DeIdentification.this.type

    If true, the loaded regex file and the default regex file are used together; if false, only one of them is used. Default: false. When this value is true, the default regex file is used regardless of the value of regexOverride.

  179. def setConsistentAcrossNameParts(value: Boolean): DeIdentification.this.type

    Sets the value of consistentAcrossNameParts.

    value

    Boolean flag to enforce consistency across name parts

    returns

    this instance

    Definition Classes
    BaseDeidParams
  180. def setConsistentObfuscation(s: Boolean): DeIdentification.this.type

    Whether to replace very similar entities in a document with the same randomized term (default: true). The similarity is based on the Levenshtein distance between the words.

    Definition Classes
    DeIdentificationParams
  181. def setDateFormats(s: Array[String]): DeIdentification.this.type

    Format of dates to displace

    Definition Classes
    BaseDeidParams
  182. def setDateTag(s: String): DeIdentification.this.type

    Tag that identifies the date NER entity (default: DATE).

    Definition Classes
    DeIdentificationParams
  183. def setDateToYear(s: Boolean): DeIdentification.this.type

    true if dates must be converted to years, false otherwise

    Definition Classes
    DeIdentificationParams
  184. def setDays(k: Int): DeIdentification.this.type

    Number of days by which dates are displaced during obfuscation. If not provided, a random integer between 1 and 60 is used.

    Definition Classes
    BaseDeidParams
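
    Date displacement can be illustrated with java.time. The following is a minimal sketch under stated assumptions, not the library's code: DateShiftSketch and shift are hypothetical names, and scala.util.Random is used here only so that a fixed seed gives a reproducible result.

```scala
import java.time.LocalDate
import scala.util.Random

object DateShiftSketch {
  // Shifts `date` forward by `days` when provided; otherwise by a
  // pseudo-random number of days in the range [1, 60] derived from `seed`.
  def shift(date: LocalDate, days: Option[Int], seed: Long): LocalDate = {
    val k = days.getOrElse(new Random(seed).nextInt(60) + 1)
    date.plusDays(k.toLong)
  }
}
```

    Calling shift with Some(10) always moves the date by ten days; with None, the same seed always yields the same displacement.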
  185. def setDefault[T](feature: StructFeature[T], value: () ⇒ T): DeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  186. def setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): DeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  187. def setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): DeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  188. def setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): DeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  189. final def setDefault(paramPairs: ParamPair[_]*): DeIdentification.this.type
    Attributes
    protected
    Definition Classes
    Params
  190. final def setDefault[T](param: Param[T], value: T): DeIdentification.this.type
    Attributes
    protected[org.apache.spark.ml]
    Definition Classes
    Params
  191. def setDoExceptionHandling(value: Boolean): DeIdentification.this.type

    If true, exceptions are handled. If data that causes an exception is passed to the model, an error annotation containing the exception message is emitted, and processing continues with the next item. This comes with a performance penalty.

    Definition Classes
    HandleExceptionParams
  192. def setEntityCasingModesPath(path: String): DeIdentification.this.type

    Path to the JSON dictionary file that contains the entity casing modes. Supported modes:

    • 'lowercase': Converts all characters to lower case using the rules of the default locale.
    • 'uppercase': Converts all characters to upper case using the rules of the default locale.
    • 'capitalize': Converts the first character to upper case and converts others to lower case.
    • 'titlecase': Converts the first character in every token to upper case and converts others to lower case.
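
    The four casing modes can be modeled in plain Scala. This is a sketch of the described behavior only: EntityCasingSketch and applyCasing are hypothetical names, and tokens are assumed to be separated by single spaces.

```scala
object EntityCasingSketch {
  // Upper-cases the first character and lower-cases the rest.
  private def capitalizeWord(s: String): String =
    if (s.isEmpty) s else s.head.toUpper.toString + s.tail.toLowerCase

  // Applies one of the documented casing modes to `text`.
  def applyCasing(text: String, mode: String): String = mode match {
    case "lowercase"  => text.toLowerCase
    case "uppercase"  => text.toUpperCase
    case "capitalize" => capitalizeWord(text)
    case "titlecase"  => text.split(" ").map(capitalizeWord).mkString(" ")
    case other        => throw new IllegalArgumentException(s"Unknown casing mode: $other")
  }
}
```

    For the input "john SMITH", 'capitalize' gives "John smith" while 'titlecase' gives "John Smith".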

  193. def setFakerLengthOffset(value: Int): DeIdentification.this.type

    Sets fakerLengthOffset param

    Definition Classes
    BaseDeidParams
  194. def setFixedMaskLength(value: Int): DeIdentification.this.type

    fixed mask length: this is the length of the masking sequence that will be used when the 'fixed_length_chars' masking policy is selected.

    Definition Classes
    DeIdentificationParams
  195. def setGenderAwareness(value: Boolean): DeIdentification.this.type

    Whether to use gender-aware names during obfuscation. This param affects only names. Setting it to true might decrease performance. Default: false.

    Definition Classes
    BaseDeidParams
  196. def setGroupByCol(value: String): DeIdentification.this.type

    Sets groupByCol param to group the dataset. This parameter is used in conjunction with consistentObfuscation to ensure consistent obfuscation within each group.

    Definition Classes
    DeIdentificationParams
    Note

    This functionality can change order of the dataset, so it is recommended to use it with caution.

    This functionality cannot be supported by LightPipeline.

  197. def setIgnoreRegex(s: Boolean): DeIdentification.this.type

    Selects whether to use the regex file loaded in the model. If true, the default regex file is not used. Default: false.

    Definition Classes
    DeIdentificationParams
  198. final def setInputCols(value: String*): DeIdentification.this.type
    Definition Classes
    HasInputAnnotationCols
  199. def setInputCols(value: Array[String]): DeIdentification.this.type
    Definition Classes
    HasInputAnnotationCols
  200. def setIsRandomDateDisplacement(s: Boolean): DeIdentification.this.type

    Whether to displace date entities by a random number of days. If true, the displacement is a random number of days derived from DeIdentificationParams.seed; if false, DeIdentificationParams.days is used. Default: false.

    Definition Classes
    DeIdentificationParams
  201. def setKeepMonth(value: Boolean): DeIdentification.this.type

    Sets whether to keep the month intact when obfuscating date entities. If true, the month will remain unchanged during the obfuscation process. If false, the month will be modified along with the year and day. Default: false.

    Definition Classes
    DeIdentificationParams
  202. def setKeepTextSizeForObfuscation(value: Boolean): DeIdentification.this.type

    Sets keepTextSizeForObfuscation param

    Definition Classes
    BaseDeidParams
  203. def setKeepYear(value: Boolean): DeIdentification.this.type

    Sets whether to keep the year intact when obfuscating date entities. If true, the year will remain unchanged during the obfuscation process. If false, the year will be modified along with the month and day. Default: false.

    Definition Classes
    DeIdentificationParams
  204. def setLanguage(s: String): DeIdentification.this.type

    The language used to select the regex file and some faker entities. Supported values: 'en' (English), 'de' (German), 'es' (Spanish), 'fr' (French), 'ar' (Arabic), or 'ro' (Romanian). Default: 'en'.

    Definition Classes
    BaseDeidParams
  205. def setLazyAnnotator(value: Boolean): DeIdentification.this.type
    Definition Classes
    CanBeLazy
  206. def setMappingsColumn(s: String): DeIdentification.this.type

    This is the mapping column that will return the Annotations chunks with the fake entities

    Definition Classes
    DeIdentificationParams
  207. def setMaskingPolicy(value: String): DeIdentification.this.type

    Select the masking policy:

    • 'entity_labels': Replaces the values with the entity label.
    • 'same_length_chars': Replaces the entity with asterisks enclosed in brackets so that the total length matches the original (length minus two asterisks, plus one bracket on each end). If the entity is shorter than 3 characters (like Jo, or 5), only asterisks are used, without brackets.
    • 'fixed_length_chars': Replaces the entity with a masking sequence composed of a fixed number of asterisks.
    • Default: 'entity_labels'
    Definition Classes
    DeIdentificationParams
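
    The three masking policies can be modeled in plain Scala. This is a minimal sketch, not the library's implementation: MaskingPolicySketch and mask are hypothetical names, square brackets are an assumption, and the <LABEL> form follows the mask-mode example given for setMode.

```scala
object MaskingPolicySketch {
  // Returns the masked replacement for `chunk` under the given policy.
  // `fixedMaskLength` is only used by the 'fixed_length_chars' policy.
  def mask(chunk: String, label: String, policy: String, fixedMaskLength: Int = 4): String =
    policy match {
      case "entity_labels" => s"<$label>"
      case "same_length_chars" =>
        // Short chunks get asterisks only; longer ones keep their length
        // by spending two characters on the surrounding brackets.
        if (chunk.length < 3) "*" * chunk.length
        else "[" + "*" * (chunk.length - 2) + "]"
      case "fixed_length_chars" => "*" * fixedMaskLength
      case other => throw new IllegalArgumentException(s"Unknown masking policy: $other")
    }
}
```

    For the chunk "David Hale" with label PERSON, the sketch returns "<PERSON>", "[********]", and "****" for the three policies respectively.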
  208. def setMetadataMaskingPolicy(value: String): DeIdentification.this.type

    If specified, the metadata includes the masked form of the document. Select one of the following masking policies to return the masked form in the metadata:

    • 'entity_labels': Replaces the values with the entity label.
    • 'same_length_chars': Replaces the entity with asterisks enclosed in brackets so that the total length matches the original (length minus two asterisks, plus one bracket on each end). If the entity is shorter than 3 characters (like Jo, or 5), only asterisks are used, without brackets.
    • 'fixed_length_chars': Replaces the entity with a masking sequence composed of a fixed number of asterisks.
    • Default: ""
    Definition Classes
    DeIdentificationParams
  209. def setMinYear(s: Int): DeIdentification.this.type

    Minimum year to use when converting date to year

    Definition Classes
    DeIdentificationParams
  210. def setMode(m: String): DeIdentification.this.type

    Mode for Anonymizer ['mask'|'obfuscate']. Default: 'mask'

    • Mask mode: The entities will be replaced by their entity types.
    • Obfuscate mode: The entity is replaced by an obfuscator's term.
    Definition Classes
    DeIdentificationParams
    Example:
    1. Given the following text: "David Hale visited EEUU a couple of years ago"

      • Mask mode: "<PERSON> visited <COUNTRY> a couple of years ago"
      • Obfuscate mode: "Bryan Johnson visited Japan a couple of years ago"
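
    A typical configuration chains several of the setters listed on this page. The fragment below is a sketch only: the import path and the input column names ("sentence", "token", "ner_chunk") are assumptions about a surrounding pipeline, and the values are illustrative rather than recommended.

```scala
// Assumed package path for the licensed library; adjust to your installation.
import com.johnsnowlabs.nlp.annotators.deid.DeIdentification

val deidentification = new DeIdentification()
  .setInputCols(Array("sentence", "token", "ner_chunk")) // hypothetical column names
  .setOutputCol("deidentified")
  .setMode("obfuscate")                        // 'mask' is the default
  .setObfuscateRefSource("faker")              // 'file', 'faker' or 'both'
  .setObfuscateDate(true)                      // displace dates instead of masking them
  .setRegion("us")                             // read 11/12/2023 as month/day/year
  .setDays(10)                                 // fixed displacement in days
  .setSeed(42)                                 // reproducible choice of fake entities
  .setObfuscationStrategyOnException("mask")   // fall back to masking on errors
```

    Every setter returns the annotator itself, so the calls can be chained in any order before the stage is added to a pipeline.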
  211. def setObfuscateByAgeGroups(value: Boolean): DeIdentification.this.type

    Sets whether to obfuscate ages based on age groups.

    When true, the age groups specified in the ageGroups parameter will be used to obfuscate ages. When false, the age ranges specified in the ageRanges parameter will be used to obfuscate ages. Default: false.

    Definition Classes
    DeIdentificationParams
  212. def setObfuscateDate(s: Boolean): DeIdentification.this.type

    When mode == "obfuscate", whether to obfuscate dates. This param helps keep date handling consistent by making the use of dateFormats explicit. When set to true, make sure the dateFormats param fits your data; if obfuscation of a date fails, DeIdentificationParams.unnormalizedDateMode is applied. When set to false, dates are masked as <DATE>. Default: false.

    Definition Classes
    DeIdentificationParams
  213. def setObfuscateRefFile(f: String): DeIdentification.this.type

    File with the terms to be used for Obfuscation

    Definition Classes
    DeidApproachParams
  214. def setObfuscateRefSource(s: String): DeIdentification.this.type

    The source of the replacement terms used to obfuscate the entities. Supported values:

    • 'file': Takes the entities from the obfuscateRefFile.
    • 'faker': Takes the entities from the Faker module.
    • 'both': Takes the entities randomly from either the obfuscateRefFile or the Faker module.

    Definition Classes
    BaseDeidParams
  215. def setObfuscationStrategyOnException(value: String): DeIdentification.this.type

    Sets the obfuscation strategy to be applied when an exception occurs.

    The obfuscation strategy determines how obfuscation is handled in case of an exception. Four possible values are supported:

    • "mask": The original chunk is replaced with a masking pattern.
    • "default": The original chunk is replaced with a default faker.
    • "skip": The original chunk is not replaced with any faker.
    • "exception": Throws the exception.

    The default obfuscation strategy is "default".

    Definition Classes
    DeIdentificationParams
  216. def setOutputAsDocument(mode: Boolean): DeIdentification.this.type

    Whether to return all sentences joined into a single document

    Definition Classes
    DeIdentificationParams
  217. final def setOutputCol(value: String): DeIdentification.this.type
    Definition Classes
    HasOutputAnnotationCol
  218. def setRefFileFormat(f: String): DeIdentification.this.type

    Format of the reference file used for obfuscation. Default: "csv".

    Definition Classes
    DeidApproachParams
  219. def setRefSep(f: String): DeIdentification.this.type

    Separator character for the CSV reference file used for obfuscation. Default: "#".

    Definition Classes
    DeidApproachParams
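
    A reference file line such as "Dr. Gregory House#DOCTOR" maps a replacement term to an entity, with the separator given by refSep. The sketch below shows one way such lines could be read into a map from entity to terms. It is an illustration, not the library's parser: RefFileSketch and parse are hypothetical names, and splitting on the last separator is an assumption.

```scala
object RefFileSketch {
  // Parses "term<sep>ENTITY" lines into a map of entity -> replacement terms.
  // Blank lines and lines without a usable separator are skipped.
  def parse(lines: Seq[String], sep: String = "#"): Map[String, Seq[String]] =
    lines
      .map(_.trim)
      .filter(_.nonEmpty)
      .flatMap { line =>
        val idx = line.lastIndexOf(sep)
        if (idx <= 0) None
        else Some(line.substring(idx + sep.length).trim -> line.substring(0, idx).trim)
      }
      .groupBy(_._1)
      .map { case (entity, pairs) => entity -> pairs.map(_._2) }
}
```

    Parsing the two sample lines from the class description yields one term under DOCTOR and one under MEDICALRECORD.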
  220. def setRegexOverride(s: Boolean): DeIdentification.this.type

    If true, regex entities take priority; if false, NER entities take priority. Default: false. If DeIdentification.combineRegexPatterns is true, this value is ignored.

    Definition Classes
    DeIdentificationParams
  221. def setRegexPatternsDictionary(path: String, readAs: Format = ReadAs.TEXT, options: Map[String, String] = Map()): DeIdentification.this.type

    Dictionary of regular expression patterns that match protected entities. When it is not set, a default regex file is used.

    path

    The path where the file is located.

    readAs

    Format of the reader.

    options

    options to apply to the reader.

  222. def setRegexPatternsDictionary(path: ExternalResource): DeIdentification.this.type

    Dictionary of regular expression patterns that match protected entities. When it is not set, a default regex file is used.

    path

    The external resource where the file is located.

    See also

    ExternalResource

  223. def setRegexPatternsDictionaryAsJsonString(json: String): DeIdentification.this.type

    Dictionary of regular expression patterns, given as JSON, that match protected entities. When it is not set, a default regex file is used.

    json

    The regex pattern(s) in JSON format.

  224. def setRegion(s: String): DeIdentification.this.type

    Selects the region-specific dateFormats. This matters mainly when obfuscating dates: it decides whether the first or the second part of a date such as 11/11/2023 is read as the day. Supported values: 'eu' for European Union, 'us' for USA. Default: 'eu'.

    Definition Classes
    DeIdentificationParams
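
    The day/month ambiguity that region resolves can be shown with java.time. This is a sketch under stated assumptions, not the library's date handling: RegionDateSketch and parse are hypothetical names, and a single slash-separated pattern per region is used for illustration.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object RegionDateSketch {
  // Parses a slash-separated date, reading the first part as the day for
  // 'eu' and as the month for 'us'.
  def parse(text: String, region: String): LocalDate = {
    val pattern = region match {
      case "eu"  => "dd/MM/yyyy"
      case "us"  => "MM/dd/yyyy"
      case other => throw new IllegalArgumentException(s"Unknown region: $other")
    }
    LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern))
  }
}
```

    The same string "03/04/2023" is read as 3 April 2023 under 'eu' and as 4 March 2023 under 'us'.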
  225. def setReturnEntityMappings(s: Boolean): DeIdentification.this.type

    Whether to return the entity mappings column.

    Definition Classes
    DeIdentificationParams
  226. def setSameEntityThreshold(s: Double): DeIdentification.this.type

    Similarity threshold [0.0-1.0] to consider two appearances of an entity as the same (default: 0.9). This does not apply to date entities.

    Definition Classes
    DeIdentificationParams
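
    The threshold can be read as a bound on a normalized string similarity. The sketch below uses the Levenshtein distance named under setConsistentObfuscation. How the library normalizes the distance is not stated here, so dividing by the longer string's length and comparing with >= are assumptions, and SameEntitySketch is a hypothetical name.

```scala
object SameEntitySketch {
  // Classic dynamic-programming Levenshtein distance using two rows.
  def levenshtein(a: String, b: String): Int = {
    var prev = (0 to b.length).toArray
    for (i <- 1 to a.length) {
      val curr = new Array[Int](b.length + 1)
      curr(0) = i
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        curr(j) = math.min(math.min(curr(j - 1) + 1, prev(j) + 1), prev(j - 1) + cost)
      }
      prev = curr
    }
    prev(b.length)
  }

  // Similarity in [0.0, 1.0]; 1.0 means the strings are identical.
  def similarity(a: String, b: String): Double = {
    val maxLen = math.max(a.length, b.length)
    if (maxLen == 0) 1.0 else 1.0 - levenshtein(a, b).toDouble / maxLen
  }

  // True when the two mentions would be treated as the same entity.
  def sameEntity(a: String, b: String, threshold: Double = 0.9): Boolean =
    similarity(a, b) >= threshold
}
```

    Under these assumptions, "Jonathan Smith" and "Jonathan Smyth" differ by one edit in fourteen characters and pass the default threshold, while "John" and "Jane" do not.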
  227. def setSameLengthFormattedEntities(entities: Array[String]): DeIdentification.this.type

    List of formatted entities to generate the same length outputs as original ones during obfuscation. The supported and default formatted entities are: PHONE, FAX, CONTACT, ID, IDNUM, BIOID, MEDICALRECORD, ZIP, VIN, SSN, DLN, LICENSE, PLATE, IRS, CFN, ACCOUNT.

    Definition Classes
    BaseDeidParams
  228. def setSeed(s: Int): DeIdentification.this.type

    The seed used to select the replacement entities in obfuscate mode. With a fixed seed, you can replay an execution several times and get the same output.

    Definition Classes
    BaseDeidParams
  229. def setSelectiveObfuscationModesPath(path: String): DeIdentification.this.type

    Path to the JSON dictionary file that contains the selective obfuscation modes.

  230. def setUnnormalizedDateMode(mode: String): DeIdentification.this.type

    The mode to use if the date is not in a recognized format. Supported values: 'mask', 'obfuscate', 'skip'. Default: 'obfuscate'.

    Definition Classes
    DeIdentificationParams
  231. def setUseShiftDays(s: Boolean): DeIdentification.this.type
    Definition Classes
    DeIdentificationParams
  232. def setZipCodeTag(s: String): DeIdentification.this.type
    Definition Classes
    DeIdentificationParams
  233. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  234. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  235. def train(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): DeIdentificationModel

    Returns the DeIdentificationModel Transformer, that can be used to transform input datasets

    The dataset provided to the fit method should have one chunk per row and contain the following columns: Document, Tokens, Chunks

    This method is called inside the AnnotatorApproach's fit method

    dataset

    a Dataset containing ChunkTokens, ChunkEmbeddings, ClassifierLabel, ResolverLabel, [ResolverNormalized]

    recursivePipeline

    an instance of PipelineModel

    returns

    a trained DeIdentificationModel

    Definition Classes
    DeIdentification → AnnotatorApproach
  236. def transformRegexPatternsDictionary(regexPatternsDictionary: Array[(String, String)]): Map[String, Array[String]]
  237. final def transformSchema(schema: StructType): StructType
    Definition Classes
    AnnotatorApproach → PipelineStage
  238. def transformSchema(schema: StructType, logging: Boolean): StructType
    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  239. val uid: String
    Definition Classes
    DeIdentification → Identifiable
  240. val unnormalizedDateMode: Param[String]

    The mode to use if the date is not in a recognized format. Supported values: 'mask', 'obfuscate', 'skip'. Default: 'obfuscate'.

    Definition Classes
    DeIdentificationParams
  241. val useShifDays: BooleanParam

    Whether to use the random shift day when the document provides one in its metadata. Default: false.

    Definition Classes
    DeIdentificationParams
  242. def validate(schema: StructType): Boolean
    Attributes
    protected
    Definition Classes
    AnnotatorApproach
  243. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  244. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  245. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  246. def write: MLWriter
    Definition Classes
    DefaultParamsWritable → MLWritable
  247. val zipCodeTag: Param[String]
    Definition Classes
    DeIdentificationParams

Deprecated Value Members

  1. def setUseShiftDayse(s: Boolean): DeIdentification.this.type
    Definition Classes
    DeIdentificationParams
    Annotations
    @deprecated
    Deprecated

    deprecated because of typo

Inherited from CheckLicense

Inherited from HandleExceptionParams

Inherited from DeidApproachParams

Inherited from DeIdentificationParams

Inherited from HasFeatures

Inherited from BaseDeidParams

Inherited from AnnotatorApproach[DeIdentificationModel]

Inherited from CanBeLazy

Inherited from DefaultParamsWritable

Inherited from MLWritable

Inherited from HasOutputAnnotatorType

Inherited from HasOutputAnnotationCol

Inherited from HasInputAnnotationCols

Inherited from Estimator[DeIdentificationModel]

Inherited from PipelineStage

Inherited from Logging

Inherited from Params

Inherited from Serializable

Inherited from Serializable

Inherited from Identifiable

Inherited from AnyRef

Inherited from Any

Parameters

Annotator types

Required input and expected output annotator types

Members

Parameter setters

Parameter getters