class LightDeIdentification extends AnnotatorModel[LightDeIdentification] with HasSimpleAnnotate[LightDeIdentification] with DeidModelParams with LightDeIdentificationParams with CheckLicense

LightDeIdentification is a lightweight version of DeIdentification. It replaces sensitive information in text with either masks or fake (obfuscated) values. It is designed for healthcare data and can de-identify patient names, dates, and other sensitive information, including doctor names and hospital names.

Additionally, it supports millions of embedded fakers. If desired, custom external fakers can be set with LightDeIdentificationParams.setCustomFakers.

It also supports multiple languages, such as English, Spanish, French, German, and Arabic. Multi-mode de-identification, which applies several modes at the same time, is available through LightDeIdentificationParams.setSelectiveObfuscationModes.

Example:

val documentAssembler = new DocumentAssembler()
  .setInputCol("text").setOutputCol("document")

val sentenceDetector = new SentenceDetector()
  .setInputCols(Array("document")).setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols(Array("sentence")).setOutputCol("token")

val embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token")).setOutputCol("embeddings")

val clinical_sensitive_entities = MedicalNerModel.pretrained("ner_deid_enriched", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings")).setOutputCol("ner")

val nerConverter = new NerConverterInternal()
  .setInputCols(Array("sentence", "token", "ner")).setOutputCol("chunk")

val deIdentification = new LightDeIdentification()
  .setInputCols(Array("chunk", "sentence")).setOutputCol("dei")
  .setMode("obfuscate")
  .setObfuscateDate(true)
  .setDays(5)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentenceDetector,
  tokenizer,
  embeddings,
  clinical_sensitive_entities,
  nerConverter,
  deIdentification
))

import spark.implicits._
val data = Seq("""
  |Record date: 2093-01-13, David Hale, M.D., Name: Hendrickson Ora.
  | MR # 7194334 Date: 01/13/93. PCP: Oliveira, 25 years-old, Record date: 2079-11-09.
  |Cocke County Baptist Hospital, 0295 Keats Street, Phone 55-555-5555.""".stripMargin
).toDF("text")

val result = pipeline.fit(data).transform(data)
result.selectExpr("explode(dei) as result").show(truncate = false)

Results:

+--------------------------------------------------------------------------------------------------------------------------------------------------+
|result                                                                                                                                            |
+--------------------------------------------------------------------------------------------------------------------------------------------------+
|{document, 0, 69, Record date: 2093-01-18, Chestine Spore, M.D., Name: Sallyanne Havers., {sentence -> 0, originalIndex -> 2}, []}                |
|{document, 70, 97, MR # 8469629 Date: 01/18/93., {sentence -> 1, originalIndex -> 71}, []}                                                        |
|{document, 98, 156, PCP: Derrill Center, 38 years-old, Record date: 2079-11-14., {sentence -> 2, originalIndex -> 100}, []}                       |
|{document, 157, 237, SELECT SPECIALTY HOSPITAL - DALLAS (GARLAND), 101 Hospital Rd, Phone 52-841-3244., {sentence -> 3, originalIndex -> 155}, []}|
+--------------------------------------------------------------------------------------------------------------------------------------------------+
Exceptions thrown

java.security.NoSuchAlgorithmException If no Provider supports a SecureRandom implementation for the specified algorithm name. For more information and parameters, see DeidModelParams and LightDeIdentificationParams.

Note

If the mode is set to obfuscate, LightDeIdentification uses java.security.SecureRandom to generate fake data. You can select the generation algorithm by setting the system environment variable SPARK_NLP_JSL_SEED_ALGORITHM. The chosen algorithm may affect the generated fake data, performance, and potential blocking issues. For standard RNG algorithm names, refer to the SecureRandom Number Generation Algorithms section of the Java standard algorithm names documentation. The default algorithm is 'SHA1PRNG'.
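As a hedged illustration of this note, the sketch below lists the SecureRandom algorithms available on the running JVM and checks whether a given name can be instantiated, which is the situation in which NoSuchAlgorithmException would otherwise be thrown. The object SecureRandomCheck and its members are hypothetical helpers, not part of the library. Only the JDK calls, the environment variable name, and the default value are taken from the JDK and the note. How the library itself reads the variable is not shown here.

```scala
import java.security.{NoSuchAlgorithmException, SecureRandom, Security}

// Hypothetical helper, not part of Spark NLP: inspects the JVM's SecureRandom support.
object SecureRandomCheck {
  // Default algorithm name stated in the note.
  val DefaultAlgorithm = "SHA1PRNG"

  /** Algorithm names registered by the installed security providers. */
  def availableAlgorithms: Set[String] =
    Security.getAlgorithms("SecureRandom").toArray(new Array[String](0)).toSet

  /** True when SecureRandom.getInstance succeeds for the given name. */
  def isSupported(algorithm: String): Boolean =
    try { SecureRandom.getInstance(algorithm); true }
    catch { case _: NoSuchAlgorithmException => false }

  /** Value of SPARK_NLP_JSL_SEED_ALGORITHM, or the documented default if unset. */
  def configuredAlgorithm: String =
    sys.env.getOrElse("SPARK_NLP_JSL_SEED_ALGORITHM", DefaultAlgorithm)
}
```

Calling SecureRandomCheck.isSupported(SecureRandomCheck.configuredAlgorithm) before building the pipeline is one way to surface a misconfigured algorithm name early.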

See also

DeidModelParams

LightDeIdentificationParams

Linear Supertypes
CheckLicense, LightDeIdentificationParams, DeidModelParams, BaseDeidParams, HasSimpleAnnotate[LightDeIdentification], AnnotatorModel[LightDeIdentification], CanBeLazy, RawAnnotator[LightDeIdentification], HasOutputAnnotationCol, HasInputAnnotationCols, HasOutputAnnotatorType, ParamsAndFeaturesWritable, HasFeatures, DefaultParamsWritable, MLWritable, Model[LightDeIdentification], Transformer, PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new LightDeIdentification()
  2. new LightDeIdentification(uid: String)

    uid

    a unique identifier for the instanced Annotator

    Exceptions thrown

    java.security.NoSuchAlgorithmException If no Provider supports a SecureRandom implementation for the specified algorithm name. For more information and parameters, see DeidModelParams and LightDeIdentificationParams.

Type Members

  1. type AnnotationContent = Seq[Row]
    Attributes
    protected
    Definition Classes
    AnnotatorModel
  2. type AnnotatorType = String
    Definition Classes
    HasOutputAnnotatorType

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T
    Attributes
    protected
    Definition Classes
    Params
  4. def $$[T](feature: StructFeature[T]): T
    Attributes
    protected
    Definition Classes
    HasFeatures
  5. def $$[K, V](feature: MapFeature[K, V]): Map[K, V]
    Attributes
    protected
    Definition Classes
    HasFeatures
  6. def $$[T](feature: SetFeature[T]): Set[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  7. def $$[T](feature: ArrayFeature[T]): Array[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  8. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  9. def _transform(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): DataFrame
    Attributes
    protected
    Definition Classes
    AnnotatorModel
  10. def afterAnnotate(dataset: DataFrame): DataFrame
    Attributes
    protected
    Definition Classes
    AnnotatorModel
  11. val ageRanges: IntArrayParam

    List of integers specifying limits of the age groups to preserve during obfuscation

    Definition Classes
    BaseDeidParams
  12. def annotate(annotations: Seq[Annotation]): Seq[Annotation]
    Definition Classes
    LightDeIdentification → HasSimpleAnnotate
  13. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  14. def beforeAnnotate(dataset: Dataset[_]): Dataset[_]
    Attributes
    protected
    Definition Classes
    AnnotatorModel
  15. final def checkSchema(schema: StructType, inputAnnotatorType: String): Boolean
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  16. def checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String]): Unit
    Definition Classes
    CheckLicense
  17. def checkValidScope(scope: String): Unit
    Definition Classes
    CheckLicense
  18. def checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  19. def checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  20. final def clear(param: Param[_]): LightDeIdentification.this.type
    Definition Classes
    Params
  21. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  22. def copy(extra: ParamMap): LightDeIdentification
    Definition Classes
    RawAnnotator → Model → Transformer → PipelineStage → Params
  23. def copyValues[T <: Params](to: T, extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  24. val customFakers: MapFeature[String, Array[String]]

    The dictionary of custom fakers to specify the obfuscation terms for the entities. You can specify the entity and the terms to be used for obfuscation.

    Definition Classes
    LightDeIdentificationParams
  25. val dateEntities: StringArrayParam

    List of date entities. Default: Array("DATE", "DOB", "DOD")

    Definition Classes
    LightDeIdentificationParams
  26. val dateFormats: StringArrayParam

    Format of dates to displace

    Definition Classes
    BaseDeidParams
  27. val days: IntParam

    Number of days to obfuscate the dates by displacement. If not provided, a random integer between 1 and 60 will be used.

    Definition Classes
    BaseDeidParams
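To make the displacement concrete, the sketch below shifts a date string by a fixed number of days using java.time. It only illustrates the idea and is not the library's implementation. The object DateShiftSketch, its shift method, and the date patterns are assumptions chosen to match the class-level example, where setDays(5) turns 2093-01-13 into 2093-01-18.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical illustration of date displacement; not the library's code.
object DateShiftSketch {
  /** Parses `date` with `pattern`, moves it forward by `days`, and formats it back. */
  def shift(date: String, pattern: String, days: Int): String = {
    val formatter = DateTimeFormatter.ofPattern(pattern)
    LocalDate.parse(date, formatter).plusDays(days.toLong).format(formatter)
  }
}
```

For instance, DateShiftSketch.shift("2093-01-13", "yyyy-MM-dd", 5) yields "2093-01-18", and DateShiftSketch.shift("01/13/93", "MM/dd/yy", 5) yields "01/18/93". Both shifted values appear in the results table of the class-level example.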
  28. final def defaultCopy[T <: Params](extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  29. def dfAnnotate: UserDefinedFunction
    Definition Classes
    HasSimpleAnnotate
  30. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  31. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  32. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  33. def explainParams(): String
    Definition Classes
    Params
  34. def extraValidate(structType: StructType): Boolean
    Attributes
    protected
    Definition Classes
    RawAnnotator
  35. def extraValidateMsg: String
    Attributes
    protected
    Definition Classes
    RawAnnotator
  36. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  37. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  38. val features: ArrayBuffer[Feature[_, _, _]]
    Definition Classes
    HasFeatures
  39. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  40. val fixedMaskLength: IntParam

    Select the fixed mask length: this is the length of the masking sequence that will be used when the 'fixed_length_chars' masking policy is selected.

    Definition Classes
    LightDeIdentificationParams
  41. val genderAwareness: BooleanParam

    Whether to use gender-aware names during obfuscation. This param affects only names. Setting it to true might decrease performance. Default: false

    Definition Classes
    BaseDeidParams
  42. def generateFakeBySameLength(wordToReplace: String, entity: String): String

    Obfuscates digits to new digits and letters to new letters; all other characters remain the same.

    Definition Classes
    DeidModelParams
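The behavior described for generateFakeBySameLength can be sketched in plain Scala as follows. This is an assumption-laden illustration rather than the library's implementation: the object SameLengthSketch and the fakeSameLength method are hypothetical, and preserving letter case and using scala.util.Random are choices made only for this example.

```scala
import scala.util.Random

// Hypothetical illustration; the real method lives in DeidModelParams.
object SameLengthSketch {
  /** Replaces each digit with a random digit and each letter with a random
    * letter of the same case; every other character is kept unchanged. */
  def fakeSameLength(word: String, rng: Random): String =
    word.map { c =>
      if (c.isDigit) ('0' + rng.nextInt(10)).toChar
      else if (c.isUpper) ('A' + rng.nextInt(26)).toChar
      else if (c.isLower) ('a' + rng.nextInt(26)).toChar
      else c
    }
}
```

Applied to a phone number such as "55-555-5555", the result keeps the same length and the hyphens in the same positions. Passing a Random built from the same seed gives the same output on every call.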
  43. def generateFakeBySameLengthUsingHash(wordToReplace: String, entity: String): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  44. def get[T](feature: StructFeature[T]): Option[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  45. def get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  46. def get[T](feature: SetFeature[T]): Option[Set[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  47. def get[T](feature: ArrayFeature[T]): Option[Array[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  48. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  49. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  50. def getCustomFakers: Map[String, Array[String]]

    Gets customFakers param.

    Attributes
    protected
    Definition Classes
    LightDeIdentificationParams
  51. def getDateEntities: Array[String]

    Gets dateEntities param.

    Definition Classes
    LightDeIdentificationParams
  52. def getDateFormats: Array[String]
    Definition Classes
    BaseDeidParams
  53. def getDays: Int
    Definition Classes
    BaseDeidParams
  54. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  55. def getFakeByHashcode(fakes: Seq[String], wordToReplace: String, entity: String, seed: Int): String
    Attributes
    protected
    Definition Classes
    DeidModelParams
  56. def getFakersEntity(entity: String, result: String): Seq[String]
    Definition Classes
    DeidModelParams
  57. def getFixedMaskLength: Int

    Gets fixedMaskLength param.

    Definition Classes
    LightDeIdentificationParams
  58. def getInputCols: Array[String]
    Definition Classes
    HasInputAnnotationCols
  59. def getLanguage: String
    Definition Classes
    BaseDeidParams
  60. def getLazyAnnotator: Boolean
    Definition Classes
    CanBeLazy
  61. def getMaskingPolicy: String

    Gets maskingPolicy param.

    Definition Classes
    LightDeIdentificationParams
  62. def getMode: String

    Gets mode param.

    Definition Classes
    LightDeIdentificationParams
  63. def getObfuscateDate: Boolean

    Gets obfuscateDate param

    Definition Classes
    LightDeIdentificationParams
  64. def getObfuscateRefSource: String
    Definition Classes
    BaseDeidParams
  65. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  66. final def getOutputCol: String
    Definition Classes
    HasOutputAnnotationCol
  67. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  68. def getRegion: String

    Gets region param.

    Definition Classes
    LightDeIdentificationParams
  69. def getSameLengthFormattedEntities(): Array[String]
    Definition Classes
    BaseDeidParams
  70. def getSeed(): Int
    Definition Classes
    BaseDeidParams
  71. def getSelectiveObfuscationModes: Option[Map[String, Array[String]]]

    Gets selectiveObfuscationModes param.

  72. def getUnnormalizedDateMode: String

    Gets unnormalizedDateMode param.

  73. def getUseShiftDays: Boolean

    Gets useShiftDays param.

    Definition Classes
    LightDeIdentificationParams
  74. def getValidAgeRanges: Array[Int]

    Gets valid ageRanges whether ageRangesByHipaa is true or not.

    Definition Classes
    LightDeIdentification → DeidModelParams
  75. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  76. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  77. def hasParent: Boolean
    Definition Classes
    Model
  78. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  79. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  80. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  81. val inputAnnotatorTypes: Array[String]
    Definition Classes
    LightDeIdentification → HasInputAnnotationCols
  82. final val inputCols: StringArrayParam
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  83. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  84. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  85. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  86. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  87. val language: Param[String]

    The language used to select the regex file and some faker entities: 'en' (English), 'de' (German), 'es' (Spanish), 'fr' (French), 'ar' (Arabic), or 'ro' (Romanian). Default: 'en'

    Definition Classes
    BaseDeidParams
  88. val lazyAnnotator: BooleanParam
    Definition Classes
    CanBeLazy
  89. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  90. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  91. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  92. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  93. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  94. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  95. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  96. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  97. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  98. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  99. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  100. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  101. val maskingPolicy: Param[String]

    Select the masking policy:

    • 'entity_labels': Replace the values with the entity label.
    • 'same_length_chars': Replace the entity with asterisks of the same length minus two, plus brackets on both ends. If the entity is shorter than 3 characters (like Jo, or 5), asterisks are used without brackets.
    • 'fixed_length_chars': Replace the entity with a masking sequence composed of a fixed number of asterisks.
    • Default: 'entity_labels'
    Definition Classes
    LightDeIdentificationParams
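The three masking policies can be pictured with the following plain-Scala sketch. It is only an illustration of the descriptions above, not the library's implementation: the object MaskingPolicySketch, the mask method, and the default fixedLength of 4 are hypothetical, and the angle-bracket label format is borrowed from the mask-mode example ("<PERSON>").

```scala
// Hypothetical illustration of the masking policies; not the library's code.
object MaskingPolicySketch {
  /** Masks `chunk` (an entity with the given `label`) according to `policy`. */
  def mask(chunk: String, label: String, policy: String, fixedLength: Int = 4): String =
    policy match {
      case "entity_labels" =>
        s"<$label>"
      case "same_length_chars" =>
        // Entities shorter than 3 characters get asterisks only, without brackets.
        if (chunk.length < 3) "*" * chunk.length
        else "[" + "*" * (chunk.length - 2) + "]"
      case "fixed_length_chars" =>
        "*" * fixedLength
      case other =>
        throw new IllegalArgumentException(s"Unknown masking policy: $other")
    }
}
```

With this sketch, mask("David Hale", "NAME", "same_length_chars") gives "[********]", which has the same length as the original, while mask("Jo", "NAME", "same_length_chars") gives "**".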
  102. val mode: Param[String]

    Mode for Anonymizer ['mask' or 'obfuscate']. Default: 'mask'

    • Mask mode: The entities will be replaced by their entity types.
    • Obfuscate mode: The entity is replaced by an obfuscator's term.
    Definition Classes
    LightDeIdentificationParams
    Example:
    1. Given the following text: "David Hale visited EEUU a couple of years ago"

      • Mask mode: "<PERSON> visited <COUNTRY> a couple of years ago"
      • Obfuscate mode: "Bryan Johnson visited Japan a couple of years ago"
  103. def msgHelper(schema: StructType): String
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  104. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  105. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  106. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  107. val obfuscateDate: BooleanParam

    Whether to obfuscate dates when mode == "obfuscate". This param helps in consistency to make dateFormats more visible. When setting it to true, make sure the dateFormats param fits your needs. If the value is true and obfuscation fails, unnormalizedDateMode is activated. When set to false, dates are masked to <DATE>. Default: false

    Definition Classes
    LightDeIdentificationParams
  108. val obfuscateRefSource: Param[String]

    The source of the values used to obfuscate the entities. The values are the following:

    • 'file': Takes the entities from the obfuscatorRefFile.
    • 'faker': Takes the entities from the Faker module.
    • 'both': Takes the entities from the obfuscatorRefFile and the Faker module randomly.

    Definition Classes
    BaseDeidParams
  109. def onWrite(path: String, spark: SparkSession): Unit
    Attributes
    protected
    Definition Classes
    ParamsAndFeaturesWritable
  110. val optionalInputAnnotatorTypes: Array[String]
    Definition Classes
    HasInputAnnotationCols
  111. val outputAnnotatorType: String
    Definition Classes
    LightDeIdentification → HasOutputAnnotatorType
  112. final val outputCol: Param[String]
    Attributes
    protected
    Definition Classes
    HasOutputAnnotationCol
  113. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  114. var parent: Estimator[LightDeIdentification]
    Definition Classes
    Model
  115. val random: SecureRandom
    Attributes
    protected
    Definition Classes
    DeidModelParams
  116. val region: Param[String]

    With this property, you can select particular dateFormats. It is mainly used when obfuscating dates: it decides whether the first or the second part of a date such as 11/11/2023 is read as the day. The values are the following:

    • 'eu' for European Union
    • 'us' for USA
    LightDeIdentificationParams
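The ambiguity that region resolves can be shown with a small java.time sketch. The mapping from region to date pattern below is an assumption made for illustration only, and RegionSketch and its shift method are hypothetical names; the patterns the library actually selects for each region are not listed in this page.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical illustration of day-first versus month-first parsing.
object RegionSketch {
  // Assumed patterns for this example only.
  private val patterns = Map("eu" -> "dd/MM/yyyy", "us" -> "MM/dd/yyyy")

  /** Reads `date` using the pattern assumed for `region` and shifts it by `days`. */
  def shift(date: String, region: String, days: Int): String = {
    val formatter = DateTimeFormatter.ofPattern(patterns(region))
    LocalDate.parse(date, formatter).plusDays(days.toLong).format(formatter)
  }
}
```

The same input, "03/04/2023", shifted by 5 days becomes "03/09/2023" when read as month/day and "08/04/2023" when read as day/month, which is why the region matters when dates are displaced.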
  117. val sameLengthFormattedEntities: StringArrayParam

    List of formatted entities to generate the same length outputs as original ones during obfuscation. The supported and default formatted entities are: "phone", "fax", "contact", "id", "idnum", "bioid", "medicalrecord", "zip", "vin", "ssn", "dln", "plate", "license", "IRS", "CFN", "account".

    Definition Classes
    BaseDeidParams
  118. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  119. val seed: IntParam

    The seed used to select the entities in obfuscate mode. With a fixed seed, you can replay an execution several times with the same output.

    Definition Classes
    BaseDeidParams
  120. val selectiveObfuscationModes: StructFeature[Map[String, Array[String]]]

    The dictionary of modes to enable multi-mode de-identification.

    • 'obfuscate': Replace the values with random values.
    • 'mask_same_length_chars': Replace the entity with asterisks of the same length minus two, plus brackets on both ends.
    • 'entity_labels': Replace the values with the entity label.
    • 'mask_fixed_length_chars': Replace the entity with asterisks of a fixed length. You can also invoke "setFixedMaskLength()".
    • 'skip': Skip the entities (leave them intact).

    Entities that are not listed in the dictionary are de-identified according to setMode().

    Definition Classes
    LightDeIdentificationParams
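The fallback rule, where entities missing from the dictionary follow setMode(), can be sketched as a simple lookup. This is a hedged illustration, not the library's logic: it assumes the dictionary keys are mode names and the values are entity labels, and it matches labels exactly. SelectiveModeSketch and modeFor are hypothetical names.

```scala
// Hypothetical illustration of per-entity mode resolution; not the library's code.
object SelectiveModeSketch {
  /** Returns the mode whose entity list contains `entity`,
    * or `defaultMode` when no list mentions it. */
  def modeFor(entity: String,
              selectiveModes: Map[String, Array[String]],
              defaultMode: String): String =
    selectiveModes
      .collectFirst { case (mode, entities) if entities.contains(entity) => mode }
      .getOrElse(defaultMode)
}
```

Given Map("obfuscate" -> Array("PHONE"), "skip" -> Array("ID")) and a default of "mask", the entity "PHONE" resolves to "obfuscate", "ID" to "skip", and any other entity, such as "NAME", to "mask".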
  121. def set[T](feature: StructFeature[T], value: T): LightDeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  122. def set[K, V](feature: MapFeature[K, V], value: Map[K, V]): LightDeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  123. def set[T](feature: SetFeature[T], value: Set[T]): LightDeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  124. def set[T](feature: ArrayFeature[T], value: Array[T]): LightDeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  125. final def set(paramPair: ParamPair[_]): LightDeIdentification.this.type
    Attributes
    protected
    Definition Classes
    Params
  126. final def set(param: String, value: Any): LightDeIdentification.this.type
    Attributes
    protected
    Definition Classes
    Params
  127. final def set[T](param: Param[T], value: T): LightDeIdentification.this.type
    Definition Classes
    Params
  128. def setAgeRanges(mode: Array[Int]): LightDeIdentification.this.type

    List of integers specifying limits of the age groups to preserve during obfuscation

    Definition Classes
    BaseDeidParams
  129. def setCustomFakers(value: HashMap[String, List[String]]): LightDeIdentification.this.type
    Definition Classes
    LightDeIdentificationParams
  130. def setCustomFakers(value: Map[String, Array[String]]): LightDeIdentification.this.type

    Sets the value of customFakers. The dictionary of custom fakers to specify the obfuscation terms for the entities. You can specify the entity and the terms to be used for obfuscation.

    Example:

    new LightDeIdentification()
     .setInputCols(Array("ner_chunk", "sentence")).setOutputCol("dei")
     .setMode("obfuscate")
     .setObfuscateRefSource("custom")
     .setCustomFakers(Map(
         "NAME" -> Array("George", "Taylor"),
         "SCHOOL" -> Array("Oxford", "Harvard"),
         "city" -> Array("ROMA")
     ))
    Definition Classes
    LightDeIdentificationParams
  131. def setDateEntities(value: Array[String]): LightDeIdentification.this.type

    Sets the value of dateEntities. Default: Array("DATE", "DOB", "DOD")

    Definition Classes
    LightDeIdentificationParams
  132. def setDateFormats(s: Array[String]): LightDeIdentification.this.type

    Format of dates to displace

    Definition Classes
    BaseDeidParams
  133. def setDays(k: Int): LightDeIdentification.this.type

    Number of days to obfuscate the dates by displacement. If not provided, a random integer between 1 and 60 will be used.

    Definition Classes
    BaseDeidParams
  134. def setDefault[T](feature: StructFeature[T], value: () ⇒ T): LightDeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  135. def setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): LightDeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  136. def setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): LightDeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  137. def setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): LightDeIdentification.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  138. final def setDefault(paramPairs: ParamPair[_]*): LightDeIdentification.this.type
    Attributes
    protected
    Definition Classes
    Params
  139. final def setDefault[T](param: Param[T], value: T): LightDeIdentification.this.type
    Attributes
    protected[org.apache.spark.ml]
    Definition Classes
    Params
  140. def setFixedMaskLength(value: Int): LightDeIdentification.this.type

    Sets the value of fixedMaskLength. This is the length of the masking sequence that will be used when the 'fixed_length_chars' masking policy is selected.

    Definition Classes
    LightDeIdentificationParams
  141. def setGenderAwareness(value: Boolean): LightDeIdentification.this.type

    Whether to use gender-aware names during obfuscation. This param affects only names. Setting it to true might decrease performance. Default: false

    Definition Classes
    BaseDeidParams
  142. final def setInputCols(value: String*): LightDeIdentification.this.type
    Definition Classes
    HasInputAnnotationCols
  143. def setInputCols(value: Array[String]): LightDeIdentification.this.type
    Definition Classes
    HasInputAnnotationCols
  144. def setLanguage(s: String): LightDeIdentification.this.type

    The language used to select the regex file and some faker entities: 'en' (English), 'de' (German), 'es' (Spanish), 'fr' (French), 'ar' (Arabic), or 'ro' (Romanian). Default: 'en'

    Definition Classes
    BaseDeidParams
  145. def setLazyAnnotator(value: Boolean): LightDeIdentification.this.type
    Definition Classes
    CanBeLazy
  146. def setMaskingPolicy(value: String): LightDeIdentification.this.type

    Select the masking policy:

    • 'entity_labels': Replace the values with the entity label.
    • 'same_length_chars': Replace the entity with asterisks of the same length minus two, plus brackets on both ends. If the entity is shorter than 3 characters (like Jo, or 5), asterisks are used without brackets.
    • 'fixed_length_chars': Replace the entity with a masking sequence composed of a fixed number of asterisks.
    • Default: 'entity_labels'
    Definition Classes
    LightDeIdentificationParams
  147. def setMode(m: String): LightDeIdentification.this.type

    Mode for Anonymizer ['mask'|'obfuscate']. Default: 'mask'

    • Mask mode: The entities will be replaced by their entity types.
    • Obfuscate mode: The entity is replaced by an obfuscator's term.
    Definition Classes
    LightDeIdentificationParams
    Example:
    1. Given the following text: "David Hale visited EEUU a couple of years ago"

      • Mask mode: "<PERSON> visited <COUNTRY> a couple of years ago"
      • Obfuscate mode: "Bryan Johnson visited Japan a couple of years ago"
  148. def setObfuscateDate(s: Boolean): LightDeIdentification.this.type

    Whether to obfuscate dates when mode == "obfuscate". This param helps in consistency to make dateFormats more visible. When setting it to true, make sure the dateFormats param fits your needs. If the value is true and obfuscation fails, unnormalizedDateMode is activated. When set to false, dates are masked to <DATE>. Default: false

    Definition Classes
    LightDeIdentificationParams
  149. def setObfuscateRefSource(s: String): LightDeIdentification.this.type

    The source of the values used to obfuscate the entities. The values are the following:

    • 'file': Takes the entities from the obfuscatorRefFile.
    • 'faker': Takes the entities from the Faker module.
    • 'both': Takes the entities from the obfuscatorRefFile and the Faker module randomly.

    Definition Classes
    BaseDeidParams
  150. final def setOutputCol(value: String): LightDeIdentification.this.type
    Definition Classes
    HasOutputAnnotationCol
  151. def setParent(parent: Estimator[LightDeIdentification]): LightDeIdentification
    Definition Classes
    Model
  152. def setRegion(s: String): LightDeIdentification.this.type

    With this property, you can select particular dateFormats. This property is especially used when obfuscating dates: it decides whether the first or the second part of a date such as 11/11/2023 is read as the day. The values are the following:

    • 'eu' for European Union
    • 'us' for USA
    Definition Classes
    LightDeIdentificationParams
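
    The effect of the region on an ambiguous date can be sketched with java.time. The pattern strings below are illustrative assumptions; the library's actual dateFormats list is not shown in this section.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object RegionDateSketch {
  // Hypothetical mapping from region code to a single date pattern.
  private val patterns = Map(
    "eu" -> "dd/MM/yyyy", // day first
    "us" -> "MM/dd/yyyy"  // month first
  )

  // Parses the text with the pattern that belongs to the given region.
  def parse(text: String, region: String): LocalDate =
    LocalDate.parse(text, DateTimeFormatter.ofPattern(patterns(region)))
}
```

    With this sketch, "03/04/2023" is read as 3 April under 'eu' and as 4 March under 'us', so the same shift in days leads to different obfuscated dates.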
  153. def setSameLengthFormattedEntities(entities: Array[String]): LightDeIdentification.this.type

    List of formatted entities to generate the same length outputs as original ones during obfuscation. The supported and default formatted entities are: PHONE, FAX, CONTACT, ID, IDNUM, BIOID, MEDICALRECORD, ZIP, VIN, SSN, DLN, LICENSE, PLATE, IRS, CFN, ACCOUNT.

    Definition Classes
    BaseDeidParams
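
    One way to picture a same-length output for a formatted entity such as PHONE is the sketch below. SameLengthSketch is a hypothetical helper, and replacing character by character while keeping separators is an assumption about what "same length" means here, not the library's confirmed algorithm.

```scala
import scala.util.Random

object SameLengthSketch {
  // Swaps every digit for a random digit and every letter for a random
  // ASCII letter of the same case; all other characters are kept as-is.
  def obfuscate(value: String, rng: Random): String =
    value.map {
      case c if c.isDigit => ('0' + rng.nextInt(10)).toChar
      case c if c.isUpper => ('A' + rng.nextInt(26)).toChar
      case c if c.isLower => ('a' + rng.nextInt(26)).toChar
      case c              => c
    }
}
```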
  154. def setSeed(s: Int): LightDeIdentification.this.type

    The seed used to select the replacement entities in obfuscate mode. With a fixed seed, you can repeat an execution several times and get the same output.

    Definition Classes
    DeidModelParams → BaseDeidParams
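
    Why a seed makes runs repeatable can be shown with scala.util.Random. The faker list and the pick helper below are hypothetical; the library ships its own fakers and selection logic.

```scala
import scala.util.Random

object SeedSketch {
  // Hypothetical faker list used only for this illustration.
  private val fakeNames =
    Vector("Bryan Johnson", "Alice Moreno", "Chen Wei", "Fatima Haddad")

  // Draws `count` names from a generator created with the given seed.
  def pick(seed: Int, count: Int): Vector[String] = {
    val rng = new Random(seed)
    Vector.fill(count)(fakeNames(rng.nextInt(fakeNames.length)))
  }
}
```

    Calling pick twice with the same seed returns the same sequence, which is the behavior the seed parameter is meant to give.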
  155. def setSelectiveObfuscationModes(value: HashMap[String, List[String]]): LightDeIdentification.this.type
    Definition Classes
    LightDeIdentificationParams
  156. def setSelectiveObfuscationModes(value: Map[String, Array[String]]): LightDeIdentification.this.type

    Sets the value of selectiveObfuscationModes. The dictionary of modes to enable multi-mode deidentification.

    • 'obfuscate': Replace the values with random values.
    • 'mask_same_length_chars': Replace the entity with asterisks of the same length minus two, plus a bracket on each end.
    • 'mask_entity_labels': Replace the values with their entity labels.
    • 'mask_fixed_length_chars': Replace the entity with a fixed number of asterisks. You should also invoke setFixedMaskLength().
    • 'skip': Skip the entities (leave them intact).

    Entities that are not listed in the dictionary are de-identified according to setMode().

    Example:

    val deIdentification = new LightDeIdentification()
     .setInputCols(Array("ner_chunk", "sentence")).setOutputCol("dei")
     .setMode("mask")
     .setSelectiveObfuscationModes(Map(
         "OBFUSCATE" -> Array("PHONE", "email"),
         "mask_entity_labels" -> Array("NAME", "CITY"),
         "skip" -> Array("id", "idnum"),
         "mask_same_length_chars" -> Array("fax"),
         "mask_fixed_length_chars" -> Array("zip")
     ))
     .setFixedMaskLength(4)
    Definition Classes
    LightDeIdentificationParams
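
    How a mode dictionary combines with the default mode can be sketched as a lookup. SelectiveModeSketch is a hypothetical helper; comparing keys and labels case-insensitively is an assumption suggested by the mixed-case example, not confirmed behavior.

```scala
object SelectiveModeSketch {
  // Inverts mode -> labels into label -> mode and returns a lookup function
  // that falls back to the default mode for labels not in the dictionary.
  def resolver(
      modes: Map[String, Array[String]],
      defaultMode: String
  ): String => String = {
    val byLabel: Map[String, String] =
      modes.toSeq.flatMap { case (mode, labels) =>
        // Assumption: keys and labels are matched case-insensitively.
        labels.toSeq.map(label => label.toLowerCase -> mode.toLowerCase)
      }.toMap
    label => byLabel.getOrElse(label.toLowerCase, defaultMode)
  }
}
```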
  157. def setUnnormalizedDateMode(mode: String): LightDeIdentification.this.type

    The mode to use if the date is not formatted. Options: [mask, obfuscate, skip] Default: obfuscate

    Definition Classes
    LightDeIdentificationParams
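
    The fallback for dates that do not match a known format can be sketched as follows. UnnormalizedDateSketch, the single MM/dd/yyyy pattern and the fakeDate argument are all hypothetical; the sketch only illustrates the three options named above.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

object UnnormalizedDateSketch {
  // Hypothetical single pattern; the real annotator uses its dateFormats.
  private val format = DateTimeFormatter.ofPattern("MM/dd/yyyy")

  def deidentify(
      text: String,
      days: Int,
      unnormalizedDateMode: String,
      fakeDate: String
  ): String =
    Try(LocalDate.parse(text, format)).toOption match {
      // The date was parsed: shift it and print it in the same format.
      case Some(date) => date.plusDays(days.toLong).format(format)
      // The date was not parsed: apply the unnormalized date mode.
      case None =>
        unnormalizedDateMode match {
          case "mask" => "<DATE>"
          case "skip" => text
          case _      => fakeDate // "obfuscate": use a replacement date
        }
    }
}
```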
  158. def setUseShiftDays(s: Boolean): LightDeIdentification.this.type

    Sets the value of useShiftDays. Whether to use the random shift day when the document has this in its metadata. DocumentHashCoder can create 'dateshift' based on the document. Default: false

    Definition Classes
    LightDeIdentificationParams
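
    A per-document shift can be pictured with the sketch below. ShiftDaysSketch is hypothetical: metadata is modeled as a plain Map, and falling back to a fixed number of days when 'dateshift' is absent or useShiftDays is false is an assumption.

```scala
import java.time.LocalDate

object ShiftDaysSketch {
  def shift(
      date: LocalDate,
      metadata: Map[String, String],
      useShiftDays: Boolean,
      defaultDays: Int
  ): LocalDate = {
    // Use the document's own 'dateshift' only when enabled and present;
    // otherwise fall back to the fixed number of days (assumption).
    val days =
      if (useShiftDays) metadata.get("dateshift").map(_.toInt).getOrElse(defaultDays)
      else defaultDays
    date.plusDays(days.toLong)
  }
}
```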
  159. val supportedFormattedEntities: Array[String]
    Attributes
    protected
    Definition Classes
    BaseDeidParams
  160. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  161. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  162. final def transform(dataset: Dataset[_]): DataFrame
    Definition Classes
    AnnotatorModel → Transformer
  163. def transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" )
  164. def transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" ) @varargs()
  165. final def transformSchema(schema: StructType): StructType
    Definition Classes
    RawAnnotator → PipelineStage
  166. def transformSchema(schema: StructType, logging: Boolean): StructType
    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  167. val uid: String
    Definition Classes
    LightDeIdentification → Identifiable
  168. val unnormalizedDateMode: Param[String]

    The mode to use if the date is not formatted. Options: [mask, obfuscate, skip] Default: obfuscate

    Definition Classes
    LightDeIdentificationParams
  169. val useShiftDays: BooleanParam

    Whether to use the random shift day when the document has this in its metadata. DocumentHashCoder can create 'dateshift' based on the document. Default: false

    Definition Classes
    LightDeIdentificationParams
  170. def validate(schema: StructType): Boolean
    Attributes
    protected
    Definition Classes
    RawAnnotator
  171. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  172. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  173. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  174. def wrapColumnMetadata(col: Column): Column
    Attributes
    protected
    Definition Classes
    RawAnnotator
  175. def write: MLWriter
    Definition Classes
    ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable

Inherited from CheckLicense

Inherited from DeidModelParams

Inherited from BaseDeidParams

Inherited from HasSimpleAnnotate[LightDeIdentification]

Inherited from AnnotatorModel[LightDeIdentification]

Inherited from CanBeLazy

Inherited from RawAnnotator[LightDeIdentification]

Inherited from HasOutputAnnotationCol

Inherited from HasInputAnnotationCols

Inherited from HasOutputAnnotatorType

Inherited from ParamsAndFeaturesWritable

Inherited from HasFeatures

Inherited from DefaultParamsWritable

Inherited from MLWritable

Inherited from Model[LightDeIdentification]

Inherited from Transformer

Inherited from PipelineStage

Inherited from Logging

Inherited from Params

Inherited from Serializable

Inherited from Serializable

Inherited from Identifiable

Inherited from AnyRef

Inherited from Any
