trait LightDeIdentificationParams extends Params
A trait that holds the parameters of LightDeIdentification.
- Self Type
- LightDeIdentificationParams with HasFeatures
Abstract Value Members
Concrete Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
$[T](param: Param[T]): T
- Attributes
- protected
- Definition Classes
- Params
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
final
def
clear(param: Param[_]): LightDeIdentificationParams.this
- Definition Classes
- Params
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
copyValues[T <: Params](to: T, extra: ParamMap): T
- Attributes
- protected
- Definition Classes
- Params
-
val
customFakers: MapFeature[String, Array[String]]
The dictionary of custom fakers to specify the obfuscation terms for the entities. You can specify the entity and the terms to be used for obfuscation.
-
val
dateEntities: StringArrayParam
List of date entities. Default: Array("DATE", "DOB", "DOD")
-
final
def
defaultCopy[T <: Params](extra: ParamMap): T
- Attributes
- protected
- Definition Classes
- Params
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
explainParam(param: Param[_]): String
- Definition Classes
- Params
-
def
explainParams(): String
- Definition Classes
- Params
-
final
def
extractParamMap(): ParamMap
- Definition Classes
- Params
-
final
def
extractParamMap(extra: ParamMap): ParamMap
- Definition Classes
- Params
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
val
fixedMaskLength: IntParam
Select the fixed mask length: this is the length of the masking sequence that will be used when the 'fixed_length_chars' masking policy is selected.
-
final
def
get[T](param: Param[T]): Option[T]
- Definition Classes
- Params
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
getCustomFakers: Map[String, Array[String]]
Gets customFakers param.
- Attributes
- protected
-
def
getDateEntities: Array[String]
Gets dateEntities param.
-
final
def
getDefault[T](param: Param[T]): Option[T]
- Definition Classes
- Params
-
def
getFixedMaskLength: Int
Gets fixedMaskLength param.
-
def
getMaskingPolicy: String
Gets maskingPolicy param.
-
def
getMode: String
Gets mode param.
-
def
getObfuscateDate: Boolean
Gets obfuscateDate param.
-
final
def
getOrDefault[T](param: Param[T]): T
- Definition Classes
- Params
-
def
getParam(paramName: String): Param[Any]
- Definition Classes
- Params
-
def
getRegion: String
Gets region param.
-
def
getSelectiveObfuscationModes: Option[Map[String, Array[String]]]
Gets selectiveObfuscationModes param.
-
def
getUnnormalizedDateMode: String
Gets unnormalizedDateMode param.
-
def
getUseShiftDays: Boolean
Gets useShiftDays param.
-
final
def
hasDefault[T](param: Param[T]): Boolean
- Definition Classes
- Params
-
def
hasParam(paramName: String): Boolean
- Definition Classes
- Params
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
final
def
isDefined(param: Param[_]): Boolean
- Definition Classes
- Params
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
final
def
isSet(param: Param[_]): Boolean
- Definition Classes
- Params
-
val
maskingPolicy: Param[String]
Select the masking policy:
- 'entity_labels': Replace the values with the entity label.
- 'same_length_chars': Replace the entity with asterisks of the same length minus two, plus brackets on both ends. If the entity is shorter than 3 characters (like Jo, or 5), asterisks are used without brackets.
- 'fixed_length_chars': Replace the entity with a masking sequence composed of a fixed number of asterisks.
- Default: 'entity_labels'
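The three policies listed above can be illustrated with a small standalone sketch. This is not the library's implementation: `MaskingPolicySketch` and `mask` are hypothetical names, the angle-bracket label format is taken from the mask-mode example on this page, and the default of 4 for `fixedMaskLength` is an assumption made only for the illustration.

```scala
object MaskingPolicySketch {
  // Hypothetical helper, not part of the library API: applies one of the three
  // documented masking policies to a single entity chunk.
  // The default fixedMaskLength of 4 is an assumption for this sketch.
  def mask(text: String, label: String, policy: String, fixedMaskLength: Int = 4): String =
    policy match {
      // Replace the chunk with its entity label, e.g. <PERSON>.
      case "entity_labels" => s"<$label>"
      // Chunks shorter than 3 characters become plain asterisks, no brackets.
      case "same_length_chars" if text.length < 3 => "*" * text.length
      // Otherwise: brackets on both ends plus (length - 2) asterisks.
      case "same_length_chars" => "[" + "*" * (text.length - 2) + "]"
      // A fixed number of asterisks, regardless of the chunk length.
      case "fixed_length_chars" => "*" * fixedMaskLength
      case other => throw new IllegalArgumentException(s"Unknown masking policy: $other")
    }

  def main(args: Array[String]): Unit = {
    println(mask("David Hale", "PERSON", "entity_labels"))      // <PERSON>
    println(mask("David Hale", "PERSON", "same_length_chars"))  // [********]
    println(mask("Jo", "PERSON", "same_length_chars"))          // **
    println(mask("David Hale", "PERSON", "fixed_length_chars")) // ****
  }
}
```

Note that 'same_length_chars' preserves the length of the original chunk, which can leak information about short entities; 'fixed_length_chars' avoids that at the cost of changing the text length.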
-
val
mode: Param[String]
Mode for Anonymizer ['mask' or 'obfuscate']. Default: 'mask'
- Mask mode: The entities will be replaced by their entity types.
- Obfuscate mode: The entity is replaced by an obfuscator's term.
Example: Given the following text: "David Hale visited EEUU a couple of years ago"
- Mask mode: "<PERSON> visited <COUNTRY> a couple of years ago"
- Obfuscate mode: "Bryan Johnson visited Japan a couple of years ago"
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
val
obfuscateDate: BooleanParam
Whether to obfuscate dates when mode == "obfuscate". This param helps with consistency by making dateFormats more visible. When set to true, make sure the dateFormats param fits your needs. If the value is true and obfuscation fails, unnormalizedDateMode is applied. When set to false, the date is masked to <DATE>. Default: false
-
lazy val
params: Array[Param[_]]
- Definition Classes
- Params
-
val
region: Param[String]
Selects the region-specific dateFormats. This property is mainly used when obfuscating dates: it determines whether the first or the second part of a date such as 11/11/2023 is read as the day.
The possible values are:
- 'eu' for European Union
- 'us' for USA
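The effect of the region on an ambiguous date can be sketched with `java.time`. The two patterns below (`dd/MM/yyyy` for 'eu', `MM/dd/yyyy` for 'us') are assumptions chosen for illustration, and `RegionDateSketch` is a hypothetical name; the date formats the library actually associates with each region may differ.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object RegionDateSketch {
  // Assumed numeric patterns for illustration only; the library's actual
  // dateFormats per region may differ.
  private val dayFirst = DateTimeFormatter.ofPattern("dd/MM/yyyy")
  private val monthFirst = DateTimeFormatter.ofPattern("MM/dd/yyyy")

  // Hypothetical helper: parses a slash-separated date according to the region.
  def parse(date: String, region: String): LocalDate = region match {
    case "eu" => LocalDate.parse(date, dayFirst)
    case "us" => LocalDate.parse(date, monthFirst)
    case other => throw new IllegalArgumentException(s"Unknown region: $other")
  }

  def main(args: Array[String]): Unit = {
    println(parse("03/04/2023", "eu")) // 2023-04-03 (3 April)
    println(parse("03/04/2023", "us")) // 2023-03-04 (4 March)
  }
}
```

The same string yields two different calendar dates, which is why the region matters whenever dates are shifted or replaced rather than simply masked.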
-
val
selectiveObfuscationModes: StructFeature[Map[String, Array[String]]]
The dictionary of modes to enable multi-mode deidentification.
- 'obfuscate': Replace the values with random values.
- 'mask_same_length_chars': Replace the entity with asterisks of the same length minus two, plus brackets on both ends.
- 'entity_labels': Replace the values with the entity label.
- 'mask_fixed_length_chars': Replace the entity with a fixed number of asterisks. You can also invoke "setFixedMaskLength()".
- 'skip': Skip the entities (leave them intact).
Entities not listed in the dictionary are deidentified according to setMode().
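The fallback rule described above (listed entities use their own mode, everything else uses the global mode) can be sketched as a simple lookup. `SelectiveModesSketch` and `resolveMode` are hypothetical names, and the case-insensitive matching of mode and entity names is an assumption made for this sketch, not confirmed library behavior.

```scala
object SelectiveModesSketch {
  // Hypothetical helper, not part of the library API: inverts the
  // mode -> entities dictionary and returns the mode for one entity,
  // falling back to the global mode when the entity is not listed.
  // Case-insensitive matching is an assumption made for this sketch.
  def resolveMode(
      selectiveModes: Map[String, Array[String]],
      entity: String,
      defaultMode: String): String = {
    val modeByEntity: Map[String, String] = selectiveModes.toSeq.flatMap {
      case (mode, entities) => entities.toSeq.map(e => e.toLowerCase -> mode.toLowerCase)
    }.toMap
    modeByEntity.getOrElse(entity.toLowerCase, defaultMode)
  }

  def main(args: Array[String]): Unit = {
    val modes = Map(
      "obfuscate" -> Array("PHONE", "email"),
      "skip" -> Array("id", "idnum")
    )
    println(resolveMode(modes, "PHONE", "mask")) // obfuscate
    println(resolveMode(modes, "ID", "mask"))    // skip
    println(resolveMode(modes, "NAME", "mask"))  // mask (not listed, global mode applies)
  }
}
```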
-
final
def
set(paramPair: ParamPair[_]): LightDeIdentificationParams.this
- Attributes
- protected
- Definition Classes
- Params
-
final
def
set(param: String, value: Any): LightDeIdentificationParams.this
- Attributes
- protected
- Definition Classes
- Params
-
final
def
set[T](param: Param[T], value: T): LightDeIdentificationParams.this
- Definition Classes
- Params
-
def
setCustomFakers(value: HashMap[String, List[String]]): LightDeIdentificationParams.this
-
def
setCustomFakers(value: Map[String, Array[String]]): LightDeIdentificationParams.this
Sets the value of customFakers. The dictionary of custom fakers to specify the obfuscation terms for the entities. You can specify the entity and the terms to be used for obfuscation.
Example:
new LightDeIdentification()
  .setInputCols(Array("ner_chunk", "sentence"))
  .setOutputCol("dei")
  .setMode("obfuscate")
  .setObfuscateRefSource("custom")
  .setCustomFakers(Map(
    "NAME" -> Array("George", "Taylor"),
    "SCHOOL" -> Array("Oxford", "Harvard"),
    "city" -> Array("ROMA")
  ))
-
def
setDateEntities(value: Array[String]): LightDeIdentificationParams.this
Sets the value of dateEntities. Default: Array("DATE", "DOB", "DOD")
-
final
def
setDefault(paramPairs: ParamPair[_]*): LightDeIdentificationParams.this
- Attributes
- protected
- Definition Classes
- Params
-
final
def
setDefault[T](param: Param[T], value: T): LightDeIdentificationParams.this
- Attributes
- protected[org.apache.spark.ml]
- Definition Classes
- Params
-
def
setFixedMaskLength(value: Int): LightDeIdentificationParams.this
Sets the value of fixedMaskLength. This is the length of the masking sequence that will be used when the 'fixed_length_chars' masking policy is selected.
-
def
setMaskingPolicy(value: String): LightDeIdentificationParams.this
Select the masking policy:
- 'entity_labels': Replace the values with the entity label.
- 'same_length_chars': Replace the entity with asterisks of the same length minus two, plus brackets on both ends. If the entity is shorter than 3 characters (like Jo, or 5), asterisks are used without brackets.
- 'fixed_length_chars': Replace the entity with a masking sequence composed of a fixed number of asterisks.
- Default: 'entity_labels'
-
def
setMode(m: String): LightDeIdentificationParams.this
Mode for Anonymizer ['mask'|'obfuscate']. Default: 'mask'
- Mask mode: The entities will be replaced by their entity types.
- Obfuscate mode: The entity is replaced by an obfuscator's term.
Example: Given the following text: "David Hale visited EEUU a couple of years ago"
- Mask mode: "<PERSON> visited <COUNTRY> a couple of years ago"
- Obfuscate mode: "Bryan Johnson visited Japan a couple of years ago"
-
def
setObfuscateDate(s: Boolean): LightDeIdentificationParams.this
Whether to obfuscate dates when mode == "obfuscate". This param helps with consistency by making dateFormats more visible. When set to true, make sure the dateFormats param fits your needs. If the value is true and obfuscation fails, unnormalizedDateMode is applied. When set to false, the date is masked to <DATE>. Default: false
-
def
setRegion(s: String): LightDeIdentificationParams.this
Selects the region-specific dateFormats. This property is mainly used when obfuscating dates: it determines whether the first or the second part of a date such as 11/11/2023 is read as the day.
The possible values are:
- 'eu' for European Union
- 'us' for USA
-
def
setSelectiveObfuscationModes(value: HashMap[String, List[String]]): LightDeIdentificationParams.this
-
def
setSelectiveObfuscationModes(value: Map[String, Array[String]]): LightDeIdentificationParams.this
Sets the value of selectiveObfuscationModes. The dictionary of modes to enable multi-mode deidentification.
- 'obfuscate': Replace the values with random values.
- 'mask_same_length_chars': Replace the entity with asterisks of the same length minus two, plus brackets on both ends.
- 'entity_labels': Replace the values with the entity label.
- 'mask_fixed_length_chars': Replace the entity with a fixed number of asterisks. You should also invoke "setFixedMaskLength()".
- 'skip': Skip the entities (leave them intact).
Entities not listed in the dictionary are deidentified according to setMode().
Example:
val deIdentification = new LightDeIdentification()
  .setInputCols(Array("ner_chunk", "sentence"))
  .setOutputCol("dei")
  .setMode("mask")
  .setSelectiveObfuscationModes(Map(
    "OBFUSCATE" -> Array("PHONE", "email"),
    "mask_entity_labels" -> Array("NAME", "CITY"),
    "skip" -> Array("id", "idnum"),
    "mask_same_length_chars" -> Array("fax"),
    "mask_fixed_length_chars" -> Array("zip")
  ))
  .setFixedMaskLength(4)
-
def
setUnnormalizedDateMode(mode: String): LightDeIdentificationParams.this
The mode to use if the date is not formatted. Options: [mask, obfuscate, skip] Default: obfuscate
-
def
setUseShiftDays(s: Boolean): LightDeIdentificationParams.this
Sets the value of useShiftDays. Whether to use the random shift day when the document has this in its metadata. DocumentHashCoder can create 'dateshift' based on the document. Default: false
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- Identifiable → AnyRef → Any
-
val
unnormalizedDateMode: Param[String]
The mode to use if the date is not formatted. Options: [mask, obfuscate, skip] Default: obfuscate
-
val
useShiftDays: BooleanParam
Whether to use the random shift day when the document has this in its metadata. DocumentHashCoder can create 'dateshift' based on the document. Default: false
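A per-document date shift of this kind might look roughly like the sketch below. Only the 'dateshift' metadata key comes from the description above; `ShiftDaysSketch` and `shift` are hypothetical names, and leaving the date unchanged when the key is absent is an assumption of this sketch rather than documented behavior.

```scala
import java.time.LocalDate

object ShiftDaysSketch {
  // Hypothetical helper, not part of the library API: shifts a date by the
  // 'dateshift' value found in the document metadata. Leaving the date
  // unchanged when the key is absent is an assumption of this sketch.
  def shift(date: LocalDate, metadata: Map[String, String]): LocalDate =
    metadata.get("dateshift")
      .map(days => date.plusDays(days.toLong))
      .getOrElse(date)

  def main(args: Array[String]): Unit = {
    val date = LocalDate.of(2023, 1, 30)
    println(shift(date, Map("dateshift" -> "5"))) // 2023-02-04
    println(shift(date, Map.empty))               // 2023-01-30
  }
}
```

Because every date in a document moves by the same offset, the intervals between dates in that document are preserved.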
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()