deid

package deid

Type Members

trait BaseDeidParams extends Params with HasFeatures
A trait that contains all the params that are common in DeIdentificationParams and ObfuscatorParams.

See also
DeIdentificationParams
ObfuscatorParams
DeidModelParams
class DateChunkObfuscator extends AnnotatorModel[DateChunkObfuscator] with HasSimpleAnnotate[DateChunkObfuscator] with CheckLicense
class DateShiftFiller extends Serializable
Utility class to fill missing or empty shift values in a DataFrame column using deterministic, ID-based pseudo-random values.
This is particularly useful in de-identification tasks where date shift values must be:
- Consistent across all rows with the same identifier
- Present even when the original shift value is missing or empty
Logic
For each row:
- If another row with the same ID has a known (non-empty) shift value, reuse it.
- If not, generate a fallback shift value deterministically using the ID and a seed. The generated value will always fall in the range [1, maxShiftDays].
Example usage
```
val filler = new DateShiftFiller(spark, seed = 42, maxShiftDays = 60)
val resultDf = filler.fillMissingShifts(df, "note_id", "date_shift", "_filled")
```
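
The fill logic described above can be pictured with a small Spark-free sketch. This is not the library's implementation: the object `ShiftFillSketch`, the helper `fallbackShift`, and the choice of MurmurHash3 as the deterministic hash are assumptions made purely for illustration.

```scala
import scala.util.hashing.MurmurHash3

object ShiftFillSketch {
  // Assumed fallback: hash the ID with the seed and map it into [1, maxShiftDays].
  def fallbackShift(id: String, seed: Int, maxShiftDays: Int): Int =
    Math.floorMod(MurmurHash3.stringHash(id, seed), maxShiftDays) + 1

  // rows are (id, shift) pairs; None stands for a missing or empty shift value.
  def fill(rows: Seq[(String, Option[Int])], seed: Int, maxShiftDays: Int): Seq[(String, Int)] = {
    // First known shift per ID, shared by every row with that ID.
    val known: Map[String, Int] = rows
      .collect { case (id, Some(shift)) => id -> shift }
      .groupBy(_._1)
      .map { case (id, pairs) => id -> pairs.head._2 }
    rows.map { case (id, own) =>
      id -> own.orElse(known.get(id)).getOrElse(fallbackShift(id, seed, maxShiftDays))
    }
  }
}
```

Rows that share an ID end up with the same shift, and an ID with no known shift always receives the same generated value for a given seed.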

class DeIdentification extends AnnotatorApproach[DeIdentificationModel] with DeIdentificationParams with DeidApproachParams with HandleExceptionParams with CheckLicense

Contains all the methods for training a DeIdentificationModel model. This module can obfuscate or mask entities that contain personal information. Additional entities can be defined in a file of regex patterns passed to setRegexPatternsDictionary, where each line maps an entity to a regex.

DATE \d{4}
AID \d{6,7}
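
To make the file format concrete, here is a minimal sketch of how such lines could be read and applied. It assumes the entity and the regex are separated by the first run of whitespace; `RegexDictSketch` and its methods are hypothetical names and do not reflect how the annotator parses the file internally.

```scala
import scala.util.matching.Regex

object RegexDictSketch {
  // Split each non-empty line into (entity, compiled regex) at the first whitespace run.
  def parse(lines: Seq[String]): Seq[(String, Regex)] =
    lines.map(_.trim).filter(_.nonEmpty).flatMap { line =>
      line.split("\\s+", 2) match {
        case Array(entity, pattern) => Some(entity -> pattern.r)
        case _                      => None // skip lines without a pattern
      }
    }

  // Return (entity, matched text) for every match of every pattern.
  def findEntities(text: String, dict: Seq[(String, Regex)]): Seq[(String, String)] =
    dict.flatMap { case (entity, regex) =>
      regex.findAllIn(text).toList.map(matched => entity -> matched)
    }
}
```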

Additionally, obfuscation strings can be defined with DeidApproachParams.setObfuscateRefFile, where each line is a mapping of string to entity. The format and separator can be specified with DeidApproachParams.setRefFileFormat and DeidApproachParams.setRefSep.

Dr. Gregory House#DOCTOR
01010101#MEDICALRECORD
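
As an illustration of the line format, the sketch below groups such lines by entity using the separator. The object `RefFileSketch` is hypothetical, and splitting at the last occurrence of the separator is an assumption, not the documented parsing rule.

```scala
object RefFileSketch {
  // Parse "replacement<sep>ENTITY" lines into entity -> replacement strings.
  def parse(lines: Seq[String], sep: String = "#"): Map[String, Seq[String]] = {
    val pairs = lines.map(_.trim).filter(_.nonEmpty).flatMap { line =>
      val idx = line.lastIndexOf(sep)
      if (idx > 0) Some(line.substring(idx + sep.length).trim -> line.substring(0, idx).trim)
      else None // ignore lines without a separator
    }
    pairs.groupBy(_._1).map { case (entity, xs) => entity -> xs.map(_._2) }
  }
}
```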

The configuration params for that module are in trait DeIdentificationParams.

Exceptions thrown

java.security.NoSuchAlgorithmException If no Provider supports a SecureRandom implementation for the specified algorithm name.

Note

If the mode is set to obfuscate, DeIdentification uses java.security.SecureRandom to generate fake data. You can select the generation algorithm by setting the system environment variable SPARK_NLP_JSL_SEED_ALGORITHM. The chosen algorithm may affect the generated fake data, performance, and potential blocking behavior. For the standard RNG algorithm names, refer to the SecureRandom Number Generation Algorithms section of the Java Security Standard Algorithm Names documentation. The default algorithm is 'SHA1PRNG'.
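
The lookup described in this note can be sketched with the standard JDK API alone. `SeedAlgorithmSketch` is a hypothetical helper, not part of the library; only the environment variable name and the 'SHA1PRNG' default come from the note above.

```scala
import java.security.SecureRandom

object SeedAlgorithmSketch {
  // Read the algorithm name from the environment, falling back to SHA1PRNG.
  // SecureRandom.getInstance throws NoSuchAlgorithmException for unknown names.
  def resolve(env: Map[String, String] = sys.env): SecureRandom = {
    val algorithm = env.getOrElse("SPARK_NLP_JSL_SEED_ALGORITHM", "SHA1PRNG")
    SecureRandom.getInstance(algorithm)
  }
}
```

Passing the environment as a parameter keeps the sketch easy to try with different values without changing the real process environment.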

See also

DeIdentificationModel

DeIdentificationParams

DeidApproachParams

Ideally this annotator works in conjunction with demographic named entity recognizers, which can be trained using TextMatchers, RegexMatchers, DateMatchers, NerCRFs or NerDLs. An example pipeline for de-identification is shown next.

Example

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val sentenceDetector = new SentenceDetector()
    .setInputCols(Array("document"))
    .setOutputCol("sentence")
    .setUseAbbreviations(true)

val tokenizer = new Tokenizer()
    .setInputCols(Array("sentence"))
    .setOutputCol("token")

val embeddings = WordEmbeddingsModel
    .pretrained("embeddings_clinical", "en", "clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")

Ner entities

val clinical_sensitive_entities = MedicalNerModel.pretrained("ner_deid_enriched", "en", "clinical/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")

val nerConverter = new NerConverter()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")

Deidentification

val deIdentification = new DeIdentification()
    .setInputCols(Array("ner_chunk", "token", "sentence"))
    .setOutputCol("dei")
    // file with custom regex patterns for custom entities
    .setRegexPatternsDictionary("path/to/dic_regex_patterns_main_categories.txt")
    // file with custom obfuscator names for the entities
    .setObfuscateRefFile("path/to/obfuscate_fixed_entities.txt")
    .setRefFileFormat("csv")
    .setRefSep("#")
    .setMode("obfuscate")
    .setDateFormats(Array("MM/dd/yy","yyyy-MM-dd"))
    .setObfuscateDate(true)
    .setDateTag("DATE")
    .setDays(5)
    .setObfuscateRefSource("file")

Pipeline

import spark.implicits._ // provides toDF on Seq

val data = Seq(
  "# 7194334 Date : 01/13/93 PCP : Oliveira , 25 years-old , Record date : 2079-11-09."
).toDF("text")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentenceDetector,
  tokenizer,
  embeddings,
  clinical_sensitive_entities,
  nerConverter,
  deIdentification
))
val result = pipeline.fit(data).transform(data)


Show Results

result.select("dei.result").show(truncate = false)
+--------------------------------------------------------------------------------------------------+
|result                                                                                            |
+--------------------------------------------------------------------------------------------------+
|[# 01010101 Date : 01/18/93 PCP : Dr. Gregory House , <AGE> years-old , Record date : 2079-11-14.]|
+--------------------------------------------------------------------------------------------------+
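
The date displacement visible in this output (01/13/93 becomes 01/18/93 with a shift of 5 days) can be reproduced with java.time. This is only a sketch of the idea under the two formats used in the example; `DateShiftSketch` is a hypothetical helper and the annotator's own date handling may differ.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

object DateShiftSketch {
  // Try each format in order; shift the first date that parses and
  // render it back in the same format. None if no format matches.
  def shift(date: String, days: Int, formats: Seq[String]): Option[String] =
    formats.iterator
      .map { pattern =>
        val fmt = DateTimeFormatter.ofPattern(pattern)
        Try(LocalDate.parse(date, fmt).plusDays(days.toLong).format(fmt)).toOption
      }
      .collectFirst { case Some(shifted) => shifted }
}
```

For instance, `DateShiftSketch.shift("2079-11-09", 5, Seq("MM/dd/yy", "yyyy-MM-dd"))` returns `Some("2079-11-14")`.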

class DeIdentificationModel extends AnnotatorModel[DeIdentificationModel] with DeIdentificationParams with DeidModelParams with HasSimpleAnnotate[DeIdentificationModel] with HandleExceptionParams with HasSafeAnnotate[DeIdentificationModel] with CheckLicense
Contains all the parameters to transform a dataset with three input annotations of types DOCUMENT, TOKEN and CHUNK into its de-identified version by either masking or obfuscating the given CHUNKs.
To create a configured DeIdentificationModel, please see the example of DeIdentification.
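
As a rough picture of what masking a CHUNK means, the sketch below replaces each chunk span with its entity label in angle brackets, as in the `<AGE>` placeholder shown in the DeIdentification example output. `MaskSketch` and `Chunk` are hypothetical names with inclusive end offsets assumed; they are not the model's internal types.

```scala
object MaskSketch {
  // A detected chunk: character offsets in the text (end inclusive) and its entity label.
  final case class Chunk(begin: Int, end: Int, label: String)

  // Replace chunks from right to left so earlier offsets stay valid.
  def mask(text: String, chunks: Seq[Chunk]): String =
    chunks.sortBy(c => -c.begin).foldLeft(text) { (current, c) =>
      current.substring(0, c.begin) + "<" + c.label + ">" + current.substring(c.end + 1)
    }
}
```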

See also
BaseDeidParams to see params
DeIdentificationParams to see params
DeidModelParams to see params
DeIdentification to train your own model
trait DeIdentificationParams extends BaseDeidParams with MaskingParams with HasFeatures
A trait that contains all the params that are common between DeIdentificationModel and DeIdentification annotators.

See also
DeIdentification
DeIdentificationModel
BaseDeidParams
trait DeidApproachParams extends Params
A trait that contains all the params that are common to DeIdentification, NameChunkObfuscatorApproach, and ObfuscatorAnnotatorApproach.

See also
DeIdentification
ObfuscatorAnnotatorApproach
NameChunkObfuscatorApproach
trait DeidModelParams extends AnyRef
A trait that contains all the params that are common in DeIdentificationModel and ObfuscatorAnnotatorModel.

See also
DeIdentificationModel
LightDeIdentification
BaseDeidParams to see params
class DocumentHashCoder extends Model[DocumentHashCoder] with RawAnnotator[DocumentHashCoder]

class LightDeIdentification extends AnnotatorModel[LightDeIdentification] with HasSimpleAnnotate[LightDeIdentification] with LightDeIdentificationParams with DeidModelParams with CheckLicense

Light DeIdentification is a light version of DeIdentification. It replaces sensitive information in a text with masked or obfuscated values. It is designed to work with healthcare data and can be used to de-identify patient names, dates, doctor names, hospital names, and other types of sensitive information.

Additionally, it supports millions of embedded fakers. If desired, custom external fakers can be set with LightDeIdentificationParams.setCustomFakers.

It also supports multiple languages, such as English, Spanish, French, German, and Arabic, and it can apply several de-identification modes at the same time with LightDeIdentificationParams.setSelectiveObfuscationModes.

Example:

val documentAssembler = new DocumentAssembler()
  .setInputCol("text").setOutputCol("document")

val sentenceDetector = new SentenceDetector()
  .setInputCols(Array("document")).setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols(Array("sentence")).setOutputCol("token")

val embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token")).setOutputCol("embeddings")

val clinical_sensitive_entities = MedicalNerModel.pretrained("ner_deid_enriched", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings")).setOutputCol("ner")

val nerConverter = new NerConverterInternal()
  .setInputCols(Array("sentence", "token", "ner")).setOutputCol("chunk")

val deIdentification = new LightDeIdentification()
  .setInputCols(Array("chunk", "sentence")).setOutputCol("dei")
  .setMode("obfuscate")
  .setObfuscateDate(true)
  .setDays(5)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentenceDetector,
  tokenizer,
  embeddings,
  clinical_sensitive_entities,
  nerConverter,
  deIdentification
))
import spark.implicits._
val data = Seq("""
  |Record date: 2093-01-13, David Hale, M.D., Name: Hendrickson Ora.
  | MR # 7194334 Date: 01/13/93. PCP: Oliveira, 25 years-old, Record date: 2079-11-09.
  |Cocke County Baptist Hospital, 0295 Keats Street, Phone 55-555-5555.""".stripMargin
).toDF("text")

val result = pipeline.fit(data).transform(data)
result.selectExpr("explode(dei) as result").show(truncate = false)

Results:

+--------------------------------------------------------------------------------------------------------------------------------------------------+
|result                                                                                                                                            |
+--------------------------------------------------------------------------------------------------------------------------------------------------+
|{document, 0, 69, Record date: 2093-01-18, Chestine Spore, M.D., Name: Sallyanne Havers., {sentence -> 0, originalIndex -> 2}, []}                |
|{document, 70, 97, MR # 8469629 Date: 01/18/93., {sentence -> 1, originalIndex -> 71}, []}                                                        |
|{document, 98, 156, PCP: Derrill Center, 38 years-old, Record date: 2079-11-14., {sentence -> 2, originalIndex -> 100}, []}                       |
|{document, 157, 237, SELECT SPECIALTY HOSPITAL - DALLAS (GARLAND), 101 Hospital Rd, Phone 52-841-3244., {sentence -> 3, originalIndex -> 155}, []}|
+--------------------------------------------------------------------------------------------------------------------------------------------------+

Exceptions thrown
java.security.NoSuchAlgorithmException If no Provider supports a SecureRandom implementation for the specified algorithm name.

Note
If the mode is set to obfuscate, LightDeIdentification uses java.security.SecureRandom to generate fake data. You can select the generation algorithm by setting the system environment variable SPARK_NLP_JSL_SEED_ALGORITHM. The chosen algorithm may affect the generated fake data, performance, and potential blocking behavior. For the standard RNG algorithm names, refer to the SecureRandom Number Generation Algorithms section of the Java Security Standard Algorithm Names documentation. The default algorithm is 'SHA1PRNG'.

See also
DeidModelParams for more information and parameters
LightDeIdentificationParams for more information and parameters

trait LightDeIdentificationParams extends BaseDeidParams with MaskingParams with Params
A trait that contains params that LightDeIdentification has.

See also
LightDeIdentification
trait MaskingParams extends Params
case class MySentnece(content: String, start: Int, end: Int, index: Int, originalIndex: Int) extends Product with Serializable
class NameChunkObfuscator extends AnnotatorModel[NameChunkObfuscator] with HasSimpleAnnotate[NameChunkObfuscator] with NameChunkObfuscatorParams with CheckLicense
Contains all the parameters to transform a dataset with an input annotation of type CHUNK into its obfuscated version. The model obfuscates the given names and leaves other chunks unchanged.
To create a configured NameChunkObfuscator, please see the example of NameChunkObfuscatorApproach.

See also
NameChunkObfuscatorParams to see params
NameChunkObfuscatorApproach to train your own model

class NameChunkObfuscatorApproach extends AnnotatorApproach[NameChunkObfuscator] with NameChunkObfuscatorParams with DeidApproachParams with CheckLicense

Contains all the methods for training a NameChunkObfuscator model. This module can replace name entities with consistent fakers. Additionally, obfuscation names can be defined with setObfuscateRefFile, where each line maps a name to its entity. The format and separator can be specified with setRefFileFormat and setRefSep.

George#NAME
Taylor#NAME
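
One way to picture "consistent fakers" is a deterministic choice: the same input name always selects the same replacement from the list. The sketch below illustrates that property only. `ConsistentFakerSketch`, the case-insensitive key, and the hash-based selection are all assumptions and say nothing about how the annotator actually picks names.

```scala
import scala.util.hashing.MurmurHash3

object ConsistentFakerSketch {
  // Deterministically pick a faker for a name: equal names (ignoring case)
  // map to the same entry for a given seed.
  def fake(name: String, fakers: IndexedSeq[String], seed: Int = 0): String = {
    require(fakers.nonEmpty, "at least one faker is required")
    val hash = MurmurHash3.stringHash(name.toLowerCase, seed)
    fakers(Math.floorMod(hash, fakers.length))
  }
}
```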

The configuration params for that module are in trait NameChunkObfuscatorParams.

See also

NameChunkObfuscator

NameChunkObfuscatorParams

DeidApproachParams

See Spark NLP Workshop for more examples of usage.

Example

import spark.implicits._ // provides toDF on Seq

val data = Seq("John Davies is a 62 y.o. patient admitted." +
  "He was seen by attending physician Dr. Lorand and was scheduled for emergency assessment.")
  .toDF("text")

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")

val clinical_ner = MedicalNerModel.pretrained("ner_deid_generic_augmented", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")

val ner_converter_name = new NerConverterInternal()
  .setInputCols(Array("sentence", "token", "ner"))
  .setOutputCol("ner_chunk")

NameChunkObfuscatorApproach

val nameChunkObfuscator = new NameChunkObfuscatorApproach()
  .setInputCols("ner_chunk")
  .setOutputCol("replacement")
  .setRefFileFormat("csv")
  .setObfuscateRefFile("obfuscator_names.txt")
  .setRefSep("#")
  .setObfuscateRefSource("both")
  .setLanguage("en")

val replacer_name = new Replacer()
  .setInputCols("replacement", "sentence")
  .setOutputCol("obfuscated_document_name")
  .setUseReplacement(true)

Pipeline

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  word_embeddings,
  clinical_ner,
  ner_converter_name,
  nameChunkObfuscator,
  replacer_name
))

val result = pipeline.fit(data).transform(data)
result.select("text").show(false)
// the Replacer stage writes its output to "obfuscated_document_name"
result.select("obfuscated_document_name.result").show(false)
+-----------------------------------------------------------------------------------------------------------------------------------+
|text                                                                                                                               |
+-----------------------------------------------------------------------------------------------------------------------------------+
|John Davies is a 62 y.o. patient admitted.He was seen by attending physician Dr. Lorand and was scheduled for emergency assessment.|
+-----------------------------------------------------------------------------------------------------------------------------------+

+-------------------------------------------------------------------------------------------------------------------------------------+
|result                                                                                                                               |
+-------------------------------------------------------------------------------------------------------------------------------------+
|[Charlestine is a 62 y.o. patient admitted.He was seen by attending physician Dr. Lowery and was scheduled for emergency assessment.]|
+-------------------------------------------------------------------------------------------------------------------------------------+

trait NameChunkObfuscatorParams extends Params
A trait that contains all the params that are common between NameChunkObfuscatorApproach and NameChunkObfuscator annotators

Attributes
protected
See also
NameChunkObfuscatorApproach
NameChunkObfuscator
class ObfuscatorAnnotatorApproach extends AnnotatorApproach[ObfuscatorAnnotatorModel] with DeidApproachParams with ObfuscatorParams
class ObfuscatorAnnotatorModel extends AnnotatorModel[ObfuscatorAnnotatorModel] with ObfuscatorParams with DeidModelParams with HasSimpleAnnotate[ObfuscatorAnnotatorModel]
trait ObfuscatorParams extends BaseDeidParams
A trait that contains all the params that are common in ObfuscatorAnnotatorModel and ObfuscatorAnnotatorApproach

Attributes
protected
See also
ObfuscatorAnnotatorModel
ObfuscatorAnnotatorApproach
BaseDeidParams

class ReIdentification extends AnnotatorModel[DeIdentificationModel] with HasSimpleAnnotate[DeIdentificationModel] with CheckLicense

Re-identifies entities that were obfuscated by DeIdentification. This annotator requires the outputs of the de-identification as input. The input columns need to be the de-identified document and the de-identification mappings column set with DeIdentification.setMappingsColumn. To see how the entities are de-identified, please refer to the example of that class.
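
Conceptually, re-identification applies the stored mappings in reverse. The sketch below shows that idea on plain strings. The `Mapping` case class (original text plus inclusive offsets into the de-identified text) is a hypothetical stand-in; the real mappings column has its own annotation schema.

```scala
object ReidSketch {
  // original: the text that was replaced; begin/end: inclusive offsets
  // of the replacement inside the de-identified text.
  final case class Mapping(original: String, begin: Int, end: Int)

  // Restore from right to left so earlier offsets stay valid.
  def restore(deidentified: String, mappings: Seq[Mapping]): String =
    mappings.sortBy(m => -m.begin).foldLeft(deidentified) { (text, m) =>
      text.substring(0, m.begin) + m.original + text.substring(m.end + 1)
    }
}
```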

Example

Define the reidentification stage and transform the deidentified documents

val reideintification = new ReIdentification()
  .setInputCols("dei", "protectedEntities")
  .setOutputCol("reid")
  .transform(result)

Show results

result.select("dei.result").show(truncate = false)
+--------------------------------------------------------------------------------------------------+
|result                                                                                            |
+--------------------------------------------------------------------------------------------------+
|[# 01010101 Date : 01/18/93 PCP : Dr. Gregory House , <AGE> years-old , Record date : 2079-11-14.]|
+--------------------------------------------------------------------------------------------------+

reideintification.selectExpr("explode(reid.result)").show(false)
+-----------------------------------------------------------------------------------+
|col                                                                                |
+-----------------------------------------------------------------------------------+
|# 7194334 Date : 01/13/93 PCP : Oliveira , 25 years-old , Record date : 2079-11-09.|
+-----------------------------------------------------------------------------------+

See also: DeIdentification for deidentification of entities

trait ReadablePretrainedDeId extends ParamsAndFeaturesReadable[DeIdentificationModel] with HasPretrained[DeIdentificationModel]
trait ReadsFeatures extends ParamsAndFeaturesReadable[DeIdentificationModel]

class Replacer extends AnnotatorModel[NerQuestionGenerator] with HasSimpleAnnotate[NerQuestionGenerator] with CheckLicense

Replaces entities in the original text with new ones.

This class replaces entities in the original text with the ones obtained from, for example, DeIdentificationModel or DateNormalizer.

Example

val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
    .setInputCols("sentence")
    .setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")

val clinical_ner = MedicalNerModel.pretrained("ner_deid_generic_augmented", "en", "clinical/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")

val ner_converter_name = new NerConverterInternal()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")

val nameChunkObfuscator = new NameChunkObfuscatorApproach()
    .setInputCols("ner_chunk")
    .setOutputCol("replacement")
    .setRefFileFormat("csv")
    .setObfuscateRefFile("names_test.txt")
    .setRefSep("#")

val replacer_name = new Replacer()
    .setInputCols("replacement", "sentence")
    .setOutputCol("obfuscated_document_name")
    .setUseReplacement(true)

val nlpPipeline = new Pipeline().setStages(Array(
    documentAssembler,
    tokenizer,
    word_embeddings,
    clinical_ner,
    ner_converter_name,
    nameChunkObfuscator,
    replacer_name
))

import spark.implicits._ // provides toDF on Seq
val empty_data = Seq("").toDF("text")
val model_chunck_obfuscator = nlpPipeline.fit(empty_data)
val sample_text = "John Davies is a 62 y.o. patient admitted. Mr. Davies was seen by attending physician Dr. Lorand and was scheduled for emergency assessment."
val lmodel = new LightPipeline(model_chunck_obfuscator)
// annotate returns a map from output column name to the result strings
val res = lmodel.annotate(sample_text)
println("Original text.  : " + res("sentence").head)
println("Obfuscated text : " + res("obfuscated_document_name").head)
Original text.  :  John Davies is a 62 y.o. patient admitted. Mr. Davies was seen by attending physician Dr. Lorand and was scheduled for emergency assessment.
Obfuscated text :  Fitzpatrick is a <AGE> y.o. patient admitted. Mr. Bowman was seen by attending physician Dr. Acosta and was scheduled for emergency assessment.

case class SentenceMaxException(message: String = "", cause: Throwable = None.orNull) extends Exception with Product with Serializable
case class StructuredDeidentification(columnsMap: Map[String, String], seedMap: Map[String, Int] = Collections.emptyMap(), obfuscateRefFile: String = "", obfuscateRefSource: String = "both", days: Int = 0, useRandomDateDisplacement: Boolean = false, dateFormats: List[String] = ..., language: String = Language.English, idColumn: String = "", region: String = "", keepYear: Boolean = false, keepMonth: Boolean = false, unnormalizedDateMode: String = "obfuscate", keepTextSizeForObfuscation: Boolean = false, fakerLengthOffset: Int = 3, genderAwareness: Boolean = false, ageRangesByHipaa: Boolean = false, consistentAcrossNameParts: Boolean = true, selectiveObfuscateRefSource: Map[String, String] = Collections.emptyMap()) extends Product with Serializable
Utility class that helps to obfuscate tabular data.
columnsMap
A map that selects the column holding each entity. Each key is a column name in the DataFrame and each value is the entity for that column. The default entities are:
- location: A general location
- location-other: A location that is not a country, street, hospital, city or state
- street: A street
- hospital: The name of a hospital
- city: A city
- state: A state
- zip: A zip code
- country: A country
- contact: The contact details of a person
- username: A username
- phone: A phone number
- fax: A fax number
- url: An internet URL
- email: The email address of a person
- profession: The profession of a person
- name: The name of a person
- doctor: The name of a doctor
- patient: The name of a patient
- first_name: The first name of a person
- last_name: The last name of a person
- id: A general ID number
- bioid: A biometric identifier
- age: The age of something or someone
- organization: The name of an organization or company
- healthplan: The ID that identifies a health plan
- medicalrecord: The identifier of a medical record
- device: The ID that identifies a device
- date: A general date
- ssn: A Social Security Number
- ip: An Internet Protocol (IP) address
- passport: A passport number
- dln: A driver's license number
- npi: A National Provider Identifier
- c_card: A credit card number
- iban: An International Bank Account Number
- dea: A Drug Enforcement Administration (DEA) number
An entity that is not present in this list will be masked.
seedMap
Allows adding a seed for each column that you want to obfuscate. The seed is used to randomly select the fake entities during obfuscation. Providing the same seed reproduces the same mapping across runs.
obfuscateRefFile
An optional parameter that lets you add your own terms to be used for obfuscation. In the file, the key is the entity and the value is the term that will be used in the obfuscation.
obfuscateRefSource
The source of the fake values used to obfuscate the entities. It does not apply to date entities. The allowed values are the following:
- 'file': Takes the entities from the obfuscateRefFile
- 'faker': Takes the entities from the Faker module
- 'both': Takes the entities from the obfuscateRefFile and the Faker module randomly.
days
Number of days to obfuscate the dates by displacement. If not provided a random integer between 1 and 60 will be used.
useRandomDateDisplacement
Whether to use a random number of displacement days for date entities. If true, a random displacement is used; otherwise the days parameter is used.
dateFormats
Formats of the dates to displace.
language
The language used to select faker entities. Default: 'en'. The values are the following:
- 'en' (English)
- 'de' (German)
- 'es' (Spanish)
- 'fr' (French)
- 'ar' (Arabic)
- 'ro' (Romanian)
idColumn
The column that contains the id of the row. If provided, data is obfuscated consistently for each idColumn value, which matters especially for date entities.
region
With this property, you can select particular dateFormats. This property is especially used when obfuscating dates. The values are the following: 'eu' for European Union, 'us' for USA.
keepYear
Whether to keep the year intact when obfuscating date entities. If true, the year will remain unchanged during the obfuscation process.
keepMonth
Whether to keep the month intact when obfuscating date entities. If true, the month will remain unchanged during the obfuscation process.
unnormalizedDateMode
The mode to use if the date is not formatted. The values are the following: 'mask', 'obfuscate', 'skip'. Default: obfuscate.
keepTextSizeForObfuscation
Whether the output should maintain the same character length as the input text.
fakerLengthOffset
Specifies how much length deviation is accepted in obfuscation when keepTextSizeForObfuscation is enabled. The value must be greater than 0. Default is 3.
genderAwareness
Whether to use gender-aware names during obfuscation. This param affects only names. Setting it to true might decrease performance.
ageRangesByHipaa
Whether to obfuscate ages based on HIPAA (Health Insurance Portability and Accountability Act) Privacy Rule.
consistentAcrossNameParts
Whether to keep the same name across different parts of the name (e.g., first name, last name) when obfuscating names.
selectiveObfuscateRefSource
A map that selects the source of obfuscation for each entity, so that different obfuscation sources can be applied to specific entities. The keys are the entity names and the values are the obfuscation sources. The allowed values are the following:
- 'file': Takes the fakes from the obfuscateRefFile
- 'faker': Takes the fakes from the Faker module
- 'both': Takes the fakes from the obfuscateRefFile and the Faker module
If an entity is not specified in this map, the obfuscateRefSource param is used to determine the obfuscation source.
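
Two of the parameters above, seedMap and the pair keepTextSizeForObfuscation / fakerLengthOffset, can be illustrated together with a small standalone sketch. `StructuredFakerSketch.pick` is a hypothetical helper. How the library combines the seed with the input and what it does when no candidate fits the length window are assumptions made here for illustration.

```scala
import scala.util.Random

object StructuredFakerSketch {
  // Pick a replacement for `original` from `candidates`.
  // The same seed and input always give the same choice.
  def pick(original: String, candidates: Seq[String], seed: Int,
           keepTextSize: Boolean = false, lengthOffset: Int = 3): String = {
    require(candidates.nonEmpty, "at least one candidate is required")
    require(lengthOffset > 0, "lengthOffset must be greater than 0")
    // With keepTextSize, keep only candidates within lengthOffset characters of the input.
    val pool =
      if (keepTextSize) candidates.filter(c => math.abs(c.length - original.length) <= lengthOffset)
      else candidates
    // Assumed behavior: fall back to all candidates when none fits.
    val usable = if (pool.nonEmpty) pool else candidates
    val rng = new Random(seed.toLong ^ original.hashCode.toLong)
    usable(rng.nextInt(usable.length))
  }
}
```
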
case class TextToDocumentColumns(columns: List[String], suffix: String = "") extends Product with Serializable

Attributes
protected

Value Members

val randomAlgorithm: String
val securerandom: SecureRandom
object DeIdentification extends DefaultParamsReadable[DeIdentification] with Serializable
object DeIdentificationModel extends ReadablePretrainedDeId with ReadsFeatures with Serializable
object DefaultRegex

Attributes
protected
object DocumentHashCoder extends DefaultParamsReadable[DocumentHashCoder] with Serializable
object Language
object LightDeIdentification extends ParamsAndFeaturesReadable[LightDeIdentification] with Serializable
This is the companion object of LightDeIdentification. Please refer to that class for the documentation.
object Obfuscator

Attributes
protected
object ObfuscatorAnnotatorApproach extends DefaultParamsReadable[ObfuscatorAnnotatorApproach] with Serializable
object ObfuscatorParams extends DefaultParamsReadable[DeIdentification] with Serializable
object Replacer extends ParamsAndFeaturesReadable[Replacer] with Serializable
