


case class StructuredDeidentification(columnsMap: Map[String, String], seedMap: Map[String, Int] = Collections.emptyMap(), obfuscateRefFile: String = "", obfuscateRefSource: String = "both", days: Int = 0, useRandomDateDisplacement: Boolean = false, dateFormats: List[String] = ..., language: String = Language.English, idColumn: String = "") extends Product with Serializable

Utility class for obfuscating the columns of tabular data (a DataFrame).


columnsMap: A map that assigns an entity to each column to obfuscate. Each key is the name of a column in the DataFrame and each value is the entity for that column. The default entities are:

| Entity | Description |
|---|---|
| location | A general location |
| location-other | A location that is not a country, street, hospital, city, or state |
| street | A street |
| hospital | The name of a hospital |
| city | A city |
| state | A state |
| zip | A zip code |
| country | A country |
| contact | The contact details of a person |
| username | A username |
| phone | A phone number |
| fax | A fax number |
| url | An internet URL |
| email | The email address of a person |
| profession | The profession of a person |
| name | The name of a person |
| doctor | The name of a doctor |
| patient | The name of a patient |
| id | A general ID number |
| bioid | A biometric identifier |
| age | The age of something or someone |
| organization | The name of an organization or company |
| healthplan | The ID that identifies a health plan |
| medicalrecord | The identifier of a medical record |
| device | The ID that identifies a device |
| date | A general date |
| ssn | A Social Security number |
| ip | An Internet Protocol (IP) address |
| passport | A passport number |
| dln | A driver's license number |
| npi | A National Provider Identifier |
| c_card | A credit card number |
| iban | An International Bank Account Number |
| dea | A Drug Enforcement Administration (DEA) number |

If the entity is not present in this list, the value will be masked.
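To make the column-to-entity mapping concrete, here is a minimal, library-independent sketch that splits a columnsMap into columns whose entity is one of the defaults listed above and columns whose entity is not. The object `ColumnsMapCheck` and the helper `splitByKnownEntity` are hypothetical names used only for illustration; they are not part of StructuredDeidentification.

```scala
object ColumnsMapCheck {
  // Default entities, copied from the list above.
  val defaultEntities: Set[String] = Set(
    "location", "location-other", "street", "hospital", "city", "state",
    "zip", "country", "contact", "username", "phone", "fax", "url", "email",
    "profession", "name", "doctor", "patient", "id", "bioid", "age",
    "organization", "healthplan", "medicalrecord", "device", "date", "ssn",
    "ip", "passport", "dln", "npi", "c_card", "iban", "dea"
  )

  // Hypothetical helper: splits a column -> entity map into
  // (columns with a default entity, columns with any other entity).
  def splitByKnownEntity(
      columnsMap: Map[String, String]
  ): (Map[String, String], Map[String, String]) =
    columnsMap.partition { case (_, entity) => defaultEntities.contains(entity) }
}
```

For example, `Map("NAME" -> "patient", "DOB" -> "date", "NOTE" -> "diagnosis")` splits into the first two columns on one side and `NOTE` on the other, which is a quick way to spot a misspelled entity before running the obfuscation.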

seedMap: Assigns a seed to each column that you want to obfuscate. The seed drives the random selection of the replacement entities in obfuscation mode, so providing the same seed reproduces the same mapping across runs.
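The effect of a seed can be illustrated with a small standalone sketch: a seeded pseudo-random generator always produces the same sequence of picks, so the same input is replaced the same way on every run. The pool `fakeNames` and the function `obfuscate` are hypothetical and only demonstrate the idea; the selection logic actually used by the class is not shown in this page.

```scala
import scala.util.Random

object SeededPick {
  // Hypothetical replacement pool, for illustration only.
  val fakeNames: Vector[String] =
    Vector("Alice Moore", "Brian Cole", "Carla Diaz", "David Kim")

  // Replaces every value with a pseudo-random pick from `pool`.
  // The picks depend only on `seed`, so equal seeds give equal results.
  def obfuscate(values: Seq[String], pool: Vector[String], seed: Int): Seq[String] = {
    require(pool.nonEmpty, "pool must not be empty")
    val rng = new Random(seed)
    values.map(_ => pool(rng.nextInt(pool.length)))
  }
}
```

Calling `SeededPick.obfuscate(names, SeededPick.fakeNames, seed = 23)` twice returns the same sequence both times, which is the reproducibility property described above.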


obfuscateRefFile: Optional file with your own terms to use for obfuscation. In the file, the key is the entity and the value is the list of terms used to obfuscate that entity.


obfuscateRefSource: The source of the replacement values used to obfuscate the entities. It does not apply to date entities. The allowed values are:

  • 'file': Takes the entities from the obfuscateRefFile.
  • 'faker': Takes the entities from the Faker module.
  • 'both': Takes the entities from the obfuscateRefFile and the Faker module at random.
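The three allowed values can be sketched as a simple dispatch over two candidate pools. This is an illustrative sketch under stated assumptions, not the class's implementation: `SourcePick`, `pick`, and the two pools are hypothetical, and the assumption that 'both' chooses between the sources with equal probability is mine.

```scala
import scala.util.Random

object SourcePick {
  // Picks one replacement term according to the source setting.
  // Assumption: "both" chooses one of the two pools with equal probability.
  def pick(
      source: String,
      fromFile: Vector[String],
      fromFaker: Vector[String],
      rng: Random
  ): String = {
    val pool = source match {
      case "file"  => fromFile
      case "faker" => fromFaker
      case "both"  => if (rng.nextBoolean()) fromFile else fromFaker
      case other =>
        throw new IllegalArgumentException(s"Unknown obfuscateRefSource: $other")
    }
    require(pool.nonEmpty, s"no candidate terms available for source '$source'")
    pool(rng.nextInt(pool.length))
  }
}
```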

days: Number of days by which dates are displaced during obfuscation. If not provided, a random integer between 1 and 60 is used.


useRandomDateDisplacement: If true, date entities are displaced by a random number of days; otherwise the days parameter is used.
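The interaction between the two date settings can be sketched as follows. This is a standalone illustration with two labeled assumptions: the default `days = 0` is treated as "not provided", and the random displacement uses the same 1 to 60 range in both cases. `DateShift`, `displacement`, and `shift` are hypothetical names.

```scala
import java.time.LocalDate
import scala.util.Random

object DateShift {
  // Chooses how many days to shift by.
  // Assumptions: days == 0 means "not provided", and a random shift is
  // always drawn from the range [1, 60].
  def displacement(days: Int, useRandomDateDisplacement: Boolean, rng: Random): Int =
    if (useRandomDateDisplacement || days == 0) 1 + rng.nextInt(60) else days

  // Moves a date forward by the chosen number of days.
  def shift(date: LocalDate, byDays: Int): LocalDate =
    date.plusDays(byDays.toLong)
}
```

With `days = 10` and random displacement off, 25 January 2020 becomes 4 February 2020.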


dateFormats: The formats of the dates to displace.
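One plausible way to use a list of date formats is to try each pattern in order, shift the first date that parses, and render the result in the same pattern. The sketch below shows that approach with `java.time`; it is an assumption about how such a list could be applied, and `FormatAwareShift` is a hypothetical name, not the class's implementation.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

object FormatAwareShift {
  // Tries each pattern in order. On the first pattern that parses `text`,
  // shifts the date by `days` and formats it with that same pattern.
  // Returns None when no pattern matches.
  def shift(text: String, dateFormats: List[String], days: Int): Option[String] = {
    val attempts = dateFormats.iterator.map { pattern =>
      val fmt = DateTimeFormatter.ofPattern(pattern)
      Try(LocalDate.parse(text, fmt)).toOption
        .map(_.plusDays(days.toLong).format(fmt))
    }
    attempts.collectFirst { case Some(shifted) => shifted }
  }
}
```

Keeping the output in the input's own format means a column that mixes, say, `dd/MM/yyyy` and `yyyy-MM-dd` values keeps its layout after displacement.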


language: The language used to select faker entities. The allowed values are:

  • 'en' (English)
  • 'de' (German)
  • 'es' (Spanish)
  • 'fr' (French)
  • 'ar' (Arabic)
  • 'ro' (Romanian)

Default: 'en'.

idColumn: The column that contains the ID of the row. If provided, data is obfuscated consistently for each idColumn value, which matters most for date entities.
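Consistency by ID matters for dates because shifting every date of one record by the same amount preserves the intervals between them. One way to get that property, shown here purely as an assumption-laden sketch, is to derive the displacement from a hash of the ID. `ConsistentShift` and `daysFor` are hypothetical, and the 1 to 60 range is borrowed from the days parameter description.

```scala
import java.time.LocalDate
import scala.util.hashing.MurmurHash3

object ConsistentShift {
  // Derives a displacement in [1, 60] from the row ID and a seed, so that
  // equal IDs always receive the same number of days.
  def daysFor(id: String, seed: Int): Int =
    1 + Math.floorMod(MurmurHash3.stringHash(id, seed), 60)

  // Shifts a date by the displacement assigned to `id`.
  def shift(date: LocalDate, id: String, seed: Int): LocalDate =
    date.plusDays(daysFor(id, seed).toLong)
}
```

Because both the admission and the discharge date of the same patient ID move by the same number of days, the length of stay is unchanged after obfuscation.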

Linear Supertypes
Serializable, Serializable, Product, Equals, AnyRef, Any

Instance Constructors

  1. new StructuredDeidentification(columnsMap: Map[String, String], seedMap: Map[String, Int] = Collections.emptyMap(), obfuscateRefFile: String = "", obfuscateRefSource: String = "both", days: Int = 0, useRandomDateDisplacement: Boolean = false, dateFormats: List[String] = ..., language: String = Language.English, idColumn: String = "")



Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  6. val columnsMap: Map[String, String]
  7. val dateFormats: List[String]
  8. val days: Int
  9. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  10. def finalize(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
  12. val idColumn: String
  13. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  14. val language: String
  15. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  16. final def notify(): Unit
    Definition Classes
    AnyRef
  17. final def notifyAll(): Unit
    Definition Classes
    AnyRef
  18. def obfuscateColumns(dataFrame: DataFrame): DataFrame
  19. val obfuscateRefFile: String
  20. val obfuscateRefSource: String
  21. val seedMap: Map[String, Int]
  22. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  23. val useRandomDateDisplacement: Boolean
  24. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  25. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  26. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()

