Class/Object

com.johnsnowlabs.nlp.annotators.deid

DeIdentification

Related Docs: object DeIdentification | package deid


class DeIdentification extends AnnotatorApproach[DeIdentificationModel] with DeIdentificationParams with Licensed

Trains a DeIdentification annotator that either masks or obfuscates protected health information (PHI), based on input annotations of types DOCUMENT, TOKEN and CHUNK.

Ideally, this annotator works in conjunction with demographic named entity recognizers, which can be trained using TextMatchers, RegexMatchers, DateMatchers, NerCRFs or NerDLs.
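
As a rough illustration of how the setters listed on this page fit together, the sketch below configures one estimator in mask mode and one in obfuscate mode. The column names, the reference file path, the separator and the date pattern strings are all placeholders chosen for the example, not values taken from this page, and the snippet assumes the licensed library that provides this class is on the classpath.

```scala
import com.johnsnowlabs.nlp.annotators.deid.DeIdentification

// Column names below are hypothetical; use the names produced by your own pipeline.
// Input order follows the documented annotation types: DOCUMENT, TOKEN, CHUNK.
val masker = new DeIdentification()
  .setInputCols("document", "token", "ner_chunk")
  .setOutputCol("deidentified")
  .setMode("mask")

val obfuscator = new DeIdentification()
  .setInputCols("document", "token", "ner_chunk")
  .setOutputCol("deidentified")
  .setMode("obfuscate")
  .setObfuscateRefFile("path/to/obfuscation_terms.csv") // hypothetical path
  .setRefSep(",")                                       // assumed separator
  .setObfuscateDate(true)
  .setDateFormats(Array("MM/dd/yyyy", "yyyy-MM-dd"))    // assumed pattern syntax
  .setDays(10)
  .setConsistentObfuscation(true)
  .setSameEntityThreshold(0.9)
```

Calling fit on a dataset that holds the three input columns returns a DeIdentificationModel, as the fit signatures on this page show.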

Linear Supertypes
Licensed, DeIdentificationParams, AnnotatorApproach[DeIdentificationModel], CanBeLazy, DefaultParamsWritable, MLWritable, HasOutputAnnotatorType, HasOutputAnnotationCol, HasInputAnnotationCols, Estimator[DeIdentificationModel], PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new DeIdentification()

  2. new DeIdentification(uid: String)

    uid

    a unique identifier for the instanced Annotator

Type Members

  1. type AnnotatorType = String

    Definition Classes
    HasOutputAnnotatorType

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T

    Attributes
    protected
    Definition Classes
    Params
  4. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  5. def _fit(dataset: Dataset[_], recursiveStages: Option[PipelineModel]): DeIdentificationModel

    Attributes
    protected
    Definition Classes
    AnnotatorApproach
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def beforeTraining(spark: SparkSession): Unit

    Definition Classes
    AnnotatorApproach
  8. final def checkSchema(schema: StructType, inputAnnotatorType: String): Boolean

    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  9. final def clear(param: Param[_]): DeIdentification.this.type

    Definition Classes
    Params
  10. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  11. val consistentObfuscation: BooleanParam

    Whether to replace very similar entities in a document with the same randomized term (default: true)

    Definition Classes
    DeIdentificationParams
  12. final def copy(extra: ParamMap): Estimator[DeIdentificationModel]

    Definition Classes
    AnnotatorApproach → Estimator → PipelineStage → Params
  13. def copyValues[T <: Params](to: T, extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  14. val dateFormats: StringArrayParam

    Format of dates to displace

    Definition Classes
    DeIdentificationParams
  15. val dateTag: Param[String]

    Tag representing dates in the obfuscate reference file (default: DATE)

    Definition Classes
    DeIdentificationParams
  16. val dateToYear: BooleanParam

    true if dates must be converted to years, false otherwise

    Definition Classes
    DeIdentificationParams
  17. val days: IntParam

    Number of days by which dates are displaced when they are obfuscated. If not provided, a random integer between 1 and 60 is used.

    Definition Classes
    DeIdentificationParams
  18. final def defaultCopy[T <: Params](extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  19. val description: String

    Definition Classes
    DeIdentification → AnnotatorApproach
  20. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  21. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  22. def explainParam(param: Param[_]): String

    Definition Classes
    Params
  23. def explainParams(): String

    Definition Classes
    Params
  24. final def extractParamMap(): ParamMap

    Definition Classes
    Params
  25. final def extractParamMap(extra: ParamMap): ParamMap

    Definition Classes
    Params
  26. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  27. final def fit(dataset: Dataset[_]): DeIdentificationModel

    Definition Classes
    AnnotatorApproach → Estimator
  28. def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[DeIdentificationModel]

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  29. def fit(dataset: Dataset[_], paramMap: ParamMap): DeIdentificationModel

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  30. def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DeIdentificationModel

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" ) @varargs()
  31. final def get[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  32. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  33. final def getDefault[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  34. def getInputCols: Array[String]

    Definition Classes
    HasInputAnnotationCols
  35. def getLazyAnnotator: Boolean

    Definition Classes
    CanBeLazy
  36. final def getOrDefault[T](param: Param[T]): T

    Definition Classes
    Params
  37. final def getOutputCol: String

    Definition Classes
    HasOutputAnnotationCol
  38. def getParam(paramName: String): Param[Any]

    Definition Classes
    Params
  39. final def hasDefault[T](param: Param[T]): Boolean

    Definition Classes
    Params
  40. def hasParam(paramName: String): Boolean

    Definition Classes
    Params
  41. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  42. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  43. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  44. val inputAnnotatorTypes: Array[AnnotatorType]

    Definition Classes
    DeIdentification → HasInputAnnotationCols
  45. final val inputCols: StringArrayParam

    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  46. final def isDefined(param: Param[_]): Boolean

    Definition Classes
    Params
  47. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  48. final def isSet(param: Param[_]): Boolean

    Definition Classes
    Params
  49. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  50. val lazyAnnotator: BooleanParam

    Definition Classes
    CanBeLazy
  51. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  52. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  53. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  54. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  55. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  56. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  57. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  58. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  59. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  60. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  61. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  62. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  63. val minYear: IntParam

    Minimum year to use when converting date to year

    Definition Classes
    DeIdentificationParams
  64. val mode: Param[String]

    Mode for Anonymizer ['mask'|'obfuscate']

    Definition Classes
    DeIdentificationParams
  65. def msgHelper(schema: StructType): String

    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  66. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  67. final def notify(): Unit

    Definition Classes
    AnyRef
  68. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  69. val obfuscateDate: BooleanParam

    Whether to obfuscate dates when mode == "obfuscate" (default: false). This param makes the dependency on dateFormats explicit: when setting it to true, make sure the dateFormats param covers the date formats you need.

    Definition Classes
    DeIdentificationParams
  70. val obfuscateRefFile: Param[String]

    File with the terms to be used for Obfuscation

  71. val obfuscateRefSource: Param[String]

    Definition Classes
    DeIdentificationParams
  72. def onTrained(model: DeIdentificationModel, spark: SparkSession): Unit

    Definition Classes
    AnnotatorApproach
  73. val outputAnnotatorType: AnnotatorType

    Definition Classes
    DeIdentification → HasOutputAnnotatorType
  74. final val outputCol: Param[String]

    Attributes
    protected
    Definition Classes
    HasOutputAnnotationCol
  75. lazy val params: Array[Param[_]]

    Definition Classes
    Params
  76. val refFileFormat: Param[String]

    Format of the reference file for Obfuscation

  77. val refSep: Param[String]

    Separator character for the csv reference file for Obfuscation

  78. val regexOverride: BooleanParam

    Definition Classes
    DeIdentificationParams
  79. val regexPatternsDictionary: ExternalResourceParam

    Dictionary with regular expression patterns that match protected entities

  80. val sameEntityThreshold: DoubleParam

    Similarity threshold [0.0-1.0] to consider two appearances of an entity as the same (default: 0.9)

    Definition Classes
    DeIdentificationParams
  81. def save(path: String): Unit

    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  82. final def set(paramPair: ParamPair[_]): DeIdentification.this.type

    Attributes
    protected
    Definition Classes
    Params
  83. final def set(param: String, value: Any): DeIdentification.this.type

    Attributes
    protected
    Definition Classes
    Params
  84. final def set[T](param: Param[T], value: T): DeIdentification.this.type

    Definition Classes
    Params
  85. def setConsistentObfuscation(s: Boolean): DeIdentification.this.type

    Definition Classes
    DeIdentificationParams
  86. def setDateFormats(s: Array[String]): DeIdentification.this.type

    Definition Classes
    DeIdentificationParams
  87. def setDateTag(s: String): DeIdentification.this.type

    Definition Classes
    DeIdentificationParams
  88. def setDateToYear(s: Boolean): DeIdentification.this.type

    Definition Classes
    DeIdentificationParams
  89. def setDays(k: Int): DeIdentification.this.type

    Definition Classes
    DeIdentificationParams
  90. final def setDefault(paramPairs: ParamPair[_]*): DeIdentification.this.type

    Attributes
    protected
    Definition Classes
    Params
  91. final def setDefault[T](param: Param[T], value: T): DeIdentification.this.type

    Attributes
    protected
    Definition Classes
    Params
  92. final def setInputCols(value: String*): DeIdentification.this.type

    Definition Classes
    HasInputAnnotationCols
  93. final def setInputCols(value: Array[String]): DeIdentification.this.type

    Definition Classes
    HasInputAnnotationCols
  94. def setLazyAnnotator(value: Boolean): DeIdentification.this.type

    Definition Classes
    CanBeLazy
  95. def setMinYear(s: Int): DeIdentification.this.type

    Definition Classes
    DeIdentificationParams
  96. def setMode(m: String): DeIdentification.this.type

    Definition Classes
    DeIdentificationParams
  97. def setObfuscateDate(s: Boolean): DeIdentification.this.type

    Definition Classes
    DeIdentificationParams
  98. def setObfuscateRefFile(f: String): DeIdentification.this.type

  99. def setObfuscateRefSource(s: String): DeIdentification.this.type

    Definition Classes
    DeIdentificationParams
  100. final def setOutputCol(value: String): DeIdentification.this.type

    Definition Classes
    HasOutputAnnotationCol
  101. def setRefFileFormat(f: String): DeIdentification.this.type

  102. def setRefSep(f: String): DeIdentification.this.type

  103. def setRegexOverride(s: Boolean): DeIdentification.this.type

    Definition Classes
    DeIdentificationParams
  104. def setRegexPatternsDictionary(path: String, readAs: Format = ReadAs.TEXT, options: Map[String, String] = Map("delimiter"->" ")): DeIdentification.this.type

  105. def setRegexPatternsDictionary(path: ExternalResource): DeIdentification.this.type

  106. def setSameEntityThreshold(s: Double): DeIdentification.this.type

    Definition Classes
    DeIdentificationParams
  107. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  108. def toString(): String

    Definition Classes
    Identifiable → AnyRef → Any
  109. def train(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): DeIdentificationModel

    Returns the DeIdentificationModel Transformer, which can be used to transform input datasets.

    The dataset provided to the fit method should have one chunk per row and contain the following columns: Document, Tokens, Chunks.

    This method is called inside the AnnotatorApproach's fit method.

    dataset

    a Dataset containing the Document, Tokens and Chunks columns

    returns

    a trained DeIdentificationModel

    Definition Classes
    DeIdentification → AnnotatorApproach
  110. def transformRegexPatternsDictionary(regexPatternsDictionary: Array[(String, String)]): Map[String, Array[String]]

  111. final def transformSchema(schema: StructType): StructType

    Definition Classes
    AnnotatorApproach → PipelineStage
  112. def transformSchema(schema: StructType, logging: Boolean): StructType

    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  113. val uid: String

    a unique identifier for the instanced Annotator

    Definition Classes
    DeIdentification → Identifiable
  114. def validate(schema: StructType): Boolean

    Attributes
    protected
    Definition Classes
    AnnotatorApproach
  115. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  116. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  117. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  118. def write: MLWriter

    Definition Classes
    DefaultParamsWritable → MLWritable
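
The date-related params among the value members (dateFormats, days, dateToYear, minYear) describe displacing dates by a number of days or reducing them to a year. The following standalone sketch shows one way such behaviour could work using only java.time. It is not the library's implementation: the object and method names are hypothetical, the pattern syntax is that of DateTimeFormatter, and it assumes that displacement adds days and that minYear acts as a lower bound.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

// Illustrative sketch only; this is not the DeIdentification implementation.
object DateObfuscationSketch {

  // Try each pattern in order and return the first successful parse
  // together with the formatter that matched.
  def parseWithFormats(text: String, formats: Seq[String]): Option[(LocalDate, DateTimeFormatter)] =
    formats.iterator
      .map { pattern =>
        val fmt = DateTimeFormatter.ofPattern(pattern)
        Try(LocalDate.parse(text, fmt)).toOption.map(date => (date, fmt))
      }
      .collectFirst { case Some(hit) => hit }

  // Shift a date by `days` and render it in the format it was written in.
  // Assumption: displacement moves the date forward.
  def displace(text: String, formats: Seq[String], days: Int): Option[String] =
    parseWithFormats(text, formats).map { case (date, fmt) =>
      date.plusDays(days.toLong).format(fmt)
    }

  // Reduce a date to its year, never returning less than `minYear`.
  // Assumption: minYear is a lower bound on the reported year.
  def toYear(text: String, formats: Seq[String], minYear: Int): Option[Int] =
    parseWithFormats(text, formats).map { case (date, _) =>
      math.max(date.getYear, minYear)
    }

  def main(args: Array[String]): Unit = {
    val formats = Seq("yyyy-MM-dd", "MM/dd/yyyy")
    println(displace("01/15/2020", formats, 20)) // Some(02/04/2020)
    println(toYear("1901-03-02", formats, 1929)) // Some(1929)
  }
}
```

Text that matches none of the supplied formats yields None, which mirrors the advice above to make sure dateFormats fits the data when date obfuscation is enabled.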

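
The consistentObfuscation and sameEntityThreshold params describe replacing very similar entities with the same term once their similarity reaches a threshold between 0.0 and 1.0. This page does not say which similarity measure is used, so the sketch below assumes a normalized Levenshtein similarity purely for illustration. All names in it are hypothetical.

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrative sketch only; the similarity measure is an assumption.
object ConsistentObfuscationSketch {

  // Classic dynamic-programming Levenshtein edit distance.
  def levenshtein(a: String, b: String): Int = {
    var prev = Array.tabulate(b.length + 1)(identity)
    for (i <- 1 to a.length) {
      val curr = new Array[Int](b.length + 1)
      curr(0) = i
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        curr(j) = math.min(math.min(curr(j - 1) + 1, prev(j) + 1), prev(j - 1) + cost)
      }
      prev = curr
    }
    prev(b.length)
  }

  // Similarity in [0.0, 1.0]: 1.0 for identical strings, 0.0 for fully different ones.
  def similarity(a: String, b: String): Double = {
    val longest = math.max(a.length, b.length)
    if (longest == 0) 1.0 else 1.0 - levenshtein(a, b).toDouble / longest
  }

  // Give each entity a replacement, reusing the replacement of the first
  // earlier entity whose similarity reaches the threshold.
  def assign(entities: Seq[String], threshold: Double, fresh: Int => String): Seq[String] = {
    val seen = ArrayBuffer.empty[(String, String)]
    entities.map { entity =>
      seen.find { case (known, _) => similarity(known, entity) >= threshold } match {
        case Some((_, replacement)) => replacement
        case None =>
          val replacement = fresh(seen.size)
          seen += ((entity, replacement))
          replacement
      }
    }
  }
}
```

With a threshold of 0.8, "John Smith" and "John Smyth" (one edit over ten characters) would share a replacement, while "Mary Jones" would receive a new one.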