com.johnsnowlabs.nlp.annotators.deid

DeIdentification

class DeIdentification extends AnnotatorApproach[DeIdentificationModel] with DeIdentificationParams with Licensed

Trains a DeIdentification annotator, which can either mask or obfuscate protected health information (PHI) based on input annotations of types DOCUMENT, TOKEN and CHUNK.

Ideally, this annotator works in conjunction with demographic named entity recognizers, which can be trained using TextMatchers, RegexMatchers, DateMatchers, NerCRFs or NerDLs.
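
As a rough illustration of how the setters listed on this page fit together, the fragment below configures the annotator in obfuscation mode. It is a configuration sketch only: the column names and the reference-file path are hypothetical placeholders, the order of the input columns is an assumption, and running it requires the licensed library on the classpath plus an upstream pipeline that produces DOCUMENT, TOKEN and CHUNK annotations.

```scala
import com.johnsnowlabs.nlp.annotators.deid.DeIdentification

// Hypothetical column names; they must match the outputs of the upstream annotators.
val deIdentification = new DeIdentification()
  .setInputCols("document", "token", "ner_chunk") // DOCUMENT, TOKEN and CHUNK annotations
  .setOutputCol("deidentified")
  .setMode("obfuscate")                                 // or "mask"
  .setObfuscateRefFile("path/to/obfuscation_terms.csv") // hypothetical path
  .setObfuscateDate(true)
  .setDays(10)                                          // displace dates by 10 days
  .setConsistentObfuscation(true)
  .setSameEntityThreshold(0.9)

// Calling fit on an annotated Dataset would then yield a DeIdentificationModel.
```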

Linear Supertypes
Licensed, DeIdentificationParams, AnnotatorApproach[DeIdentificationModel], CanBeLazy, DefaultParamsWritable, MLWritable, HasOutputAnnotatorType, HasOutputAnnotationCol, HasInputAnnotationCols, Estimator[DeIdentificationModel], PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new DeIdentification()

  2. new DeIdentification(uid: String)

    uid

    a unique identifier for the instanced Annotator

Type Members

  1. type AnnotatorType = String

    Definition Classes
    HasOutputAnnotatorType

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T

    Attributes
    protected
    Definition Classes
    Params
  4. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  5. def _fit(dataset: Dataset[_], recursiveStages: Option[PipelineModel]): DeIdentificationModel

    Attributes
    protected
    Definition Classes
    AnnotatorApproach
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def beforeTraining(spark: SparkSession): Unit

    Definition Classes
    AnnotatorApproach
  8. final def checkSchema(schema: StructType, inputAnnotatorType: String): Boolean

    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  9. final def clear(param: Param[_]): DeIdentification.this.type

    Definition Classes
    Params
  10. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  11. val consistentObfuscation: BooleanParam

    Whether to replace very similar entities in a document with the same randomized term (default: true)

    Definition Classes
    DeIdentificationParams
  12. final def copy(extra: ParamMap): Estimator[DeIdentificationModel]

    Definition Classes
    AnnotatorApproach → Estimator → PipelineStage → Params
  13. def copyValues[T <: Params](to: T, extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  14. val dateFormats: StringArrayParam

    Format of dates to displace

    Definition Classes
    DeIdentificationParams
  15. val dateTag: Param[String]

    Tag representing dates in the obfuscate reference file (default: DATE)

    Definition Classes
    DeIdentificationParams
  16. val dateToYear: BooleanParam

    true if dates must be converted to years, false otherwise

    Definition Classes
    DeIdentificationParams
  17. val days: IntParam

    Number of days by which dates are displaced when obfuscating. If not provided, a random integer between 1 and 60 is used.

    Definition Classes
    DeIdentificationParams
  18. final def defaultCopy[T <: Params](extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  19. val description: String

    Definition Classes
    DeIdentification → AnnotatorApproach
  20. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  21. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  22. def explainParam(param: Param[_]): String

    Definition Classes
    Params
  23. def explainParams(): String

    Definition Classes
    Params
  24. final def extractParamMap(): ParamMap

    Definition Classes
    Params
  25. final def extractParamMap(extra: ParamMap): ParamMap

    Definition Classes
    Params
  26. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  27. final def fit(dataset: Dataset[_]): DeIdentificationModel

    Definition Classes
    AnnotatorApproach → Estimator
  28. def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[DeIdentificationModel]

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  29. def fit(dataset: Dataset[_], paramMap: ParamMap): DeIdentificationModel

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  30. def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DeIdentificationModel

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" ) @varargs()
  31. final def get[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  32. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  33. final def getDefault[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  34. def getInputCols: Array[String]

    Definition Classes
    HasInputAnnotationCols
  35. def getLazyAnnotator: Boolean

    Definition Classes
    CanBeLazy
  36. final def getOrDefault[T](param: Param[T]): T

    Definition Classes
    Params
  37. final def getOutputCol: String

    Definition Classes
    HasOutputAnnotationCol
  38. def getParam(paramName: String): Param[Any]

    Definition Classes
    Params
  39. final def hasDefault[T](param: Param[T]): Boolean

    Definition Classes
    Params
  40. def hasParam(paramName: String): Boolean

    Definition Classes
    Params
  41. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  42. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  43. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  44. val inputAnnotatorTypes: Array[AnnotatorType]

    Definition Classes
    DeIdentification → HasInputAnnotationCols
  45. final val inputCols: StringArrayParam

    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  46. final def isDefined(param: Param[_]): Boolean

    Definition Classes
    Params
  47. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  48. final def isSet(param: Param[_]): Boolean

    Definition Classes
    Params
  49. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  50. val lazyAnnotator: BooleanParam

    Definition Classes
    CanBeLazy
  51. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  52. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  53. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  54. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  55. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  56. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  57. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  58. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  59. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  60. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  61. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  62. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  63. val minYear: IntParam

    Minimum year to use when converting date to year

    Definition Classes
    DeIdentificationParams
  64. val mode: Param[String]

    Mode for Anonymizer ['mask'|'obfuscate']

    Definition Classes
    DeIdentificationParams
  65. def msgHelper(schema: StructType): String

    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  66. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  67. final def notify(): Unit

    Definition Classes
    AnyRef
  68. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  69. val obfuscateDate: BooleanParam

    When mode is "obfuscate", whether to obfuscate dates (default: false). This param makes the dependence on dateFormats explicit: when setting it to true, make sure the dateFormats param fits your needs.

    Definition Classes
    DeIdentificationParams
  70. val obfuscateRefFile: Param[String]

    File with the terms to be used for Obfuscation

  71. def onTrained(model: DeIdentificationModel, spark: SparkSession): Unit

    Definition Classes
    AnnotatorApproach
  72. val outputAnnotatorType: AnnotatorType

    Definition Classes
    DeIdentification → HasOutputAnnotatorType
  73. final val outputCol: Param[String]

    Attributes
    protected
    Definition Classes
    HasOutputAnnotationCol
  74. lazy val params: Array[Param[_]]

    Definition Classes
    Params
  75. val refFileFormat: Param[String]

    Format of the reference file for Obfuscation

  76. val refSep: Param[String]

    Separator character for the CSV reference file used for obfuscation

  77. val regexPatternsDictionary: ExternalResourceParam

    Dictionary with regular expression patterns that match some protected entity

  78. val sameEntityThreshold: DoubleParam

    Similarity threshold [0.0-1.0] to consider two appearances of an entity as the same (default: 0.9)

    Definition Classes
    DeIdentificationParams
  79. def save(path: String): Unit

    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  80. final def set(paramPair: ParamPair[_]): DeIdentification.this.type

    Attributes
    protected
    Definition Classes
    Params
  81. final def set(param: String, value: Any): DeIdentification.this.type

    Attributes
    protected
    Definition Classes
    Params
  82. final def set[T](param: Param[T], value: T): DeIdentification.this.type

    Definition Classes
    Params
  83. def setConsistentObfuscation(s: Boolean): DeIdentification.this.type

    Definition Classes
    DeIdentificationParams
  84. def setDateFormats(s: Array[String]): DeIdentification.this.type

    Definition Classes
    DeIdentificationParams
  85. def setDateTag(s: String): DeIdentification.this.type

    Definition Classes
    DeIdentificationParams
  86. def setDateToYear(s: Boolean): DeIdentification.this.type

    Definition Classes
    DeIdentificationParams
  87. def setDays(k: Int): DeIdentification.this.type

    Definition Classes
    DeIdentificationParams
  88. final def setDefault(paramPairs: ParamPair[_]*): DeIdentification.this.type

    Attributes
    protected
    Definition Classes
    Params
  89. final def setDefault[T](param: Param[T], value: T): DeIdentification.this.type

    Attributes
    protected
    Definition Classes
    Params
  90. final def setInputCols(value: String*): DeIdentification.this.type

    Definition Classes
    HasInputAnnotationCols
  91. final def setInputCols(value: Array[String]): DeIdentification.this.type

    Definition Classes
    HasInputAnnotationCols
  92. def setLazyAnnotator(value: Boolean): DeIdentification.this.type

    Definition Classes
    CanBeLazy
  93. def setMinYear(s: Int): DeIdentification.this.type

    Definition Classes
    DeIdentificationParams
  94. def setMode(m: String): DeIdentification.this.type

    Definition Classes
    DeIdentificationParams
  95. def setObfuscateDate(s: Boolean): DeIdentification.this.type

    Definition Classes
    DeIdentificationParams
  96. def setObfuscateRefFile(f: String): DeIdentification.this.type

  97. final def setOutputCol(value: String): DeIdentification.this.type

    Definition Classes
    HasOutputAnnotationCol
  98. def setRefFileFormat(f: String): DeIdentification.this.type

  99. def setRefSep(f: String): DeIdentification.this.type

  100. def setRegexPatternsDictionary(path: String, readAs: Format = ReadAs.TEXT, options: Map[String, String] = Map("delimiter"->" ")): DeIdentification.this.type

  101. def setRegexPatternsDictionary(path: ExternalResource): DeIdentification.this.type

  102. def setSameEntityThreshold(s: Double): DeIdentification.this.type

    Definition Classes
    DeIdentificationParams
  103. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  104. def toString(): String

    Definition Classes
    Identifiable → AnyRef → Any
  105. def train(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): DeIdentificationModel

    Returns the DeIdentificationModel Transformer, which can be used to transform input datasets.

    The dataset provided to the fit method should have one chunk per row and contain the following columns: Document, Tokens, Chunks.

    This method is called inside the AnnotatorApproach's fit method.

    dataset

    a Dataset containing the Document, Tokens and Chunks columns

    returns

    a trained DeIdentificationModel

    Definition Classes
    DeIdentification → AnnotatorApproach
  106. def transformRegexPatternsDictionary(regexPatternsDictionary: Array[(String, String)]): Map[String, Array[String]]

  107. final def transformSchema(schema: StructType): StructType

    Definition Classes
    AnnotatorApproach → PipelineStage
  108. def transformSchema(schema: StructType, logging: Boolean): StructType

    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  109. val uid: String

    a unique identifier for the instanced Annotator

    Definition Classes
    DeIdentification → Identifiable
  110. def validate(schema: StructType): Boolean

    Attributes
    protected
    Definition Classes
    AnnotatorApproach
  111. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  112. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  113. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  114. def write: MLWriter

    Definition Classes
    DefaultParamsWritable → MLWritable
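
The days and dateFormats parameters listed above describe date obfuscation by displacement. The sketch below shows one way such a displacement could work using only the Java standard library; it is not the library's implementation. The object name, the method names, the use of java.time pattern syntax and the choice to shift dates forward are all assumptions made for illustration.

```scala
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, DateTimeParseException}
import scala.util.Random

// Hypothetical helper, not part of the DeIdentification API.
object DateDisplacementSketch {

  /** Tries each pattern in order and returns the shifted date,
    * rendered with the first pattern that parses the input. */
  def displace(text: String, dateFormats: Seq[String], days: Int): Option[String] =
    dateFormats.iterator.map { pattern =>
      val fmt = DateTimeFormatter.ofPattern(pattern)
      try Some(LocalDate.parse(text, fmt).plusDays(days.toLong).format(fmt))
      catch { case _: DateTimeParseException => None }
    }.collectFirst { case Some(shifted) => shifted }

  /** Fallback when no day count is set: a random integer between 1 and 60, inclusive. */
  def randomDays(rng: Random): Int = 1 + rng.nextInt(60)
}
```

For example, `DateDisplacementSketch.displace("01/25/2020", Seq("yyyy-MM-dd", "MM/dd/yyyy"), 10)` returns `Some("02/04/2020")`, while a string that matches none of the formats yields `None` and would be left for another strategy to handle.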

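The signature of transformRegexPatternsDictionary, listed above, takes an Array[(String, String)] and returns a Map[String, Array[String]]. A plausible reading is that it groups regular expression patterns by the entity they protect. The sketch below shows that grouping in plain Scala; it is an illustration only, and both the object name and the assumption that each pair is (entity label, pattern) are not confirmed by this page.

```scala
// Hypothetical helper, not part of the DeIdentification API.
object RegexDictionarySketch {

  /** Groups (entity, pattern) pairs by entity, keeping each entity's
    * patterns in the order in which they appear in the input. */
  def groupPatterns(pairs: Array[(String, String)]): Map[String, Array[String]] =
    pairs.groupBy(_._1).map { case (entity, rows) => entity -> rows.map(_._2) }
}
```

With this reading, two rows labelled DATE and one labelled AID would produce a map with two keys, where the DATE key holds both of its patterns.
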