package deid

Type Members

  1. class DeIdentification extends AnnotatorApproach[DeIdentificationModel] with DeIdentificationParams with Licensed

    Contains all the methods for training a DeIdentificationModel. This annotator can obfuscate or mask entities that contain personal information. Custom entities can be defined in a file of regex patterns passed to setRegexPatternsDictionary, where each line maps an entity to a regex.

    DATE \d{4}
    AID \d{6,7}
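
    To make the line format above concrete, here is a minimal sketch in plain Scala that reads such lines and applies them to a text. It is not the annotator's own parser: RegexDictionarySketch, parse and findAll are hypothetical names, and splitting each line at the first whitespace is an assumption based on the two sample lines.

```scala
import scala.util.matching.Regex

object RegexDictionarySketch {
  // Assumption: each non-empty line is "<ENTITY><whitespace><regex>".
  def parse(lines: Seq[String]): Seq[(String, Regex)] =
    lines.map(_.trim).filter(_.nonEmpty).map { line =>
      val Array(entity, pattern) = line.split("\\s+", 2)
      entity -> pattern.r
    }

  // Return (entity, matched text) for every match of every pattern in `text`.
  def findAll(text: String, dict: Seq[(String, Regex)]): Seq[(String, String)] =
    for {
      (entity, regex) <- dict
      m <- regex.findAllIn(text).toList
    } yield entity -> m
}
```

    Note that a pattern as loose as \d{4} also matches inside longer digit runs, so real dictionaries usually need word boundaries or more specific patterns.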

    Additionally, obfuscation strings can be defined with setObfuscateRefFile, where each line maps a replacement string to an entity. The file format and separator can be specified with setRefFileFormat and setRefSep.

    Dr. Gregory House#DOCTOR
    01010101#MEDICALRECORD
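
    The reference file can be pictured as a lookup from entity label to candidate replacement strings. The following is a minimal sketch under that reading, not the library's loader: ObfuscateRefSketch and parse are hypothetical names, and splitting at the last occurrence of the separator is an assumption.

```scala
object ObfuscateRefSketch {
  // Group replacement strings by entity label.
  // `sep` plays the role of the value passed to setRefSep ("#" in the lines above).
  def parse(lines: Seq[String], sep: String = "#"): Map[String, Seq[String]] =
    lines
      .map(_.trim)
      .filter(_.contains(sep))
      .map { line =>
        val idx = line.lastIndexOf(sep)
        // Right of the separator: entity label. Left: the replacement string.
        line.substring(idx + sep.length) -> line.substring(0, idx)
      }
      .groupBy(_._1)
      .map { case (entity, pairs) => entity -> pairs.map(_._2) }
}
```

    With the two sample lines, parse yields one replacement for DOCTOR and one for MEDICALRECORD.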

    The configuration parameters for this annotator are defined in the trait DeIdentificationParams.

    See also

    DeIdentificationModel

    DeIdentificationParams

    train

    Ideally this annotator works in conjunction with demographic named entity recognizers, which can be trained using TextMatchers, RegexMatchers, DateMatchers, NerCRFs or NerDLs. An example pipeline for deidentification follows.

    Example

    val documentAssembler = new DocumentAssembler()
        .setInputCol("text")
        .setOutputCol("document")
    
    val sentenceDetector = new SentenceDetector()
        .setInputCols(Array("document"))
        .setOutputCol("sentence")
        .setUseAbbreviations(true)
    
    val tokenizer = new Tokenizer()
        .setInputCols(Array("sentence"))
        .setOutputCol("token")
    
    val embeddings = WordEmbeddingsModel
        .pretrained("embeddings_clinical", "en", "clinical/models")
        .setInputCols(Array("sentence", "token"))
        .setOutputCol("embeddings")

    NER entities

    val clinical_sensitive_entities = MedicalNerModel
        .pretrained("ner_deid_enriched", "en", "clinical/models")
        .setInputCols(Array("sentence", "token", "embeddings"))
        .setOutputCol("ner")
    
    // the output column must match the "ner_chunk" input of DeIdentification below
    val nerConverter = new NerConverter()
        .setInputCols(Array("sentence", "token", "ner"))
        .setOutputCol("ner_chunk")

    Deidentification

    val deIdentification = new DeIdentification()
        .setInputCols(Array("ner_chunk", "token", "sentence"))
        .setOutputCol("dei")
        // file with custom regex pattern for custom entities
        .setRegexPatternsDictionary("path/to/dic_regex_patterns_main_categories.txt")
        // file with custom obfuscator names for the entities
        .setObfuscateRefFile("path/to/obfuscate_fixed_entities.txt")
        .setRefFileFormat("csv")
        .setRefSep("#")
        .setMode("obfuscate")
        .setDateFormats(Array("MM/dd/yy","yyyy-MM-dd"))
        .setObfuscateDate(true)
        .setDateTag("DATE")
        .setDays(5)
        .setObfuscateRefSource("file")

    Pipeline

    val data = Seq(
      "# 7194334 Date : 01/13/93 PCP : Oliveira , 25 years-old , Record date : 2079-11-09."
    ).toDF("text")
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      sentenceDetector,
      tokenizer,
      embeddings,
      clinical_sensitive_entities,
      nerConverter,
      deIdentification
    ))
    val result = pipeline.fit(data).transform(data)

    Show Results

    result.select("dei.result").show(truncate = false)
    +--------------------------------------------------------------------------------------------------+
    |result                                                                                            |
    +--------------------------------------------------------------------------------------------------+
    |[# 01010101 Date : 01/18/93 PCP : Dr. Gregory House , <AGE> years-old , Record date : 2079-11-14.]|
    +--------------------------------------------------------------------------------------------------+
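
    The date shift visible in the output above (01/13/93 became 01/18/93 and 2079-11-09 became 2079-11-14 with setDays(5)) can be pictured with plain java.time. This is a minimal sketch, not the annotator's implementation: DateShiftSketch and shift are hypothetical names, and trying the configured patterns in order and formatting with the one that matched is an assumption.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.{Success, Try}

object DateShiftSketch {
  // Try each pattern in order; on the first successful parse, shift the date
  // by `days` and render it with the same pattern. None if nothing parses.
  def shift(text: String, patterns: Seq[String], days: Int): Option[String] =
    patterns.iterator
      .map(p => DateTimeFormatter.ofPattern(p))
      .map(fmt => Try(LocalDate.parse(text, fmt).plusDays(days).format(fmt)))
      .collectFirst { case Success(shifted) => shifted }
}
```

    For example, DateShiftSketch.shift("01/13/93", Seq("MM/dd/yy", "yyyy-MM-dd"), 5) returns Some("01/18/93"), and a string that matches no pattern returns None.
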
  2. class DeIdentificationModel extends AnnotatorModel[DeIdentificationModel] with DeIdentificationParams with HasSimpleAnnotate[DeIdentificationModel] with Licensed

    Contains all the parameters to transform a dataset with three input annotations of types DOCUMENT, TOKEN and CHUNK into its deidentified version by either masking or obfuscating the given CHUNKs.

    To create a configured DeIdentificationModel, see the example of DeIdentification.
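
    As a rough picture of what masking the given CHUNKs means, the sketch below replaces each chunk span with its entity label in angle brackets, as in the <AGE> placeholder shown in the DeIdentification example. It is a plain-Scala illustration, not the model's code: MaskSketch, Chunk and mask are hypothetical names, and the end offset is treated as exclusive here, which may differ from the library's annotation offsets.

```scala
object MaskSketch {
  // A detected chunk: character offsets (end exclusive, by assumption) and entity label.
  final case class Chunk(begin: Int, end: Int, entity: String)

  // Replace every chunk with <ENTITY>. Chunks are processed right to left so that
  // replacing one span does not invalidate the offsets of the spans before it.
  def mask(text: String, chunks: Seq[Chunk]): String =
    chunks.sortBy(-_.begin).foldLeft(text) { (acc, c) =>
      acc.substring(0, c.begin) + s"<${c.entity}>" + acc.substring(c.end)
    }
}
```

    Obfuscation follows the same span replacement, except that the inserted text is a substitute value rather than the label.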

    See also

    DeIdentification to train your own model

  3. trait DeIdentificationParams extends Params

    A trait that contains all the params that are common to the DeIdentificationModel and DeIdentification annotators.

    See also

    DeIdentification

    DeIdentificationModel

  4. case class MySentnece(content: String, start: Int, end: Int, index: Int, originalIndex: Int) extends Product with Serializable
  5. class ObfuscatorAnnotatorApproach extends AnnotatorApproach[ObfuscatorAnnotatorModel] with ObfuscatorParams
  6. class ObfuscatorAnnotatorModel extends AnnotatorModel[ObfuscatorAnnotatorModel] with ObfuscatorParams with HasSimpleAnnotate[ObfuscatorAnnotatorModel]
  7. trait ObfuscatorParams extends Params
  8. class ReIdentification extends AnnotatorModel[DeIdentificationModel] with HasSimpleAnnotate[DeIdentificationModel] with Licensed

    Reidentifies entities that were obfuscated by DeIdentification. This annotator requires the outputs of the deidentification as input: the input columns must be the deidentified document and the deidentification mappings column set with DeIdentification.setMappingsColumn. To see how the entities are deidentified, refer to the example of that class.

    Example

    Define the reidentification stage and transform the deidentified documents

    val reidentification = new ReIdentification()
      .setInputCols("dei", "protectedEntities")
      .setOutputCol("reid")
      .transform(result)

    Show results

    result.select("dei.result").show(truncate = false)
    +--------------------------------------------------------------------------------------------------+
    |result                                                                                            |
    +--------------------------------------------------------------------------------------------------+
    |[# 01010101 Date : 01/18/93 PCP : Dr. Gregory House , <AGE> years-old , Record date : 2079-11-14.]|
    +--------------------------------------------------------------------------------------------------+
    
    reidentification.selectExpr("explode(reid.result)").show(false)
    +-----------------------------------------------------------------------------------+
    |col                                                                                |
    +-----------------------------------------------------------------------------------+
    |# 7194334 Date : 01/13/93 PCP : Oliveira , 25 years-old , Record date : 2079-11-09.|
    +-----------------------------------------------------------------------------------+

    See also

    DeIdentification for deidentification of entities

  9. trait ReadablePretrainedDeId extends ParamsAndFeaturesReadable[DeIdentificationModel] with HasPretrained[DeIdentificationModel]
  10. case class SentenceMaxException(message: String = "", cause: Throwable = None.orNull) extends Exception with Product with Serializable
  11. case class StructuredDeid(conllFilePath: String, regexPatternsFilePath: String, obfuscateRefFilePath: String) extends Product with Serializable
  12. case class StructuredDeidentification(columnsMap: Map[String, String], obfuscateRefFile: String = "", obfuscateRefSource: String = "both") extends Product with Serializable
  13. case class TextToDocumentColumns(columns: List[String]) extends Product with Serializable

Value Members

  1. object DeIdentification extends DefaultParamsReadable[DeIdentification] with Serializable
  2. object DeIdentificationModel extends ReadablePretrainedDeId with Serializable
  3. object DefaultRegex
  4. object Obfuscator
  5. object ObfuscatorAnnotatorApproach extends DefaultParamsReadable[ObfuscatorAnnotatorApproach] with Serializable
  6. object ObfuscatorParams extends DefaultParamsReadable[DeIdentification] with Serializable
