sparknlp_jsl.deidentification_module

Module Contents

Classes

Deidentifier

class Deidentifier(spark, custom_pipeline=None, fields=None, ner_chunk='ner_chunk', sentence='sentence', token='token', document='document', masking_policy='entity_labels', fixed_mask_length=1, obfuscate_date=True, obfuscate_ref_source='faker', obfuscate_ref_file_path=None, age_group_obfuscation=False, age_ranges=None, shift_days=False, number_of_days=None, documenthashcoder_col_name='documentHash', date_tag='DATE', language='en', region='us', unnormalized_date=False, unnormalized_mode='mask', id_column_name='id', date_shift_column_name='dateshift', multi_mode_file_path=None, domain=None, separator='\t', input_file_path=None, output_file_path='deidentified.csv')
age_group_obfuscation
age_ranges
custom_pipeline
date_shift_column_name
date_tag
document
documenthashcoder_col_name
domain
fields
fixed_mask_length
id_column_name
input_file_path
language
masking_policy
multi_mode_file_path
ner_chunk
number_of_days
obfuscate_date
obfuscate_ref_file_path
obfuscate_ref_source
output_file_path
region
sentence
separator
shift_days
spark
token
unnormalized_date
unnormalized_mode
deid_with_custom_pipeline(pretrained_pipeline=None)

Deidentifies the given data using a custom pipeline.

deid_with_pretrained_pipeline()

Deidentifies the given data using a pretrained pipeline.

deidentify()

Deidentifies the input file according to the given field names and saves the results as a CSV or JSON file.
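
As a rough illustration of how the constructor and deidentify() might fit together, the sketch below collects a few keyword arguments named as in the class signature and passes them to Deidentifier. Several things here are assumptions rather than documented behaviour: the helper names deidentifier_kwargs and run_deidentification are hypothetical, the file name notes.csv is made up, and the shape of fields (a mapping from column name to a mode such as "mask") is not confirmed by this page. The import is deferred into the function because sparknlp_jsl is a licensed package that may not be installed.

```python
def deidentifier_kwargs(input_file_path, fields,
                        output_file_path="deidentified.csv",
                        masking_policy="entity_labels"):
    """Hypothetical helper: gather constructor arguments by keyword.

    The keyword names and the two defaults are copied from the
    Deidentifier signature; everything else keeps its default.
    """
    return {
        "input_file_path": input_file_path,
        "output_file_path": output_file_path,
        "fields": fields,
        "masking_policy": masking_policy,
    }


def run_deidentification(spark, kwargs):
    """Hypothetical helper: build a Deidentifier and call deidentify().

    `spark` is expected to be an already started, licensed Spark session.
    """
    # Deferred import so the rest of this sketch runs without the library.
    from sparknlp_jsl.deidentification_module import Deidentifier

    deidentifier = Deidentifier(spark, **kwargs)
    # Whatever deidentify() returns is passed back unchanged.
    return deidentifier.deidentify()


# Assumed shape for `fields`: column name mapped to a deidentification mode.
kwargs = deidentifier_kwargs("notes.csv", fields={"text": "mask"})
print(kwargs["output_file_path"])
```

With a licensed session available, calling run_deidentification(spark, kwargs) would be the next step; passing every option by keyword keeps the call readable given how many parameters the constructor accepts.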