sparknlp_jsl.deidentification_module
Module Contents
Classes
- class Deidentifier(spark, custom_pipeline=None, fields=None, ner_chunk='ner_chunk', sentence='sentence', token='token', document='document', masking_policy='entity_labels', fixed_mask_length=1, obfuscate_date=True, obfuscate_ref_source='faker', obfuscate_ref_file_path=None, age_group_obfuscation=False, age_ranges=None, shift_days=False, number_of_days=None, documenthashcoder_col_name='documentHash', date_tag='DATE', language='en', region='us', unnormalized_date=False, unnormalized_mode='mask', id_column_name='id', date_shift_column_name='dateshift', multi_mode_file_path=None, domain=None, separator='\t', input_file_path=None, output_file_path='deidentified.csv')
- age_group_obfuscation
- age_ranges
- custom_pipeline
- date_shift_column_name
- date_tag
- document
- documenthashcoder_col_name
- domain
- fields
- fixed_mask_length
- id_column_name
- input_file_path
- language
- masking_policy
- multi_mode_file_path
- ner_chunk
- number_of_days
- obfuscate_date
- obfuscate_ref_file_path
- obfuscate_ref_source
- output_file_path
- region
- sentence
- separator
- shift_days
- spark
- token
- unnormalized_date
- unnormalized_mode
- deid_with_custom_pipeline(pretrained_pipeline=None)
Deidentifies the given data using a custom pipeline.
- deid_with_pretrained_pipeline()
Deidentifies the given data using a pretrained pipeline.
- deidentify()
Deidentifies the input file according to the given field names and saves the results as a CSV or JSON file.
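The following is a minimal usage sketch, not an official example. It uses only the class name, module path, parameter names, and defaults listed in the signature above. The input file name, the shape of the `fields` mapping (column name to mode), and the helper `run_deidentification` are assumptions made for illustration; `spark` is assumed to be a session started with a licensed Spark NLP for Healthcare installation.

```python
# Constructor keyword arguments, using parameter names and defaults from the
# Deidentifier signature documented above. The input path and the `fields`
# mapping are hypothetical placeholders, not values from the library docs.
DEID_KWARGS = {
    "input_file_path": "clinical_notes.csv",  # hypothetical input file
    "output_file_path": "deidentified.csv",   # default from the signature
    "fields": {"text": "mask"},               # assumed shape: column name -> mode
    "masking_policy": "entity_labels",        # default from the signature
}


def run_deidentification(spark, **overrides):
    """Build a Deidentifier from DEID_KWARGS and call deidentify().

    `spark` is assumed to be an active session created with a licensed
    Spark NLP for Healthcare installation. Returns whatever
    `Deidentifier.deidentify()` returns.
    """
    # Imported lazily so this sketch can be loaded without the licensed library.
    from sparknlp_jsl.deidentification_module import Deidentifier

    # Caller-supplied overrides take precedence over the sketch defaults.
    kwargs = {**DEID_KWARGS, **overrides}
    deidentifier = Deidentifier(spark, **kwargs)
    return deidentifier.deidentify()
```

With a licensed session available, a call such as `run_deidentification(spark, output_file_path="out.csv")` would construct the class with the overridden output path and run `deidentify()`. Parameters not listed in `DEID_KWARGS` keep the defaults shown in the signature.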