com.johnsnowlabs.nlp.annotators.deid
Whether to replace very similar entities in a document with the same randomized term (default: true)
Format of dates to displace
Tag representing dates in the obfuscate reference file (default: DATE)
true if dates must be converted to years, false otherwise
Number of days to obfuscate the dates by displacement. If not provided, a random integer between 1 and 60 will be used
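The displacement described above can be sketched with `java.time`. This is only an illustration under stated assumptions: `DateDisplacement`, `resolveDays` and `displace` are hypothetical names, not part of the library, and the annotator's actual date handling may differ.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Random

// Hypothetical helper, not part of the library: shifts a date string by a number of days.
object DateDisplacement {
  // If no day count is given, draw a random integer between 1 and 60 (inclusive),
  // mirroring the default described for the days parameter.
  def resolveDays(days: Option[Int], rng: Random = new Random()): Int =
    days.getOrElse(rng.nextInt(60) + 1)

  // Parse with the given pattern, add the offset, and format back with the same pattern.
  def displace(date: String, pattern: String, days: Int): String = {
    val fmt = DateTimeFormatter.ofPattern(pattern)
    LocalDate.parse(date, fmt).plusDays(days.toLong).format(fmt)
  }
}
```

For example, `DateDisplacement.displace("2020-02-20", "yyyy-MM-dd", 10)` yields `2020-03-01`, since 2020 is a leap year.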
Minimum year to use when converting date to year
Mode for Anonymizer ['mask'|'obfuscate']
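The difference between the two modes can be illustrated with a small stand-alone sketch. It assumes that masking replaces an entity with its label in angle brackets and that obfuscation substitutes a surrogate term; `Entity` and `AnonymizerSketch` are hypothetical names and do not reflect the annotator's internals.

```scala
// Hypothetical types for illustration only; they are not the library's classes.
final case class Entity(begin: Int, end: Int, label: String) // end is exclusive

object AnonymizerSketch {
  // Replace each entity span in `text`. Entities are applied right to left so that
  // earlier offsets stay valid after each replacement.
  def anonymize(text: String, entities: Seq[Entity], mode: String,
                surrogates: Map[String, String]): String =
    entities.sortBy(-_.begin).foldLeft(text) { (acc, e) =>
      val replacement = mode match {
        case "mask"      => s"<${e.label}>"                                  // assumed mask format
        case "obfuscate" => surrogates.getOrElse(e.label, s"<${e.label}>")  // fall back to masking
        case other       => throw new IllegalArgumentException(s"Unknown mode: $other")
      }
      acc.substring(0, e.begin) + replacement + acc.substring(e.end)
    }
}
```

With the text `John saw Dr. Smith` and entities covering `John` (NAME) and `Smith` (DOCTOR), mode `mask` gives `<NAME> saw Dr. <DOCTOR>`, while `obfuscate` with surrogates `Alex` and `Lee` gives `Alex saw Dr. Lee`.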
When mode=="obfuscate", whether to obfuscate dates or not. This param helps in consistency to make dateFormats more visible. When set to true, make sure the dateFormats param fits your needs (default: false)
File with the terms to be used for Obfuscation
Format of the reference file for Obfuscation
Separator character for the csv reference file for Obfuscation
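As a rough illustration of how a csv reference file and its separator might fit together, the sketch below assumes each line has the shape `term<sep>TAG` (for example `Alex Doe#NAME` with separator `#`). That layout is an assumption, and `RefFileSketch.parse` is a hypothetical helper rather than the library's loader.

```scala
// Hypothetical parser, not the library's loader. Assumes one "term<sep>TAG" pair per line.
object RefFileSketch {
  // Group replacement terms by their tag; blank or malformed lines are skipped.
  def parse(lines: Seq[String], sep: String): Map[String, Seq[String]] =
    lines.map(_.trim).filter(_.nonEmpty).flatMap { line =>
      val idx = line.lastIndexOf(sep) // split on the last separator so terms may contain it
      if (idx > 0) Some(line.substring(idx + sep.length).trim -> line.substring(0, idx).trim)
      else None
    }.groupBy(_._1).map { case (tag, pairs) => tag -> pairs.map(_._2) }
}
```

Splitting on the last occurrence of the separator is a design choice that lets the term itself contain the separator character.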
Dictionary with regular expression patterns that match protected entities
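A dictionary of this kind can be pictured as a map from entity label to regular expressions. The sketch below is illustrative only: the labels, patterns and the `RegexEntitySketch` object are assumptions, and the format the annotator expects may differ.

```scala
import scala.util.matching.Regex

object RegexEntitySketch {
  // Hypothetical dictionary: entity label -> patterns that match that entity.
  val patterns: Map[String, Seq[Regex]] = Map(
    "PHONE" -> Seq("""\d{3}-\d{3}-\d{4}""".r),
    "EMAIL" -> Seq("""[\w.]+@[\w.]+\.\w+""".r)
  )

  // Return (label, matched text) for every pattern hit in the text.
  def find(text: String): Seq[(String, String)] =
    for {
      (label, regexes) <- patterns.toSeq
      regex <- regexes
      m <- regex.findAllIn(text).toSeq
    } yield (label, m)
}
```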
Similarity threshold [0.0-1.0] to consider two appearances of an entity as the same (default: 0.9)
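To make the threshold concrete, here is a minimal sketch of consistent replacement. It assumes a normalized Levenshtein similarity, which may not be the metric the annotator uses; `ConsistentObfuscationSketch` and its methods are hypothetical names.

```scala
object ConsistentObfuscationSketch {
  // Classic dynamic-programming Levenshtein edit distance, keeping one row at a time.
  def levenshtein(a: String, b: String): Int = {
    var row = Array.range(0, b.length + 1)
    for (i <- 1 to a.length) {
      val cur = new Array[Int](b.length + 1)
      cur(0) = i
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        cur(j) = math.min(math.min(cur(j - 1) + 1, row(j) + 1), row(j - 1) + cost)
      }
      row = cur
    }
    row(b.length)
  }

  // Similarity in [0.0, 1.0]; 1.0 means identical (case-insensitive). Assumed metric.
  def similarity(a: String, b: String): Double = {
    val maxLen = math.max(a.length, b.length)
    if (maxLen == 0) 1.0
    else 1.0 - levenshtein(a.toLowerCase, b.toLowerCase).toDouble / maxLen
  }

  // Reuse the surrogate of an earlier mention when similarity >= threshold;
  // otherwise take the next term from the (non-empty) pool.
  def assign(mentions: Seq[String], pool: Seq[String], threshold: Double = 0.9): Seq[String] = {
    val seen = scala.collection.mutable.ArrayBuffer.empty[(String, String)]
    mentions.map { m =>
      seen.find { case (prior, _) => similarity(prior, m) >= threshold } match {
        case Some((_, surrogate)) => surrogate
        case None =>
          val surrogate = pool(seen.length % pool.length)
          seen.append((m, surrogate))
          surrogate
      }
    }
  }
}
```

Under this metric, `Jonathan Smith` and `Jonathan Smyth` differ by one edit over 14 characters (similarity of about 0.93), so with the default threshold of 0.9 both mentions receive the same surrogate.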
Returns the DeIdentificationModel Transformer, that can be used to transform input datasets
The dataset provided to the fit method should have one chunk per row and contain the following columns: Document, Tokens, Chunks
This method is called inside the AnnotatorApproach's fit method
a Dataset containing ChunkTokens, ChunkEmbeddings, ClassifierLabel, ResolverLabel, [ResolverNormalized]
a trained ChunkEntityResolverModel
a unique identifier for the instanced Annotator
Trains a DeIdentification Annotator which provides functionality to either mask or obfuscate PHI based on input annotations of types DOCUMENT, TOKEN and CHUNK. Ideally this annotator works in conjunction with Demographic Named Entity Recognizers that can be trained using TextMatchers, RegexMatchers, DateMatchers, NerCRFs or NerDLs.