com.johnsnowlabs.nlp.annotators.deid
Dictionary containing all the terms used later by the anonymization function
The pipeline method, which calls each step in turn. Its core is getAnonymizeSentence(), which it calls once for each sentence
a Sequence of Annotations to anonymize
a Sequence of anonymized Annotations
Whether to replace very similar entities in a document with the same randomized term (default: true)
Takes an anonymized sentence and creates a proper Annotation from it
the anonymized sentence
the index of the sentence
a proper Annotation instance
Format of dates to displace
Tag representing dates in the obfuscate reference file (default: DATE)
true if dates must be converted to years, false otherwise
Number of days by which to displace dates when obfuscating them. If not provided, a random integer between 1 and 60 is used
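The displacement rule above can be sketched in plain Scala. This is a minimal illustration, not the annotator's code: the object DateDisplacementSketch, the displace helper, its parameters, and the choice to shift dates forward are all assumptions made for the example.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Random

object DateDisplacementSketch {
  // Hypothetical helper: shifts a date string forward by `days`, or by a
  // random integer in [1, 60] when no value is provided.
  def displace(date: String,
               pattern: String,
               days: Option[Int] = None,
               rng: Random = new Random()): String = {
    val fmt = DateTimeFormatter.ofPattern(pattern)
    // nextInt(60) yields 0..59, so adding 1 gives the range 1..60.
    val shift = days.getOrElse(rng.nextInt(60) + 1)
    LocalDate.parse(date, fmt).plusDays(shift.toLong).format(fmt)
  }
}
```

Passing the same days value for every date in a document keeps the intervals between dates intact, which is the usual reason to prefer a fixed displacement over a per-date random one.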
Main point of interest. This method projects the sentence into its anonymized form. It is called for each sentence in the input collection of Annotations
a sentence, which we want to anonymize
a sequence of Entities which we want to anonymize
a String representing the value with which dates are replaced
an Int representing how many days back we look
a flag indicating whether to use displaceDate() or the dateToYear() method
the minimum year from which all obfuscated dates start (default: 1929)
a String, which represents an anonymized sentence
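As a rough illustration of projecting a sentence into its anonymized form, the sketch below covers only a mask-style replacement. The Entity case class, the mask function, and the "<LABEL>" placeholder format are hypothetical and are not taken from the annotator. Entities are assumed not to overlap.

```scala
object MaskSentenceSketch {
  // Hypothetical stand-in for a protected entity; begin and end are
  // inclusive character offsets into the sentence.
  final case class Entity(begin: Int, end: Int, label: String)

  // Replaces each entity span with "<LABEL>", working right to left so that
  // the offsets of earlier entities stay valid as the string changes length.
  def mask(sentence: String, entities: Seq[Entity]): String =
    entities.sortBy(-_.begin).foldLeft(sentence) { (text, e) =>
      text.substring(0, e.begin) + s"<${e.label}>" + text.substring(e.end + 1)
    }
}
```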
Returns the NER Annotations for each Annotation instance in the input Sequence
a Sequence of Annotation instances
a Sequence of Sequence[IndexedToken], each Sequence represents tokens from each input Annotation
Returns the Regex Annotations for each IndexedToken in the input Sequence
a Sequence of IndexedToken instances
a Sequence of Annotations, each representing a regex entity
Returns the content of each sentence inside the input sequence
a Sequence of Annotation instances, to return content from
a Sequence of String, each string represents the content of the Annotation
Returns the tokens for each Annotation instance in the input Sequence
a Sequence of Annotation instances
a Sequence of Sequence[IndexedToken], each Sequence represents tokens from each input Annotation
Returns a complement of A entities against B entities
a sequence of Entities to combine
a sequence of Entities to combine
a Sequence of Annotations, which is the difference between the NER and regex entities
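One way to picture the complement is as a filter that keeps the A entities that no B entity covers. The sketch below is an assumption-heavy illustration: the Entity case class is hypothetical, and treating "same entity" as "overlapping character span" is a choice made for the example, since the source does not state the comparison rule.

```scala
object EntityComplementSketch {
  // Hypothetical entity with inclusive character offsets.
  final case class Entity(begin: Int, end: Int, label: String)

  // True when the two inclusive spans share at least one character.
  def overlaps(a: Entity, b: Entity): Boolean =
    a.begin <= b.end && b.begin <= a.end

  // Keeps the entities of `a` that overlap with no entity of `b`.
  def complement(a: Seq[Entity], b: Seq[Entity]): Seq[Entity] =
    a.filterNot(x => b.exists(y => overlaps(x, y)))
}
```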
Returns a Boolean flag indicating whether the token matches at least one pattern in the array
a token of interest to check for the match
an Array of String to check against the token
a Boolean flag indicating whether the token matches at least one of the regexPatterns
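The check described here amounts to an "exists" over the pattern array. A minimal sketch follows; the name matchesAny is hypothetical, and requiring the whole token to match (rather than a substring) is an assumption.

```scala
object RegexMatchSketch {
  // Returns true when the whole token matches at least one pattern.
  // String.matches anchors the pattern to the full token.
  def matchesAny(token: String, regexPatterns: Array[String]): Boolean =
    regexPatterns.exists(p => token.matches(p))
}
```

An empty pattern array yields false, because there is no pattern for the token to match.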
Returns a combined Sequence of Annotations with duplicates removed
a sequence of NER Entities to combine
a sequence of Regex Entities to combine
a Sequence of Annotations, which is the result of a merge without duplicates
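A merge without duplicates can be sketched as a concatenation followed by de-duplication on the character span. Everything named here is hypothetical, and two choices are assumptions of the example rather than documented behavior: duplicates are detected by identical (begin, end) offsets, and the NER entity wins when both sources report the same span.

```scala
import scala.collection.mutable

object EntityMergeSketch {
  // Hypothetical entity with inclusive character offsets.
  final case class Entity(begin: Int, end: Int, label: String)

  // Concatenates both inputs, keeps the first entity seen for each span
  // (NER comes first, so it takes priority), and sorts by position.
  def merge(ner: Seq[Entity], regex: Seq[Entity]): Seq[Entity] = {
    val bySpan = mutable.LinkedHashMap.empty[(Int, Int), Entity]
    (ner ++ regex).foreach(e => bySpan.getOrElseUpdate((e.begin, e.end), e))
    bySpan.values.toSeq.sortBy(_.begin)
  }
}
```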
Minimum year to use when converting date to year
Mode for Anonymizer ['mask'|'obfuscate']
When mode == "obfuscate", whether to obfuscate dates or not. This param helps with consistency by making dateFormats more visible. When setting it to true, make sure the dateFormats param fits your needs (default: false)
Dictionary of regular expression patterns that match protected entities
A simple regex replace that removes some punctuation tokens from the input
a String, inside which we want to replace flavors
a String, which represents a cleaned version
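A cleanup of this kind is typically a single replaceAll call. The sketch below is illustrative only: the source does not list which punctuation tokens are removed, so the character class used here, like the names PunctuationCleanupSketch and clean, is an assumption.

```scala
object PunctuationCleanupSketch {
  // Assumed set of characters to strip: , ; : ! ? ( ) [ ] "
  private val punctuation = "[,;:!?()\\[\\]\"]"

  // Removes every character in the assumed set; other text is untouched.
  def clean(text: String): String = text.replaceAll(punctuation, "")
}
```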
Similarity threshold [0.0-1.0] to consider two appearances of an entity as the same (default: 0.9)
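To make the threshold concrete, the sketch below compares two entity strings with a normalized Levenshtein similarity. The source does not say which similarity measure the annotator uses, so the metric, the case-insensitive comparison, and the names levenshtein, similarity, and sameEntity are all assumptions for illustration.

```scala
object EntitySimilaritySketch {
  // Classic dynamic-programming Levenshtein edit distance, one row at a time.
  def levenshtein(a: String, b: String): Int = {
    var row = (0 to b.length).toArray
    for (i <- 1 to a.length) {
      val cur = new Array[Int](b.length + 1)
      cur(0) = i
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        cur(j) = math.min(math.min(cur(j - 1) + 1, row(j) + 1), row(j - 1) + cost)
      }
      row = cur
    }
    row(b.length)
  }

  // Similarity in [0.0, 1.0]: 1.0 means identical strings.
  def similarity(a: String, b: String): Double = {
    val longest = math.max(a.length, b.length)
    if (longest == 0) 1.0 else 1.0 - levenshtein(a, b).toDouble / longest
  }

  // Two mentions count as the same entity when similarity reaches the threshold.
  def sameEntity(a: String, b: String, threshold: Double = 0.9): Boolean =
    similarity(a.toLowerCase, b.toLowerCase) >= threshold
}
```

Under this metric, "Jonathan Smith" and "Jonathan Smyth" differ by one edit over 14 characters, which gives a similarity of about 0.93 and passes the default threshold of 0.9.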
a unique identifier for the instanced AnnotatorModel
Contains all the parameters to transform a dataset with three input Annotations of types DOCUMENT, TOKEN and CHUNK into its de-identified version by either masking or obfuscating the given CHUNKS