A feature transformer that converts the input array of strings (annotatorType CHUNK) into an
array of chunk-based tokens (annotatorType TOKEN).
When the input is empty, an empty array is returned.
This Annotator is especially convenient when using NGramGenerator annotations as inputs to WordEmbeddingsModels.
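The conversion described above can be illustrated with a minimal plain-Python sketch. It does not use the library itself: the `Annotation` record and the `chunks_to_tokens` function are hypothetical names introduced only for illustration, and the character offsets are illustrative values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    # Hypothetical, simplified stand-in for an annotation record.
    annotator_type: str
    begin: int
    end: int
    result: str


def chunks_to_tokens(chunks: List[Annotation]) -> List[Annotation]:
    """Re-label every CHUNK annotation as a single TOKEN annotation.

    The chunk text is kept whole, so a multi-word n-gram becomes one token.
    An empty input list yields an empty output list.
    """
    return [
        Annotation("token", c.begin, c.end, c.result)
        for c in chunks
        if c.annotator_type == "chunk"
    ]


# Bigram chunks, as an n-gram generator might produce for "chest pain relief".
bigrams = [
    Annotation("chunk", 0, 9, "chest pain"),
    Annotation("chunk", 6, 16, "pain relief"),
]
tokens = chunks_to_tokens(bigrams)
print([t.result for t in tokens])
```

Because each n-gram is carried over as one token, a downstream embeddings stage that expects TOKEN input can, under this assumption, look up the whole n-gram as a single unit.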
Annotator which normalizes raw text from clinical documents, e.g. scraped web pages or XML documents, from document type columns into Sentence.
Removes all dirty characters from text following one or more input regex patterns.
Can remove unwanted characters according to a specific policy.
Can apply lower case normalization.
See the DocumentNormalizer test class for examples of usage.
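As a rough illustration of the pattern-based cleanup and lower-casing described above, here is a minimal plain-Python sketch. It is not the annotator's implementation: `normalize_document` and its parameters are hypothetical names, the removal policy is not modelled, and the whitespace collapsing step is an assumption added to keep the output readable.

```python
import re
from typing import List


def normalize_document(
    text: str,
    patterns: List[str],
    replacement: str = " ",
    lowercase: bool = False,
) -> str:
    """Apply each regex pattern in turn, replacing every match.

    Hypothetical helper for illustration only.
    """
    for pattern in patterns:
        text = re.sub(pattern, replacement, text)
    # Assumption: collapse the runs of whitespace left behind by removals.
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower() if lowercase else text


raw = "<p>Patient reports <b>Chest Pain</b> today</p>"
# One pattern that matches any markup tag.
cleaned = normalize_document(raw, [r"<[^>]*>"], lowercase=True)
print(cleaned)
```

Passing several patterns applies them in order, so a later pattern sees the text already cleaned by the earlier ones.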