CustomPragmaticMethod extends PragmaticMethod with Serializable
Inspired by Kevin Dias's Ruby implementation (https://github.com/diasks2/pragmatic_segmenter). This approach extracts sentence bounds by first formatting the data with RuleSymbols and then extracting the bounds by applying strong regex-based rules.
- class DefaultPragmaticMethod extends PragmaticMethod with Serializable
- class MixedPragmaticMethod extends PragmaticMethod with Serializable
PragmaticContentFormatter extends AnyRef
Rule-based formatter that adds regex rules to different marking steps. Symbols protect ambiguous bounds from being considered splitters.
PragmaticMethod extends AnyRef
PragmaticSentenceExtractor extends AnyRef
Reads through symbolized data and computes the bounds based on regex rules, following the meaning of each symbol.
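The two stages described above (symbol-based formatting, then regex-based extraction) can be illustrated with a minimal, library-free sketch. Everything in it is an assumption made for illustration: the object name `PragmaticSketch`, the placeholder symbol, the tiny abbreviation list, and the regexes are hypothetical and are not taken from the library's actual rules.

```scala
object PragmaticSketch {
  // Hypothetical placeholder for a period that must not end a sentence.
  val ProtectedDot = "\u222F"

  // Stage 1 (formatting): swap non-terminal periods for the placeholder.
  def format(text: String): String = {
    // Periods after a few sample abbreviations (illustrative list only).
    val abbreviationsProtected =
      "\\b(Dr|Mrs|Mr|vs)\\.".r.replaceAllIn(text, "$1" + ProtectedDot)
    // Decimal points sitting between two digits.
    "(?<=\\d)\\.(?=\\d)".r.replaceAllIn(abbreviationsProtected, ProtectedDot)
  }

  // Stage 2 (extraction): split after the remaining terminal punctuation,
  // then restore each placeholder to a real period.
  def extract(formatted: String): List[String] =
    formatted
      .split("(?<=[.!?])\\s+")
      .toList
      .map(_.replace(ProtectedDot, ".").trim)
      .filter(_.nonEmpty)

  // Runs both stages in order.
  def sentences(text: String): List[String] = extract(format(text))
}
```

With this sketch, `PragmaticSketch.sentences("Dr. Smith paid 3.50 dollars. He left! Did Mr. Jones see him?")` returns `List("Dr. Smith paid 3.50 dollars.", "He left!", "Did Mr. Jones see him?")`: the periods in "Dr.", "3.50" and "Mr." are protected during formatting, so only the real sentence ends act as splitters.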
RuleSymbols extends AnyRef
Base Symbols that may be extended later on.
SentenceDetector extends AnnotatorModel[SentenceDetector] with HasSimpleAnnotate[SentenceDetector] with SentenceDetectorParams
Annotator that detects sentence boundaries using regular expressions.
See also: SentenceDetectorDLModel for pretrained models
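As a usage sketch, the annotator is typically placed after a DocumentAssembler in a Spark ML pipeline. The example below assumes Spark and Spark NLP are on the classpath; the object name `SentenceDetectorExample`, the helper `sentences`, the column names, and the local Spark session settings are choices made for this illustration.

```scala
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.SparkSession

object SentenceDetectorExample {
  // Hypothetical helper: runs the pipeline and returns the detected sentences.
  def sentences(texts: Seq[String]): Seq[String] = {
    val spark = SparkSession.builder()
      .appName("sentence-detector-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Turns the raw text column into document annotations.
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")

    // Splits each document into sentence annotations.
    val sentenceDetector = new SentenceDetector()
      .setInputCols("document")
      .setOutputCol("sentence")

    val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector))
    val data = texts.toDF("text")
    val result = pipeline.fit(data).transform(data)

    // Each annotation's `result` field holds the sentence text.
    result.selectExpr("explode(sentence.result) as sentence").as[String].collect().toSeq
  }

  def main(args: Array[String]): Unit =
    sentences(Seq("This is my first sentence. This is my second. How about a third?"))
      .foreach(println)
}
```

Neither stage is trained here; calling `fit` only assembles the pipeline model, so the same pipeline can be reused on any DataFrame with a `text` column.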
- object PragmaticContentFormatter
This is a dictionary that contains common English abbreviations that should be considered sentence bounds
PragmaticSymbols extends RuleSymbols
Extends RuleSymbols with specific symbols used for the pragmatic approach.
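The relationship between a base symbol set and a specialised one can be sketched as a trait plus an extending object. The names `SketchRuleSymbols` and `SketchPragmaticSymbols`, their members, and the chosen placeholder characters are hypothetical; the library's real RuleSymbols and PragmaticSymbols may be structured differently.

```scala
// Hypothetical stand-in for a base symbol set; not the library's actual trait.
trait SketchRuleSymbols {
  /** Maps each placeholder symbol back to the text it replaced. */
  def symbolRecovery: Map[String, String]

  /** Restores the original characters in a symbolized string. */
  def recover(text: String): String =
    symbolRecovery.foldLeft(text) { case (acc, (symbol, original)) =>
      acc.replace(symbol, original)
    }
}

// Hypothetical specialisation that supplies concrete symbols.
object SketchPragmaticSymbols extends SketchRuleSymbols {
  // Private-use code points, assumed not to occur in ordinary input text.
  val ProtectedDot = "\uE000"
  val ProtectedQuestion = "\uE001"

  val symbolRecovery: Map[String, String] =
    Map(ProtectedDot -> ".", ProtectedQuestion -> "?")
}
```

Keeping the recovery logic in the base trait means another approach could define its own symbols without reimplementing how they are mapped back to text.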
SentenceDetector extends DefaultParamsReadable[SentenceDetector] with Serializable
This is the companion object of SentenceDetector.