package regex

  1. class RegexMatcherInternal extends AnnotatorApproach[RegexMatcherInternalModel] with MergeCommonParams with CheckLicense

    Uses rules to match a set of regular expressions and associate them with a provided entity.

    A rule consists of a regex pattern and an entity, separated by a delimiter character of your choice. For example, with "," as the delimiter, the rule \d{4}\/\d\d\/\d\d,date matches strings such as "1970/01/01" and labels them with the entity "date".

    Rules must be provided by either setRules (followed by setDelimiter) or an external file.
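
    As a rough illustration of the inline option, the sketch below passes the date rule from above through setRules and setDelimiter. It is not runnable on its own, since it needs the licensed library and a Spark session. The Array[String] argument type, the chained setter style, and the "sentence" and "regex" column names are assumptions rather than details confirmed on this page.

```scala
import com.johnsnowlabs.nlp.annotators.regex.RegexMatcherInternal

// Assumption: setRules accepts an Array[String] of "<regex><delimiter><entity>" rules.
val inlineMatcher = new RegexMatcherInternal()
  .setRules(Array("""\d{4}\/\d\d\/\d\d,date"""))
  .setDelimiter(",")          // must be the character that separates regex and entity in each rule
  .setInputCols("sentence")   // assumed name of the upstream sentence column
  .setOutputCol("regex")      // assumed name of the output column
```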

    To use an external file, a dictionary of predefined regular expressions must be provided with setExternalRules. The dictionary can be set either as a delimited text file or directly as an ExternalResource.


    In the example below, rules.txt has the form

    the\s\w+, followed by 'the'
    ceremonies, ceremony

    where each regex is separated from its entity by ","

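    import ResourceHelper.spark.implicits._
    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import com.johnsnowlabs.nlp.annotator.SentenceDetector
    import com.johnsnowlabs.nlp.annotators.regex.RegexMatcherInternal
    import org.apache.spark.ml.Pipeline

    // Turn the raw text column into document annotations, then split them into sentences.
    val documentAssembler = new DocumentAssembler().setInputCol("text").setOutputCol("document")
    val sentence = new SentenceDetector().setInputCols("document").setOutputCol("sentence")

    // Load the rules from the external file; "," separates each regex from its entity.
    val regexMatcher = new RegexMatcherInternal()
      .setExternalRules("src/test/resources/regex-matcher/rules.txt",  ",")
      .setInputCols("sentence")
      .setOutputCol("regex")

    val pipeline = new Pipeline().setStages(Array(documentAssembler, sentence, regexMatcher))

    val data = Seq(
      "My first sentence with the first rule. This is my second sentence with ceremonies rule."
    ).toDF("text")
    val results = pipeline.fit(data).transform(data)

    results.selectExpr("explode(regex) as result").show(false)
    +----------------------------------------------------------------------------------------+
    |result                                                                                  |
    +----------------------------------------------------------------------------------------+
    |[chunk, 23, 31, the first, [entity -> followed by 'the', sentence -> 0, chunk -> 0], []]|
    |[chunk, 71, 80, ceremonies, [entity -> ceremony, sentence -> 1, chunk -> 0], []]        |
    +----------------------------------------------------------------------------------------+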
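
    To make the rule format concrete without a Spark session, here is a minimal plain-Scala sketch that splits each rule at the delimiter and applies the resulting regexes to the sample text. RuleMatch, RuleSketch, parseRule and matchAll are hypothetical names used only for illustration. Splitting at the first delimiter and trimming both parts are assumptions, not the library's documented behavior.

```scala
import scala.util.matching.Regex

// Hypothetical stand-in for one annotation: inclusive begin/end offsets, matched text, entity.
final case class RuleMatch(begin: Int, end: Int, text: String, entity: String)

object RuleSketch {
  // Split a rule at the first delimiter: regex on the left, entity on the right.
  // Assumption: surrounding whitespace is trimmed from both parts.
  def parseRule(line: String, delimiter: String): (Regex, String) = {
    val idx = line.indexOf(delimiter)
    require(idx > 0, s"rule '$line' does not contain delimiter '$delimiter'")
    val pattern = line.substring(0, idx).trim
    val entity = line.substring(idx + delimiter.length).trim
    (pattern.r, entity)
  }

  // Apply every rule to the text and collect all matches, in rule order.
  def matchAll(rules: Seq[String], delimiter: String, text: String): Seq[RuleMatch] =
    rules.map(parseRule(_, delimiter)).flatMap { case (regex, entity) =>
      regex.findAllMatchIn(text).map { m =>
        // Regex.Match.end is exclusive; the sample output above reports an inclusive end.
        RuleMatch(m.start, m.end - 1, m.matched, entity)
      }.toSeq
    }

  def main(args: Array[String]): Unit = {
    // The two lines of rules.txt from the example above.
    val rules = Seq("""the\s\w+, followed by 'the'""", "ceremonies, ceremony")
    val text =
      "My first sentence with the first rule. This is my second sentence with ceremonies rule."
    matchAll(rules, ",", text).foreach(println)
  }
}
```

    On the sample sentence this yields "the first" at offsets 23 to 31 and "ceremonies" at offsets 71 to 80, in line with the output table above. The sketch matches against the whole text at once, whereas the annotator works on sentence annotations.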
  2. class RegexMatcherInternalModel extends AnnotatorModel[RegexMatcherInternalModel] with HasSimpleAnnotate[RegexMatcherInternalModel] with MergeCommonParams with CheckLicense

    Instantiated model of the RegexMatcherInternal. For usage and examples see the documentation of the main class.

Value Members

  1. object RegexMatcherInternal extends DefaultParamsReadable[RegexMatcherInternal] with Serializable

    This is the companion object of RegexMatcherInternal. Please refer to that class for the documentation.

  2. object RegexMatcherInternalModel extends ParamsAndFeaturesReadable[RegexMatcherInternalModel] with Serializable