package regex

Type Members

  1. class RegexMatcherInternal extends AnnotatorApproach[RegexMatcherInternalModel] with MergeCommonParams with CheckLicense

    Uses rules to match a set of regular expressions and associate them with a provided entity.

    A rule consists of a regex pattern and an entity, separated by a delimiter character of your choice. For example, the rule \d{4}\/\d\d\/\d\d,date uses "," as the delimiter and maps strings like "1970/01/01" to the entity "date".
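    The rule format described above can be illustrated with plain Scala, independent of the annotator. This is only a sketch: RuleSketch, Rule, parseRule and matchAll are hypothetical names, not part of the library, and splitting at the last occurrence of the delimiter is an assumption of this sketch rather than documented behaviour of RegexMatcherInternal.

```scala
import scala.util.matching.Regex

object RuleSketch {
  // Hypothetical container for one parsed rule; not a library type.
  final case class Rule(pattern: Regex, entity: String)

  // Splits a rule line at the LAST occurrence of the delimiter
  // (an assumption of this sketch), so the pattern may contain the delimiter.
  def parseRule(line: String, delimiter: String): Rule = {
    val idx = line.lastIndexOf(delimiter)
    require(idx > 0, s"no delimiter '$delimiter' found in rule: $line")
    val pattern = line.substring(0, idx).trim
    val entity  = line.substring(idx + delimiter.length).trim
    Rule(pattern.r, entity)
  }

  // Returns (matched text, entity) for every match of the rule in the text.
  def matchAll(rule: Rule, text: String): List[(String, String)] =
    rule.pattern.findAllIn(text).toList.map(m => (m, rule.entity))

  def main(args: Array[String]): Unit = {
    val rule = parseRule("""\d{4}\/\d\d\/\d\d,date""", ",")
    matchAll(rule, "The epoch began on 1970/01/01.").foreach(println)
  }
}
```

    Here the matched string "1970/01/01" is paired with the entity "date", mirroring the association that the annotator stores in the chunk metadata.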

    Rules must be provided by either setRules (followed by setDelimiter) or an external file.

    To use an external file, provide a dictionary of predefined regular expressions with setExternalRules. The dictionary can be given either as a delimited text file or directly as an ExternalResource.

    Example

    In this example, the file rules.txt has the following content:

    the\s\w+, followed by 'the'
    ceremonies, ceremony

    where each regex is separated from its entity by ",".

    import com.johnsnowlabs.nlp.util.io.ResourceHelper
    import ResourceHelper.spark.implicits._
    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import com.johnsnowlabs.nlp.annotator.SentenceDetector
    import com.johnsnowlabs.nlp.annotators.regex.RegexMatcherInternal
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler().setInputCol("text").setOutputCol("document")
    
    val sentence = new SentenceDetector().setInputCols("document").setOutputCol("sentence")
    
    val regexMatcher = new RegexMatcherInternal()
      .setExternalRules("src/test/resources/regex-matcher/rules.txt",  ",")
      .setInputCols(Array("sentence"))
      .setOutputCol("regex")
      .setStrategy("MATCH_ALL")
    
    val pipeline = new Pipeline().setStages(Array(documentAssembler, sentence, regexMatcher))
    
    val data = Seq(
      "My first sentence with the first rule. This is my second sentence with ceremonies rule."
    ).toDF("text")
    val results = pipeline.fit(data).transform(data)
    
    results.selectExpr("explode(regex) as result").show(false)
    +--------------------------------------------------------------------------------------------+
    |result                                                                                      |
    +--------------------------------------------------------------------------------------------+
    |[chunk, 23, 31, the first, [entity -> followed by 'the', sentence -> 0, chunk -> 0], []]|
    |[chunk, 71, 80, ceremonies, [entity -> ceremony, sentence -> 1, chunk -> 0], []]        |
    +--------------------------------------------------------------------------------------------+
  2. class RegexMatcherInternalModel extends AnnotatorModel[RegexMatcherInternalModel] with HasSimpleAnnotate[RegexMatcherInternalModel] with MergeCommonParams with CheckLicense

    Instantiated model of the RegexMatcherInternal. For usage and examples see the documentation of the main class.

Value Members

  1. object RegexMatcherInternal extends DefaultParamsReadable[RegexMatcherInternal] with Serializable

    This is the companion object of RegexMatcherInternal. Please refer to that class for the documentation.

  2. object RegexMatcherInternalModel extends ParamsAndFeaturesReadable[RegexMatcherInternalModel] with Serializable
