regex

package regex

Type Members

trait ReadablePretrainedRegexMatcherInternal extends ParamsAndFeaturesReadable[RegexMatcherInternalModel] with HasPretrained[RegexMatcherInternalModel]

class RegexMatcherInternal extends AnnotatorApproach[RegexMatcherInternalModel] with MergeCommonParams with CheckLicense

Uses rules to match a set of regular expressions and associate them with a provided entity.

A rule consists of a regex pattern and an entity, delimited by a character of choice. An example could be \d{4}\/\d\d\/\d\d,date which will match strings like "1970/01/01" to the entity "date".
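To make the rule format concrete, here is a minimal sketch in plain Scala of how a delimited rule could be split into a pattern and an entity and then applied to text. It is an illustration only, not the annotator's implementation: `RuleSketch`, `Rule`, `parseRule` and `matches` are hypothetical names, and splitting at the first occurrence of the delimiter (with surrounding whitespace trimmed) is an assumption.

```scala
import scala.util.matching.Regex

object RuleSketch {
  // A parsed rule: the compiled pattern plus the entity label it maps to.
  final case class Rule(pattern: Regex, entity: String)

  // Assumption: split at the FIRST occurrence of the delimiter and trim both parts.
  // A pattern that itself contains the delimiter would need a different delimiter.
  def parseRule(line: String, delimiter: String): Rule = {
    val idx = line.indexOf(delimiter)
    require(idx >= 0, s"no delimiter '$delimiter' in rule: $line")
    val pattern = line.substring(0, idx).trim
    val entity = line.substring(idx + delimiter.length).trim
    Rule(pattern.r, entity)
  }

  // Pair every match of the rule's pattern in the text with the rule's entity.
  def matches(rule: Rule, text: String): List[(String, String)] =
    rule.pattern.findAllIn(text).toList.map(m => (m, rule.entity))

  def main(args: Array[String]): Unit = {
    val rule = parseRule("""\d{4}\/\d\d\/\d\d,date""", ",")
    println(matches(rule, "Epoch start: 1970/01/01")) // prints List((1970/01/01,date))
  }
}
```

Because the delimiter is configurable, a pattern such as `\d{1,3}` that contains a comma can still be expressed by choosing a different delimiter character.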

Rules must be provided by either setRules (followed by setDelimiter) or an external file.

To use an external file, a dictionary of predefined regular expressions must be provided with setExternalRules. The dictionary can be set either as a delimited text file or directly as an ExternalResource.
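The two ways of supplying rules might look as follows. This is a hedged configuration sketch, not verified against this class: it assumes that setRules accepts an Array[String], and that the ExternalResource form takes a path, a ReadAs format and an options map carrying a "delimiter" key, as in the open-source Spark NLP RegexMatcher. The value names `inlineMatcher` and `resourceMatcher` are illustrative.

```scala
import com.johnsnowlabs.nlp.annotators.regex.RegexMatcherInternal
import com.johnsnowlabs.nlp.util.io.{ExternalResource, ReadAs}

// Option 1: inline rules, followed by the delimiter that separates pattern from entity.
// Assumption: setRules takes an Array[String].
val inlineMatcher = new RegexMatcherInternal()
  .setRules(Array("""\d{4}\/\d\d\/\d\d,date"""))
  .setDelimiter(",")
  .setInputCols(Array("sentence"))
  .setOutputCol("regex")

// Option 2: the same kind of rules read from a file, passed as an ExternalResource.
// Assumption: the delimiter is given through the "delimiter" option.
val resourceMatcher = new RegexMatcherInternal()
  .setExternalRules(
    ExternalResource(
      "src/test/resources/regex-matcher/rules.txt",
      ReadAs.TEXT,
      Map("delimiter" -> ",")))
  .setInputCols(Array("sentence"))
  .setOutputCol("regex")
```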

Example

In this example, rules.txt has the following form:

the\s\w+, followed by 'the'
ceremonies, ceremony

where each regex is separated from its entity by ",".

import com.johnsnowlabs.nlp.util.io.ResourceHelper
import ResourceHelper.spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.SentenceDetector
import com.johnsnowlabs.nlp.annotators.regex.RegexMatcherInternal
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler().setInputCol("text").setOutputCol("document")

val sentence = new SentenceDetector().setInputCols("document").setOutputCol("sentence")

val regexMatcher = new RegexMatcherInternal()
  .setExternalRules("src/test/resources/regex-matcher/rules.txt", ",")
  .setInputCols(Array("sentence"))
  .setOutputCol("regex")
  .setStrategy("MATCH_ALL")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentence, regexMatcher))

val data = Seq(
  "My first sentence with the first rule. This is my second sentence with ceremonies rule."
).toDF("text")
val results = pipeline.fit(data).transform(data)

results.selectExpr("explode(regex) as result").show(false)
+--------------------------------------------------------------------------------------------+
|result                                                                                      |
+--------------------------------------------------------------------------------------------+
|[chunk, 23, 31, the first, [entity -> followed by 'the', sentence -> 0, chunk -> 0], []]|
|[chunk, 71, 80, ceremonies, [entity -> ceremony, sentence -> 1, chunk -> 0], []]        |
+--------------------------------------------------------------------------------------------+
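The two rows above suggest that the begin and end values of each chunk are inclusive character offsets into the original text. The following plain-Scala check illustrates that reading; `OffsetCheck` and `chunkAt` are hypothetical names introduced here, not part of the library.

```scala
object OffsetCheck {
  val text: String =
    "My first sentence with the first rule. This is my second sentence with ceremonies rule."

  // begin and end are both treated as inclusive, so add 1 for substring's exclusive end.
  def chunkAt(begin: Int, end: Int): String = text.substring(begin, end + 1)

  def main(args: Array[String]): Unit = {
    println(chunkAt(23, 31)) // prints: the first
    println(chunkAt(71, 80)) // prints: ceremonies
  }
}
```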

class RegexMatcherInternalModel extends AnnotatorModel[RegexMatcherInternalModel] with HasSimpleAnnotate[RegexMatcherInternalModel] with MergeCommonParams with CheckLicense
Instantiated model of the RegexMatcherInternal. For usage and examples see the documentation of the main class.

Value Members

object RegexMatcherInternal extends DefaultParamsReadable[RegexMatcherInternal] with Serializable
This is the companion object of RegexMatcherInternal. Please refer to that class for the documentation.
object RegexMatcherInternalModel extends ReadablePretrainedRegexMatcherInternal with Serializable
