package regex
Type Members
- trait ReadablePretrainedRegexMatcherInternal extends ParamsAndFeaturesReadable[RegexMatcherInternalModel] with HasPretrained[RegexMatcherInternalModel]
- class RegexMatcherInternal extends AnnotatorApproach[RegexMatcherInternalModel] with MergeCommonParams with CheckLicense
Uses rules to match a set of regular expressions and associate them with a provided entity.
A rule consists of a regex pattern and an entity, delimited by a character of choice. An example could be
\d{4}\/\d\d\/\d\d,date
which will match strings like "1970/01/01" to the entity "date".
Rules must be provided by either setRules (followed by setDelimiter) or an external file.
To use an external file, a dictionary of predefined regular expressions must be provided with setExternalRules. The dictionary can be set either in the form of a delimited text file or directly as an ExternalResource.
Example
In this example, the rules.txt has the form of
the\s\w+, followed by 'the'
ceremonies, ceremony
where each regex is separated from the entity by ",".
import ResourceHelper.spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.SentenceDetector
import com.johnsnowlabs.nlp.annotators.regex.RegexMatcherInternal
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler().setInputCol("text").setOutputCol("document")

val sentence = new SentenceDetector().setInputCols("document").setOutputCol("sentence")

val regexMatcher = new RegexMatcherInternal()
  .setExternalRules("src/test/resources/regex-matcher/rules.txt", ",")
  .setInputCols(Array("sentence"))
  .setOutputCol("regex")
  .setStrategy("MATCH_ALL")

val pipeline = new Pipeline().setStages(Array(documentAssembler, sentence, regexMatcher))

val data = Seq(
  "My first sentence with the first rule. This is my second sentence with ceremonies rule."
).toDF("text")
val results = pipeline.fit(data).transform(data)

results.selectExpr("explode(regex) as result").show(false)
+----------------------------------------------------------------------------------------+
|result                                                                                  |
+----------------------------------------------------------------------------------------+
|[chunk, 23, 31, the first, [entity -> followed by 'the', sentence -> 0, chunk -> 0], []]|
|[chunk, 71, 80, ceremonies, [entity -> ceremony, sentence -> 1, chunk -> 0], []]        |
+----------------------------------------------------------------------------------------+
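The rule format described above (a regex pattern, a delimiter, then an entity) can be illustrated without Spark. The following is a minimal sketch in plain Scala, not the annotator's actual implementation: `RuleSketch`, `parseRule` and `matchAll` are hypothetical names introduced only for this illustration, and the sketch assumes the delimiter occurs exactly once in each rule, so that splitting at its first occurrence is unambiguous.

```scala
// Illustration only: plain Scala, independent of Spark NLP.
// RuleSketch, parseRule and matchAll are hypothetical helpers, not library API.
object RuleSketch {

  // Split one rule line into (pattern, entity) at the first delimiter.
  // Assumption: the delimiter does not occur inside the regex itself.
  def parseRule(rule: String, delimiter: String): (String, String) = {
    val idx = rule.indexOf(delimiter)
    require(idx >= 0, s"delimiter '$delimiter' not found in rule: $rule")
    (rule.substring(0, idx), rule.substring(idx + delimiter.length).trim)
  }

  // For every rule, collect each match in the text together with the rule's entity.
  def matchAll(text: String, rules: Seq[String], delimiter: String): List[(String, String)] =
    rules.toList.flatMap { rule =>
      val (pattern, entity) = parseRule(rule, delimiter)
      pattern.r.findAllIn(text).toList.map(matched => (matched, entity))
    }

  def main(args: Array[String]): Unit = {
    // The two rules from the rules.txt shown in the example above.
    val rules = List("""the\s\w+, followed by 'the'""", "ceremonies, ceremony")
    val text =
      "My first sentence with the first rule. This is my second sentence with ceremonies rule."

    matchAll(text, rules, ",").foreach(println)
    // (the first,followed by 'the')
    // (ceremonies,ceremony)
  }
}
```

The matched strings and entities agree with the `chunk` results in the table above. Unlike the annotator, the sketch applies the rules to the whole text rather than sentence by sentence and does not compute character offsets.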
- class RegexMatcherInternalModel extends AnnotatorModel[RegexMatcherInternalModel] with HasSimpleAnnotate[RegexMatcherInternalModel] with MergeCommonParams with CheckLicense
Instantiated model of the RegexMatcherInternal. For usage and examples see the documentation of the main class.
Value Members
- object RegexMatcherInternal extends DefaultParamsReadable[RegexMatcherInternal] with Serializable
This is the companion object of RegexMatcherInternal. Please refer to that class for the documentation.
- object RegexMatcherInternalModel extends ReadablePretrainedRegexMatcherInternal with Serializable