package matcher
Type Members
- trait ReadablePretrainedTextMatcherInternal extends ParamsAndFeaturesReadable[TextMatcherInternalModel] with HasPretrained[TextMatcherInternalModel]
- class SearchTrieInternal extends SearchTrie
Immutable collection used for fast substring search. Implementation of the Aho-Corasick algorithm: https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm
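To make the idea concrete, here is a minimal, self-contained sketch of Aho-Corasick matching in plain Scala. It is not the code of SearchTrieInternal: the class name `AhoCorasickSketch`, its `search` method, and the choice to match over characters are assumptions made for illustration only.

```scala
import scala.collection.mutable

// Hypothetical illustration class; NOT the library's SearchTrieInternal.
final class AhoCorasickSketch(patterns: Seq[String]) {
  require(patterns.forall(_.nonEmpty), "patterns must be non-empty")

  private val edges = mutable.ArrayBuffer(mutable.Map.empty[Char, Int]) // node -> outgoing edges
  private val fail = mutable.ArrayBuffer(0)                             // node -> failure link
  private val out = mutable.ArrayBuffer(List.empty[Int])                // node -> ids of patterns ending here

  private def newNode(): Int = {
    edges += mutable.Map.empty[Char, Int]
    fail += 0
    out += Nil
    edges.length - 1
  }

  // Step 1: insert every pattern into the trie (node 0 is the root).
  for ((p, id) <- patterns.zipWithIndex) {
    var node = 0
    for (c <- p) node = edges(node).getOrElseUpdate(c, newNode())
    out(node) = id :: out(node)
  }

  // Step 2: compute failure links breadth-first; depth-1 nodes fail to the root.
  private val queue = mutable.Queue.empty[Int]
  for (child <- edges(0).values) queue.enqueue(child)
  while (queue.nonEmpty) {
    val node = queue.dequeue()
    for ((c, child) <- edges(node)) {
      var f = fail(node)
      while (f != 0 && !edges(f).contains(c)) f = fail(f)
      fail(child) = edges(f).getOrElse(c, 0)
      // Inherit the patterns that end at the failure target (proper suffixes).
      out(child) = out(child) ++ out(fail(child))
      queue.enqueue(child)
    }
  }

  /** Returns (begin, endInclusive, pattern) for every occurrence, overlaps included. */
  def search(text: String): Seq[(Int, Int, String)] = {
    val hits = mutable.ArrayBuffer.empty[(Int, Int, String)]
    var node = 0
    for ((c, i) <- text.zipWithIndex) {
      // Follow failure links until some node accepts c, or we reach the root.
      while (node != 0 && !edges(node).contains(c)) node = fail(node)
      node = edges(node).getOrElse(c, 0)
      for (id <- out(node)) {
        val p = patterns(id)
        hits += ((i - p.length + 1, i, p))
      }
    }
    hits.toSeq
  }
}
```

For example, `new AhoCorasickSketch(Seq("he", "she", "his", "hers")).search("ushers")` reports "she" at 1 to 3, "he" at 2 to 3, and "hers" at 2 to 5 in a single pass over the text. The failure links are what let the scan continue without backtracking when a partial match breaks.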
- class TextMatcherInternal extends AnnotatorApproach[TextMatcherInternalModel] with TextMatcherInternalParams with ParamsAndFeaturesWritable
Annotator to match exact phrases (by token) provided in a file against a Document.
A text file of predefined phrases must be provided with setEntities. The text file can also be set directly as an ExternalResource. For extended examples of usage, see the
Example
In this example, the entities file is of the form
...
dolore magna aliqua
lorem ipsum dolor. sit
laborum
...
where each line represents an entity phrase to be extracted.
import spark.implicits._
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.Tokenizer
import com.johnsnowlabs.nlp.annotator.TextMatcherInternal
import com.johnsnowlabs.nlp.util.io.ReadAs
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val data = Seq("Hello dolore magna aliqua. Lorem ipsum dolor. sit in laborum").toDF("text")
val entityExtractor = new TextMatcherInternal()
  .setInputCols("document", "token")
  .setEntities("src/test/resources/entity-extractor/test-phrases.txt", ReadAs.TEXT)
  .setOutputCol("entity")
  .setCaseSensitive(false)
  .setTokenizer(tokenizer.fit(data))

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, entityExtractor))
val results = pipeline.fit(data).transform(data)

results.selectExpr("explode(entity) as result").show(false)
+------------------------------------------------------------------------------------------+
|result                                                                                    |
+------------------------------------------------------------------------------------------+
|[chunk, 6, 24, dolore magna aliqua, [entity -> entity, sentence -> 0, chunk -> 0], []]    |
|[chunk, 27, 48, Lorem ipsum dolor. sit, [entity -> entity, sentence -> 0, chunk -> 1], []]|
|[chunk, 53, 59, laborum, [entity -> entity, sentence -> 0, chunk -> 2], []]               |
+------------------------------------------------------------------------------------------+
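To show how the begin and end offsets in the result column relate to the input text, the following plain-Scala sketch matches phrases token by token without Spark. It is a simplified stand-in, not the annotator's implementation: the object `TokenPhraseMatchSketch`, the `Token` case class, and the regex tokenizer (words and single punctuation marks as separate tokens) are all assumptions for illustration.

```scala
// Hypothetical helper; a simplified stand-in for token-level phrase matching.
object TokenPhraseMatchSketch {
  final case class Token(text: String, begin: Int, end: Int) // end is inclusive

  // Assumption: words and single punctuation marks are separate tokens.
  private val tokenPattern = """\w+|[^\w\s]""".r

  def tokenize(text: String): Vector[Token] =
    tokenPattern.findAllMatchIn(text)
      .map(m => Token(m.matched, m.start, m.end - 1))
      .toVector

  /** Returns (begin, endInclusive, chunkText) for each phrase occurrence, ordered by begin. */
  def matchPhrases(
      text: String,
      phrases: Seq[String],
      caseSensitive: Boolean): Seq[(Int, Int, String)] = {
    def norm(s: String): String = if (caseSensitive) s else s.toLowerCase
    val tokens = tokenize(text)
    val hits = for {
      phrase <- phrases
      wanted = tokenize(phrase).map(t => norm(t.text))
      if wanted.nonEmpty
      start <- 0 to tokens.length - wanted.length
      window = tokens.slice(start, start + wanted.length)
      if window.map(t => norm(t.text)) == wanted
    } yield {
      val begin = window.head.begin
      val end = window.last.end
      // The chunk text is cut from the original document, so casing is preserved.
      (begin, end, text.substring(begin, end + 1))
    }
    hits.sortBy(_._1)
  }
}
```

Called with the example sentence, the three phrases from the entities file, and `caseSensitive = false`, this sketch yields the spans 6 to 24, 27 to 48, and 53 to 59, the same offsets as in the result table. It scans naively for clarity, whereas a trie-based search avoids re-checking each phrase at every position.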
- class TextMatcherInternalModel extends AnnotatorModel[TextMatcherInternalModel] with HasSimpleAnnotate[TextMatcherInternalModel] with TextMatcherInternalParams with CheckLicense
Instantiated model of the TextMatcherInternal. For usage and examples see the documentation of the main class.
- trait TextMatcherInternalParams extends Params with HasFeatures
Trait containing parameters and helper methods for the TextMatcherInternal and TextMatcherInternalModel components.
Value Members
- object SearchTrieInternal extends Serializable
- object TextMatcherInternal extends DefaultParamsReadable[TextMatcherInternal] with Serializable
This is the companion object of TextMatcherInternal. Please refer to that class for the documentation.
- object TextMatcherInternalModel extends ReadablePretrainedTextMatcherInternal with Serializable
This is the companion object of TextMatcherInternalModel. Please refer to that class for the documentation.