matcher

package matcher

Type Members

trait ReadablePretrainedTextMatcherInternal extends ParamsAndFeaturesReadable[TextMatcherInternalModel] with HasPretrained[TextMatcherInternalModel]
class SearchTrieInternal extends SearchTrie
Immutable collection used for fast substring search. Implementation of the Aho-Corasick algorithm: https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm
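
To illustrate the idea behind Aho-Corasick, here is a minimal sketch that works on tokens rather than characters. It is not the code of SearchTrieInternal: the class name TokenTrie, its fields, and the search method are all hypothetical and assumed only for this illustration. The automaton is a trie of phrases plus failure links, so the input is scanned once regardless of how many phrases are registered.

```scala
import scala.collection.mutable

// Hypothetical helper for illustration only; not part of Spark NLP.
final class TokenTrie(phrases: Seq[Seq[String]]) {
  // Per-node data indexed by node id; node 0 is the root.
  private val edges = mutable.ArrayBuffer(mutable.Map.empty[String, Int])
  private val fail = mutable.ArrayBuffer(0)
  // Lengths (in tokens) of every phrase that ends at this node.
  private val outputs = mutable.ArrayBuffer(List.empty[Int])

  private def newNode(): Int = {
    edges += mutable.Map.empty[String, Int]
    fail += 0
    outputs += Nil
    edges.length - 1
  }

  // Step 1: insert every phrase into the trie.
  for (phrase <- phrases if phrase.nonEmpty) {
    var node = 0
    for (tok <- phrase) {
      node = edges(node).get(tok) match {
        case Some(next) => next
        case None =>
          val next = newNode()
          edges(node)(tok) = next
          next
      }
    }
    outputs(node) = phrase.length :: outputs(node)
  }

  // Step 2: compute failure links breadth-first, so that a node's link
  // is always resolved before the links of its children.
  private def buildFailureLinks(): Unit = {
    val queue = mutable.Queue.empty[Int]
    edges(0).values.foreach(queue.enqueue(_)) // depth-1 nodes fall back to the root
    while (queue.nonEmpty) {
      val node = queue.dequeue()
      for ((tok, child) <- edges(node)) {
        var f = fail(node)
        while (f != 0 && !edges(f).contains(tok)) f = fail(f)
        fail(child) = edges(f).getOrElse(tok, 0)
        // Inherit phrases that end at the fallback node (proper suffixes).
        outputs(child) = outputs(child) ++ outputs(fail(child))
        queue.enqueue(child)
      }
    }
  }
  buildFailureLinks()

  /** Returns (beginToken, endToken) index pairs, both inclusive. */
  def search(tokens: Seq[String]): List[(Int, Int)] = {
    val hits = mutable.ListBuffer.empty[(Int, Int)]
    var node = 0
    for ((tok, i) <- tokens.zipWithIndex) {
      while (node != 0 && !edges(node).contains(tok)) node = fail(node)
      node = edges(node).getOrElse(tok, 0)
      for (len <- outputs(node)) hits += ((i - len + 1, i))
    }
    hits.toList
  }
}

val trie = new TokenTrie(Seq(
  Seq("dolore", "magna", "aliqua"),
  Seq("magna"),
  Seq("laborum")))
val tokens = "hello dolore magna aliqua sit in laborum".split(" ").toSeq
val hits = trie.search(tokens)
```

In this sketch the nested phrase "magna" is reported even though it lies inside the longer phrase, because outputs are inherited along failure links. Whether the real annotator reports overlapping matches is not stated on this page.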

class TextMatcherInternal extends AnnotatorApproach[TextMatcherInternalModel] with TextMatcherInternalParams with ParamsAndFeaturesWritable

Annotator to match exact phrases (by token) provided in a file against a Document.

A text file of predefined phrases must be provided with setEntities. The text file can also be set directly as an ExternalResource.

For extended examples of usage, see the

Example

In this example, the entities file is of the form

...
dolore magna aliqua
lorem ipsum dolor. sit
laborum
...

where each line represents an entity phrase to be extracted.

import spark.implicits._
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.Tokenizer
import com.johnsnowlabs.nlp.annotator.TextMatcherInternal
import com.johnsnowlabs.nlp.util.io.ReadAs
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val data = Seq("Hello dolore magna aliqua. Lorem ipsum dolor. sit in laborum").toDF("text")
val entityExtractor = new TextMatcherInternal()
  .setInputCols("document", "token")
  .setEntities("src/test/resources/entity-extractor/test-phrases.txt", ReadAs.TEXT)
  .setOutputCol("entity")
  .setCaseSensitive(false)
  .setTokenizer(tokenizer.fit(data))

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, entityExtractor))
val results = pipeline.fit(data).transform(data)

results.selectExpr("explode(entity) as result").show(false)
+------------------------------------------------------------------------------------------+
|result                                                                                    |
+------------------------------------------------------------------------------------------+
|[chunk, 6, 24, dolore magna aliqua, [entity -> entity, sentence -> 0, chunk -> 0], []]    |
|[chunk, 27, 48, Lorem ipsum dolor. sit, [entity -> entity, sentence -> 0, chunk -> 1], []]|
|[chunk, 53, 59, laborum, [entity -> entity, sentence -> 0, chunk -> 2], []]               |
+------------------------------------------------------------------------------------------+
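
The second and third fields of each chunk in the output above are character offsets into the original text, and the end offset is inclusive. The plain-Scala sketch below reproduces those numbers with a case-insensitive substring search. It only illustrates how to read the offsets: the helper name spans is hypothetical, and the annotator itself matches by token rather than by raw substring.

```scala
import java.util.Locale

// Hypothetical helper for illustration only; not part of Spark NLP.
// Finds the first occurrence of each phrase, ignoring case, and returns
// (begin, inclusive end, matched text as it appears in the input).
def spans(text: String, phrases: Seq[String]): Seq[(Int, Int, String)] = {
  val lower = text.toLowerCase(Locale.ROOT)
  for {
    phrase <- phrases
    begin = lower.indexOf(phrase.toLowerCase(Locale.ROOT))
    if begin >= 0
  } yield (begin, begin + phrase.length - 1, text.substring(begin, begin + phrase.length))
}

val text = "Hello dolore magna aliqua. Lorem ipsum dolor. sit in laborum"
val found = spans(text, Seq("dolore magna aliqua", "lorem ipsum dolor. sit", "laborum"))
```

Note that the matched text keeps the casing of the input ("Lorem"), which mirrors the result shown for setCaseSensitive(false).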

class TextMatcherInternalModel extends AnnotatorModel[TextMatcherInternalModel] with HasSimpleAnnotate[TextMatcherInternalModel] with TextMatcherInternalParams with CheckLicense
Instantiated model of the TextMatcherInternal. For usage and examples see the documentation of the main class.
trait TextMatcherInternalParams extends Params with HasFeatures
Trait containing parameters and helper methods for the TextMatcherInternal and TextMatcherInternalModel components.

Value Members

object SearchTrieInternal extends Serializable
object TextMatcherInternal extends DefaultParamsReadable[TextMatcherInternal] with Serializable
This is the companion object of TextMatcherInternal. Please refer to that class for the documentation.
object TextMatcherInternalModel extends ReadablePretrainedTextMatcherInternal with Serializable
This is the companion object of TextMatcherInternalModel. Please refer to that class for the documentation.
