package matcher

Type Members

  1. trait ReadablePretrainedTextMatcherInternal extends ParamsAndFeaturesReadable[TextMatcherInternalModel] with HasPretrained[TextMatcherInternalModel]
  2. class TextMatcherInternal extends AnnotatorApproach[TextMatcherInternalModel] with ParamsAndFeaturesWritable

    Annotator to match exact phrases (by token) provided in a file against a Document.

    A text file of predefined phrases must be provided with setEntities. The text file can also be set directly as an ExternalResource.

    For extended examples of usage, see the

    Example

    In this example, the entities file is of the form

    ...
    dolore magna aliqua
    lorem ipsum dolor. sit
    laborum
    ...

    where each line represents an entity phrase to be extracted.
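    Because matching is done by token, each line of the file is effectively a token sequence rather than a raw string. This is a hypothetical sketch of reading such a file, not Spark NLP code: the `EntityPhrases` object and its `parse` method are illustrative names. It assumes a tokenizer that splits punctuation into separate tokens, so `dolor.` becomes `dolor` and `.`. It also assumes that case is ignored unless `caseSensitive` is set.

    ```scala
    // Hypothetical helper, not part of Spark NLP; the tokenization rule is an assumption.
    object EntityPhrases {
      // A token is a run of word characters or a single punctuation mark.
      private val TokenPattern = """\w+|[^\w\s]""".r

      // One phrase per non-blank line, each split into (optionally lower-cased) tokens.
      def parse(contents: String, caseSensitive: Boolean = false): Vector[Vector[String]] =
        contents
          .split("\n")
          .iterator
          .map(_.trim) // also strips a trailing '\r' from Windows line endings
          .filter(_.nonEmpty)
          .map { line =>
            val tokens = TokenPattern.findAllIn(line).toVector
            if (caseSensitive) tokens else tokens.map(_.toLowerCase)
          }
          .toVector
          .distinct
    }
    ```

    For the file above, `lorem ipsum dolor. sit` becomes the five tokens `lorem`, `ipsum`, `dolor`, `.`, `sit`.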

    // Assumes an active SparkSession named `spark`, as in spark-shell
    import spark.implicits._
    import com.johnsnowlabs.nlp.DocumentAssembler
    import com.johnsnowlabs.nlp.annotator.Tokenizer
    import com.johnsnowlabs.nlp.annotator.TextMatcherInternal
    import com.johnsnowlabs.nlp.util.io.ReadAs
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val data = Seq("Hello dolore magna aliqua. Lorem ipsum dolor. sit in laborum").toDF("text")
    val entityExtractor = new TextMatcherInternal()
      .setInputCols("document", "token")
      .setEntities("src/test/resources/entity-extractor/test-phrases.txt", ReadAs.TEXT)
      .setOutputCol("entity")
      .setCaseSensitive(false)
      .setTokenizer(tokenizer.fit(data))
    
    val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, entityExtractor))
    val results = pipeline.fit(data).transform(data)
    
    results.selectExpr("explode(entity) as result").show(false)
    +------------------------------------------------------------------------------------------+
    |result                                                                                    |
    +------------------------------------------------------------------------------------------+
    |[chunk, 6, 24, dolore magna aliqua, [entity -> entity, sentence -> 0, chunk -> 0], []]    |
    |[chunk, 27, 48, Lorem ipsum dolor. sit, [entity -> entity, sentence -> 0, chunk -> 1], []]|
    |[chunk, 53, 59, laborum, [entity -> entity, sentence -> 0, chunk -> 2], []]               |
    +------------------------------------------------------------------------------------------+
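    This minimal sketch shows how token-level matching can produce the begin/end offsets in the output above. It is not the TextMatcherInternal implementation. The `TokenPhraseMatcher` object, its regex tokenizer, and the policy of reporting every match sorted by position are all assumptions. Offsets follow the table's convention, in which `end` is the index of the chunk's last character (inclusive).

    ```scala
    import scala.util.matching.Regex

    // Hypothetical sketch of exact phrase matching by token; not Spark NLP code.
    object TokenPhraseMatcher {
      final case class Token(text: String, begin: Int, end: Int) // end is inclusive
      final case class Chunk(text: String, begin: Int, end: Int) // end is inclusive

      // Assumed tokenization: word runs or single punctuation marks ("dolor." -> "dolor", ".").
      private val TokenPattern: Regex = """\w+|[^\w\s]""".r

      def tokenize(text: String): Vector[Token] =
        TokenPattern.findAllMatchIn(text).map(m => Token(m.matched, m.start, m.end - 1)).toVector

      def findChunks(text: String, phrases: Seq[String], caseSensitive: Boolean = false): Vector[Chunk] = {
        val norm: String => String = s => if (caseSensitive) s else s.toLowerCase
        val tokens = tokenize(text)
        val keys = tokens.map(t => norm(t.text))
        // Each phrase becomes a sequence of normalized token strings.
        val phraseTokens = phrases.map(p => tokenize(p).map(t => norm(t.text))).filter(_.nonEmpty).distinct
        // A phrase matches when the tokens starting at position i are exactly its tokens.
        val hits = for {
          i <- tokens.indices
          p <- phraseTokens
          if keys.slice(i, i + p.length) == p
        } yield {
          val begin = tokens(i).begin
          val end = tokens(i + p.length - 1).end
          Chunk(text.substring(begin, end + 1), begin, end) // keeps the document's original casing
        }
        hits.toVector.sortBy(c => (c.begin, c.end))
      }
    }
    ```

    Because comparison happens on whole tokens, a phrase such as `magna aliq` does not match inside `magna aliqua`. How the real annotator handles overlapping matches may differ from this sketch.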
  3. class TextMatcherInternalModel extends AnnotatorModel[TextMatcherInternalModel] with HasSimpleAnnotate[TextMatcherInternalModel] with CheckLicense

    Instantiated model of the TextMatcherInternal. For usage and examples see the documentation of the main class.

Value Members

  1. object TextMatcherInternal extends DefaultParamsReadable[TextMatcherInternal] with Serializable

    This is the companion object of TextMatcherInternal. Please refer to that class for the documentation.

  2. object TextMatcherInternalModel extends ReadablePretrainedTextMatcherInternal with Serializable

    This is the companion object of TextMatcherInternalModel. Please refer to that class for the documentation.
