package matcher
Type Members
- trait ReadablePretrainedTextMatcherInternal extends ParamsAndFeaturesReadable[TextMatcherInternalModel] with HasPretrained[TextMatcherInternalModel]
- class TextMatcherInternal extends AnnotatorApproach[TextMatcherInternalModel] with ParamsAndFeaturesWritable
Annotator to match exact phrases (by token) provided in a file against a Document.
A text file of predefined phrases must be provided with setEntities. The text file can also be set directly as an ExternalResource.
For extended examples of usage, see the
Example
In this example, the entities file is of the form

    ...
    dolore magna aliqua
    lorem ipsum dolor. sit
    laborum
    ...

where each line represents an entity phrase to be extracted.
    import spark.implicits._
    import com.johnsnowlabs.nlp.DocumentAssembler
    import com.johnsnowlabs.nlp.annotator.Tokenizer
    import com.johnsnowlabs.nlp.annotator.TextMatcherInternal
    import com.johnsnowlabs.nlp.util.io.ReadAs
    import org.apache.spark.ml.Pipeline

    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")

    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")

    val data = Seq("Hello dolore magna aliqua. Lorem ipsum dolor. sit in laborum").toDF("text")

    val entityExtractor = new TextMatcherInternal()
      .setInputCols("document", "token")
      .setEntities("src/test/resources/entity-extractor/test-phrases.txt", ReadAs.TEXT)
      .setOutputCol("entity")
      .setCaseSensitive(false)
      .setTokenizer(tokenizer.fit(data))

    val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, entityExtractor))
    val results = pipeline.fit(data).transform(data)

    results.selectExpr("explode(entity) as result").show(false)

    +------------------------------------------------------------------------------------------+
    |result                                                                                    |
    +------------------------------------------------------------------------------------------+
    |[chunk, 6, 24, dolore magna aliqua, [entity -> entity, sentence -> 0, chunk -> 0], []]    |
    |[chunk, 27, 48, Lorem ipsum dolor. sit, [entity -> entity, sentence -> 0, chunk -> 1], []]|
    |[chunk, 53, 59, laborum, [entity -> entity, sentence -> 0, chunk -> 2], []]               |
    +------------------------------------------------------------------------------------------+
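To make "exact phrases (by token)" more concrete, the following is a minimal plain-Scala sketch of token-level phrase matching. It is not the library's implementation: the object `PhraseMatchSketch`, the `Chunk` case class, the regex tokenizer and the longest-match-first rule are all assumptions made for illustration, and the real Tokenizer is considerably more elaborate.

```scala
// Illustrative sketch only; all names here are hypothetical and not part of the library.
object PhraseMatchSketch {

  // One match: inclusive character offsets plus the matched surface text.
  final case class Chunk(begin: Int, end: Int, text: String)

  // Assumed tokenization: runs of word characters, or single punctuation marks.
  private val TokenPattern = """\w+|[^\w\s]""".r

  // Returns each token together with its start offset in the input.
  def tokenize(text: String): Vector[(String, Int)] =
    TokenPattern.findAllMatchIn(text).map(m => (m.matched, m.start)).toVector

  // Scans the text token by token and emits a chunk wherever the tokens of a
  // phrase equal the text tokens at that position (longest phrase wins).
  def matchPhrases(text: String, phrases: Seq[String], caseSensitive: Boolean): Vector[Chunk] = {
    def norm(s: String): String = if (caseSensitive) s else s.toLowerCase

    val tokens = tokenize(text)
    val phraseTokens: Seq[Vector[String]] =
      phrases.map(p => tokenize(p).map(t => norm(t._1))).filter(_.nonEmpty)

    val out = Vector.newBuilder[Chunk]
    var i = 0
    while (i < tokens.length) {
      val hit = phraseTokens
        .filter(p => i + p.length <= tokens.length &&
          p.indices.forall(k => norm(tokens(i + k)._1) == p(k)))
        .sortBy(p => -p.length)
        .headOption

      hit match {
        case Some(p) =>
          val begin = tokens(i)._2
          val (lastToken, lastStart) = tokens(i + p.length - 1)
          val end = lastStart + lastToken.length - 1
          out += Chunk(begin, end, text.substring(begin, end + 1))
          i += p.length // continue after the matched phrase
        case None =>
          i += 1
      }
    }
    out.result()
  }
}
```

Called with the example sentence, the three phrases from the entities file and `caseSensitive = false`, this sketch yields the offsets (6, 24), (27, 48) and (53, 59), in line with the output table. With `caseSensitive = true`, the second phrase no longer matches because the text has "Lorem" with a capital L.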
- class TextMatcherInternalModel extends AnnotatorModel[TextMatcherInternalModel] with HasSimpleAnnotate[TextMatcherInternalModel] with CheckLicense
Instantiated model of the TextMatcherInternal. For usage and examples see the documentation of the main class.
Value Members
- object TextMatcherInternal extends DefaultParamsReadable[TextMatcherInternal] with Serializable
This is the companion object of TextMatcherInternal. Please refer to that class for the documentation.
- object TextMatcherInternalModel extends ReadablePretrainedTextMatcherInternal with Serializable
This is the companion object of TextMatcherInternalModel. Please refer to that class for the documentation.