Packages

package nlp

Type Members

  1. class FeaturesAssembler extends Transformer with DefaultParamsWritable with HasOutputAnnotatorType with HasOutputAnnotationCol with HasStorageRef with CheckLicense

    The FeaturesAssembler collects features from different columns. It accepts single-value columns (anything that can be cast to a float; if the cast fails, the value is set to 0), array columns, and Spark NLP annotations (if the annotation is an embedding, its embedding is used; otherwise the assembler tries to cast the result field). The transformer outputs a FEATURE_VECTOR annotation, with the numeric vector stored in the embeddings field.
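
    The collection rules described above can be illustrated with a small plain-Scala sketch. It does not use or reproduce the FeaturesAssembler implementation: SimpleAnnotation, FeatureCollectionSketch, and every method name below are hypothetical stand-ins, and the handling of nested values is an assumption made only for illustration.

```scala
// Hypothetical stand-in for a Spark NLP annotation; only the two fields
// mentioned in the description (result and embeddings) are modelled.
final case class SimpleAnnotation(result: String, embeddings: Array[Float])

object FeatureCollectionSketch {

  // Single value: cast to Float, falling back to 0 when the cast fails.
  def toFloatOrZero(value: Any): Float = value match {
    case null      => 0f
    case n: Number => n.floatValue()
    case other     => scala.util.Try(other.toString.trim.toFloat).getOrElse(0f)
  }

  // One input column can contribute one or many features.
  def collectFeatures(column: Any): Seq[Float] = column match {
    // Embedding annotation: take the embedding as-is.
    case a: SimpleAnnotation if a.embeddings.nonEmpty => a.embeddings.toSeq
    // Any other annotation: try to cast the result field.
    case a: SimpleAnnotation => Seq(toFloatOrZero(a.result))
    // Array-like columns: collect every element (assumption: flattened in order).
    case xs: Seq[_]   => xs.flatMap(x => collectFeatures(x))
    case xs: Array[_] => xs.toSeq.flatMap(x => collectFeatures(x))
    // Everything else is treated as a single value.
    case single => Seq(toFloatOrZero(single))
  }

  // Concatenate the features of all columns into one numeric vector.
  def assemble(columns: Seq[Any]): Array[Float] =
    columns.flatMap(c => collectFeatures(c)).toArray

  def main(args: Array[String]): Unit = {
    val row = Seq[Any](
      3,                                              // numeric single value
      "2.5",                                          // string that casts cleanly
      "oops",                                         // cast fails, contributes 0
      Seq(1.0, 2.0),                                  // array column
      SimpleAnnotation("ignored", Array(0.1f, 0.2f)), // embedding annotation
      SimpleAnnotation("7", Array.empty[Float])       // non-embedding annotation
    )
    println(assemble(row).mkString(", "))
  }
}
```

    In this sketch the string "oops" cannot be cast and therefore contributes a 0, while the annotation without an embedding contributes its result field cast to a float.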

  2. class Router extends AnnotatorModel[Router] with HasSimpleAnnotate[Router] with CheckLicense

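    This class filters annotations of any type based on their metadata fields. The example below builds two routers over the same bert column: one selects the annotations whose identifier metadata field is name, the other those whose identifier is city.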

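    import com.johnsnowlabs.nlp.Router // assumed location: the nlp package documented here
    import com.johnsnowlabs.nlp.annotator._ // SentenceDetector, RegexMatcher, BertSentenceEmbeddings
    import com.johnsnowlabs.nlp.base._ // DocumentAssembler, Chunk2Doc
    import com.johnsnowlabs.nlp.util.io.{ExternalResource, ReadAs, ResourceHelper}

    // One-row test DataFrame with an id and a text column
    val testData = ResourceHelper.spark.createDataFrame(Seq(
      (1, "\"Jesus live in Leon. Madrid is the capital of Spain")
    )).toDF("id", "text")

    // Raw text -> document annotation
    val document = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")

    // Document -> sentences
    val sentence = new SentenceDetector()
      .setInputCols("document")
      .setOutputCol("sentence")

    // Match every regex rule read from the comma-delimited rules file
    val regexMatcher = new RegexMatcher()
      .setExternalRules(ExternalResource("src/test/resources/regex-matcher/rules2.txt", ReadAs.TEXT, Map("delimiter" -> ",")))
      .setInputCols(Array("sentence"))
      .setOutputCol("regex")
      .setStrategy("MATCH_ALL")

    // Convert the matched chunks to documents so they can be embedded
    val chunk2Doc = new Chunk2Doc()
      .setInputCols("regex")
      .setOutputCol("doc_chunk")

    // Sentence-level BERT embeddings for each matched chunk
    val embeddings = BertSentenceEmbeddings.pretrained("sent_small_bert_L2_128")
      .setInputCols("doc_chunk")
      .setOutputCol("bert")
      .setCaseSensitive(false)
      .setMaxSentenceLength(32)

    // Route the embeddings whose identifier metadata field is "name"
    val routerName = new Router()
      .setInputType("sentence_embeddings")
      .setInputCols(Array("bert"))
      .setMetadataField("identifier")
      .setFilterFieldsElements(Array("name"))
      .setOutputCol("names")

    // Route the embeddings whose identifier metadata field is "city"
    val routerCity = new Router()
      .setInputType("sentence_embeddings")
      .setInputCols(Array("bert"))
      .setMetadataField("identifier")
      .setFilterFieldsElements(Array("city"))
      .setOutputCol("cities")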
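
    The metadata-based filtering that the two routers are configured for can be sketched in plain Scala, without Spark. This is only an illustration of the idea, not the Router implementation: RoutedAnnotation, RouterSketch, and route are hypothetical names, the sample annotations are invented, and exact string matching on the metadata value is an assumption.

```scala
// Hypothetical, simplified annotation: only the fields needed for routing.
final case class RoutedAnnotation(result: String, metadata: Map[String, String])

object RouterSketch {

  // Keep the annotations whose metadata(metadataField) is one of the allowed
  // values. Assumptions: exact string match; annotations without the field
  // are dropped.
  def route(
      annotations: Seq[RoutedAnnotation],
      metadataField: String,
      allowed: Set[String]
  ): Seq[RoutedAnnotation] =
    annotations.filter(a => a.metadata.get(metadataField).exists(allowed.contains))

  def main(args: Array[String]): Unit = {
    // Invented sample annotations, tagged the way a rule-based matcher might tag them.
    val chunks = Seq(
      RoutedAnnotation("Jesus", Map("identifier" -> "name")),
      RoutedAnnotation("Leon", Map("identifier" -> "city")),
      RoutedAnnotation("Madrid", Map("identifier" -> "city"))
    )

    // Mirrors routerName and routerCity: same input, different filter values.
    val names  = route(chunks, "identifier", Set("name"))
    val cities = route(chunks, "identifier", Set("city"))

    println(names.map(_.result).mkString(", "))
    println(cities.map(_.result).mkString(", "))
  }
}
```

    Both calls read the same input and differ only in the allowed values, which mirrors how routerName and routerCity share the bert column and write to separate output columns.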