package annotator
Type Members
- class AnnotationMerger extends AnnotatorModel[AnnotationMerger] with CheckLicense with HasMultipleInputAnnotationCols with HasSimpleAnnotate[AnnotationMerger]
Merges Annotations from multiple columns.
Example

import spark.implicits._
val emptyData = Seq("").toDF("text")
val document1 = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document1")
val document2 = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document2")
val annotationMerger = new AnnotationMerger()
  .setInputCols("document1", "document2")
  .setInputType("document")
  .setOutputCol("all_docs")
val pipelineModel = new Pipeline()
  .setStages(Array(document1, document2, annotationMerger))
  .fit(emptyData)
val lp = new LightPipeline(pipelineModel)
lp.fullAnnotate("one doc to be replicated")
// Result: 'document1' and 'document2' each hold one document annotation,
// Annotation(document, 0, 23, one doc to be replicated, {}), and 'all_docs'
// holds both copies merged into a single column.
- class Doc2ChunkInternal extends Model[Doc2ChunkInternal] with RawAnnotator[Doc2ChunkInternal]
- class FeaturesAssembler extends AnnotatorModel[FeaturesAssembler] with HasStorageRef with HasSimpleAnnotate[FeaturesAssembler] with CheckLicense
The FeaturesAssembler is used to collect features from different columns. It can collect features from single-value columns (anything that can be cast to a float; if the cast fails, the value is set to 0), array columns, or Spark NLP annotations (if the annotation is an embedding, it takes the embedding; otherwise it tries to cast the result field). The output of the transformer is a FEATURE_VECTOR annotation (the numeric vector is stored in the embeddings field).
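The single-value casting rule described above (cast to float, fall back to 0 on failure) can be sketched in plain Scala. Note that toFeature is a hypothetical helper for illustration, not part of the library API:

```scala
// Hypothetical sketch of FeaturesAssembler's single-value rule:
// anything castable to a float is used as-is; a failed cast becomes 0.
def toFeature(value: Any): Float = value match {
  case f: Float  => f
  case d: Double => d.toFloat
  case i: Int    => i.toFloat
  case s: String =>
    try s.toFloat
    catch { case _: NumberFormatException => 0f } // failed cast -> 0
  case _ => 0f
}
```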
- class Router extends AnnotatorModel[Router] with HasSimpleAnnotate[Router] with ParamsAndFeaturesWritable with CheckLicense
This class allows filtering of any annotation based on its metadata fields.
Example

val testData = ResourceHelper.spark.createDataFrame(Seq(
  (1, "Jesus live in Leon. Madrid is the capital of Spain")
)).toDF("id", "text")
val document = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
val sentence = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentence")
val regexMatcher = new RegexMatcher()
  .setExternalRules(ExternalResource("src/test/resources/regex-matcher/rules2.txt", ReadAs.TEXT, Map("delimiter" -> ",")))
  .setInputCols(Array("sentence"))
  .setOutputCol("regex")
  .setStrategy("MATCH_ALL")
val chunk2Doc = new Chunk2Doc().setInputCols("regex").setOutputCol("doc_chunk")
val embeddings = BertSentenceEmbeddings.pretrained("sent_small_bert_L2_128")
  .setInputCols("doc_chunk")
  .setOutputCol("bert")
  .setCaseSensitive(false)
  .setMaxSentenceLength(32)
val routerName = new Router()
  .setInputType("sentence_embeddings")
  .setInputCols(Array("bert"))
  .setMetadataField("identifier")
  .setFilterFieldsElements(Array("name"))
  .setOutputCol("names")
val routerCity = new Router()
  .setInputType("sentence_embeddings")
  .setInputCols(Array("bert"))
  .setMetadataField("identifier")
  .setFilterFieldsElements(Array("city"))
  .setOutputCol("cities")
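The metadata-based filtering the two Router instances perform can be sketched in plain Scala. The Annotation case class and routeByMetadata helper below are simplified stand-ins for illustration, not the library's own types:

```scala
// Simplified stand-in for Spark NLP's Annotation (metadata only).
case class Annotation(annotatorType: String, result: String, metadata: Map[String, String])

// Keep only annotations whose metadata field matches one of the allowed values,
// mirroring setMetadataField / setFilterFieldsElements.
def routeByMetadata(anns: Seq[Annotation], field: String, allowed: Set[String]): Seq[Annotation] =
  anns.filter(a => a.metadata.get(field).exists(allowed.contains))
```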
Value Members
- object AnnotationMerger extends ParamsAndFeaturesReadable[AnnotationMerger] with Serializable
- object Doc2ChunkInternal extends DefaultParamsReadable[Doc2ChunkInternal] with Serializable
This is the companion object of Doc2ChunkInternal. Please refer to that class for the documentation.
- object FeaturesAssembler extends DefaultParamsReadable[FeaturesAssembler] with Serializable
- object Router extends ParamsAndFeaturesReadable[Router] with Serializable