package annotator

Type Members

  1. class AnnotationConverter extends Transformer with DefaultParamsWritable with HasOutputAnnotationCol with HasInputCol
  2. class AnnotationMerger extends AnnotatorModel[AnnotationMerger] with CheckLicense with HasMultipleInputAnnotationCols with HasSimpleAnnotate[AnnotationMerger]

    Merges Annotations from multiple columns.

    Example

    import org.apache.spark.ml.Pipeline
    import com.johnsnowlabs.nlp.{DocumentAssembler, LightPipeline}
    // Assumes an active SparkSession named `spark` (needed for toDF)
    import spark.implicits._

    val emptyData = Seq("").toDF("text")
    val document1 = new DocumentAssembler().setInputCol("text").setOutputCol("document1")
    val document2 = new DocumentAssembler().setInputCol("text").setOutputCol("document2")
    val annotationMerger = new AnnotationMerger()
        .setInputCols("document1", "document2")
        .setInputType("document")
        .setOutputCol("all_docs")

    val pipelineModel = new Pipeline().setStages(Array(
        document1, document2, annotationMerger)).fit(emptyData)
    val lp = new LightPipeline(pipelineModel)
    lp.fullAnnotate("one doc to be replicated")
    // Result:
    // [{'document1': [Annotation(document, 0, 23, one doc to be replicated, {})],
    //   'document2': [Annotation(document, 0, 23, one doc to be replicated, {})],
    //   'all_docs': [Annotation(document, 0, 23, one doc to be replicated, {}),
    //                Annotation(document, 0, 23, one doc to be replicated, {})]}]
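
    The all_docs column in the output holds the annotations of document1 followed by those of document2. A minimal plain-Scala sketch of that behaviour, assuming a simplified SimpleAnnotation case class and a hypothetical mergeColumns helper (neither is part of the library, and the real annotator may do more than concatenate):

    ```scala
    object MergeSketch {
      // Simplified stand-in for an annotation; hypothetical, for illustration only.
      final case class SimpleAnnotation(
          annotatorType: String,
          begin: Int,
          end: Int,
          result: String,
          metadata: Map[String, String] = Map.empty)

      // Concatenates the annotations of every input column, preserving column order.
      def mergeColumns(columns: Seq[Seq[SimpleAnnotation]]): Seq[SimpleAnnotation] =
        columns.flatten
    }
    ```

    Calling MergeSketch.mergeColumns with two single-annotation columns yields a sequence of two annotations, which mirrors the shape of all_docs in the example.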
  3. class Doc2ChunkInternal extends Model[Doc2ChunkInternal] with RawAnnotator[Doc2ChunkInternal]
  4. class FeaturesAssembler extends AnnotatorModel[FeaturesAssembler] with HasStorageRef with HasSimpleAnnotate[FeaturesAssembler] with CheckLicense

    The FeaturesAssembler collects features from different columns. It accepts single-value columns (anything that can be cast to a float; if the cast fails, the value is set to 0), array columns, and Spark NLP annotations (if the annotation is an embedding, its embedding is used; otherwise the annotator tries to cast the result field). The output is a FEATURE_VECTOR annotation whose numeric vector is stored in the embeddings field.
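
    The three input cases and the fall-back-to-zero rule can be sketched in plain Scala. This is an illustrative model only: the Feature types, toFloatOrZero, and assemble are hypothetical names, and the concatenation order of the features is an assumption rather than documented library behaviour.

    ```scala
    import scala.util.Try

    object FeatureSketch {
      // Hypothetical input kinds mirroring the three cases in the description.
      sealed trait Feature
      final case class Single(value: String) extends Feature
      final case class Arr(values: Seq[Float]) extends Feature
      final case class Annot(result: String, embeddings: Seq[Float]) extends Feature

      // Casts to Float; a failed cast yields 0, as the description states.
      def toFloatOrZero(s: String): Float = Try(s.trim.toFloat).getOrElse(0f)

      // Builds one numeric vector from all features (ordering is an assumption).
      def assemble(features: Seq[Feature]): Seq[Float] = features.flatMap {
        case Single(v)     => Seq(toFloatOrZero(v))
        case Arr(vs)       => vs
        // Use the embedding when present, otherwise try to cast the result field.
        case Annot(r, emb) => if (emb.nonEmpty) emb else Seq(toFloatOrZero(r))
      }
    }
    ```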

  5. class MetadataAnnotationConverter extends AnnotatorModel[MetadataAnnotationConverter] with HasSimpleAnnotate[MetadataAnnotationConverter]

    Converts metadata fields in annotations into their respective result, begin, or end values.

    In certain pipelines, annotations carry rich metadata such as normalized values, custom offsets, or alternative representations. MetadataAnnotationConverter is a helper component that transforms these metadata fields into actual Annotation fields. This can be used to ensure consistency or to override noisy model predictions with more reliable metadata-derived information.

    Notes

    This annotator assumes that metadata fields like begin_key, end_key, or result_key contain values that can override the corresponding fields in the original annotation. If a metadata key is missing or invalid, the original annotation's values are used.

    Example

    Use the converter to override annotation fields from metadata:

    val converter = new MetadataAnnotationConverter()
      .setInputAnnotatorType("chunk")
      .setResultField("normalized")
      .setBeginField("char_start")
      .setEndField("char_end")
      .setInputCols("chunk")  // input should match inputAnnotatorType
      .setOutputCol("converted_chunk")

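    After transformation, each annotation in converted_chunk takes its result, begin, and end values from the normalized, char_start, and char_end metadata entries when they are present and valid; otherwise it keeps its original values.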
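
    The override-with-fallback rule described in the notes can be modelled in a few lines of plain Scala. This is a sketch under assumptions, not the annotator's implementation: SimpleAnnotation and convert are hypothetical names, and "invalid" is taken here to mean an offset that does not parse as an integer.

    ```scala
    import scala.util.Try

    object ConverterSketch {
      // Simplified stand-in for an annotation; hypothetical, for illustration only.
      final case class SimpleAnnotation(
          begin: Int,
          end: Int,
          result: String,
          metadata: Map[String, String])

      // Reads an integer offset from metadata; None when the key is missing or unparsable.
      private def intFrom(meta: Map[String, String], key: String): Option[Int] =
        meta.get(key).flatMap(v => Try(v.trim.toInt).toOption)

      // Overrides result/begin/end from metadata, keeping the original value on failure.
      def convert(
          a: SimpleAnnotation,
          resultField: String,
          beginField: String,
          endField: String): SimpleAnnotation =
        a.copy(
          result = a.metadata.getOrElse(resultField, a.result),
          begin = intFrom(a.metadata, beginField).getOrElse(a.begin),
          end = intFrom(a.metadata, endField).getOrElse(a.end))
    }
    ```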

  6. class Router extends AnnotatorModel[Router] with HasSimpleAnnotate[Router] with ParamsAndFeaturesWritable with CheckLicense

    Filters annotations based on their metadata fields.

    Example

     val testData = ResourceHelper.spark.createDataFrame(Seq(
       (1, "\"Jesus live in Leon. Madrid is the capital of Spain")
     )).toDF("id", "text")
    
     val document = new DocumentAssembler()
       .setInputCol("text")
       .setOutputCol("document")
    
     val sentence = new SentenceDetector()
       .setInputCols("document")
       .setOutputCol("sentence")
    
     val regexMatcher = new RegexMatcher()
       .setExternalRules(ExternalResource("src/test/resources/regex-matcher/rules2.txt", ReadAs.TEXT, Map("delimiter" -> ",")))
       .setInputCols(Array("sentence"))
       .setOutputCol("regex")
       .setStrategy("MATCH_ALL")
    
     val chunk2Doc = new Chunk2Doc().setInputCols("regex").setOutputCol("doc_chunk")
    
     val embeddings = BertSentenceEmbeddings.pretrained("sent_small_bert_L2_128")
       .setInputCols("doc_chunk")
       .setOutputCol("bert")
       .setCaseSensitive(false)
       .setMaxSentenceLength(32)
    
     val routerName = new Router()
       .setInputType("sentence_embeddings")
       .setInputCols(Array("bert"))
       .setMetadataField("identifier")
       .setFilterFieldsElements(Array("name"))
       .setOutputCol("names")
     val routerCity = new Router()
       .setInputType("sentence_embeddings")
       .setInputCols(Array("bert"))
       .setMetadataField("identifier")
       .setFilterFieldsElements(Array("city"))
       .setOutputCol("cities")
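
    In the example, each Router keeps only the annotations whose identifier metadata entry matches one of the configured elements. A minimal plain-Scala sketch of that filtering idea, assuming exact string matching (the library's actual matching rule is not stated here) and using hypothetical SimpleAnnotation and route names:

    ```scala
    object RouterSketch {
      // Simplified stand-in for an annotation; hypothetical, for illustration only.
      final case class SimpleAnnotation(result: String, metadata: Map[String, String])

      // Keeps annotations whose metadata(metadataField) is one of the allowed values.
      // Annotations that lack the field are dropped.
      def route(
          annotations: Seq[SimpleAnnotation],
          metadataField: String,
          allowed: Set[String]): Seq[SimpleAnnotation] =
        annotations.filter(a => a.metadata.get(metadataField).exists(allowed.contains))
    }
    ```

    With this model, routing on identifier with Set("name") and Set("city") splits one input column into two disjoint outputs, analogous to the names and cities columns above.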

Value Members

  1. object AnnotationConverter extends DefaultParamsReadable[AnnotationConverter] with Serializable
  2. object AnnotationMerger extends ParamsAndFeaturesReadable[AnnotationMerger] with Serializable
  3. object Doc2ChunkInternal extends DefaultParamsReadable[Doc2ChunkInternal] with Serializable

    This is the companion object of Doc2ChunkInternal. Please refer to that class for the documentation.

  4. object FeaturesAssembler extends DefaultParamsReadable[FeaturesAssembler] with Serializable
  5. object MetadataAnnotationConverter extends DefaultParamsReadable[MetadataAnnotationConverter] with Serializable

    This is the companion object of MetadataAnnotationConverter. Please refer to that class for the documentation.

  6. object Router extends ParamsAndFeaturesReadable[Router] with Serializable
