class Annotation2Training extends CheckLicense

Converts annotation results from JSON or CSV files to a DataFrame suitable for NER training. Input files must have a structure similar to the one produced by John Snow Labs' Generative AI annotation tool.

Linear Supertypes
CheckLicense, AnyRef, Any

Instance Constructors

  1. new Annotation2Training(spark: SparkSession)

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String]): Unit
    Definition Classes
    CheckLicense
  6. def checkValidScope(scope: String): Unit
    Definition Classes
    CheckLicense
  7. def checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  8. def checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  9. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  10. def convertCsv2NerDF(csvPath: String, pipelineModel: PipelineModel, repartition: Int = 32, tokenOutputCol: String = "token", nerLabelCol: String = "label"): DataFrame

    Converts a CSV file with annotation results to a DataFrame suitable for NER training.

    csvPath

    Path to the input CSV file. The file will be read with the spark.read.csv method with header, multiLine, quote and escape options set.

    pipelineModel

    A pre-trained Spark NLP PipelineModel that includes at least a DocumentAssembler and a Tokenizer. It can also include other stages such as SentenceDetector, DocumentSplitter, or WordEmbeddings.

    repartition

    Number of partitions to use when reading the CSV file (default is 32).

    tokenOutputCol

    The name of the column containing token annotations (default is "token").

    nerLabelCol

    The name of the output column for NER labels (default is "label").

    returns

    A DataFrame to train NER models.
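
A minimal sketch of calling convertCsv2NerDF with its default arguments. It rests on assumptions not stated on this page: the package in the Annotation2Training import, the "text" input column, and the names CsvToNerExample, buildBasePipeline, and convert are all illustrative and may need adjusting for your installation and data.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel, PipelineStage}
import org.apache.spark.sql.{DataFrame, SparkSession}
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
// ASSUMPTION: the package of Annotation2Training is not given on this page;
// replace this import with the one that matches your licensed library version.
import com.johnsnowlabs.nlp.training.Annotation2Training

object CsvToNerExample {

  // Builds the smallest pipeline the parameter docs call for:
  // a DocumentAssembler followed by a Tokenizer.
  def buildBasePipeline(spark: SparkSession): PipelineModel = {
    import spark.implicits._

    val documentAssembler = new DocumentAssembler()
      .setInputCol("text") // ASSUMPTION: raw text column is named "text"
      .setOutputCol("document")

    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token") // matches the default tokenOutputCol

    // Fitting on an empty DataFrame yields a PipelineModel without training data.
    val emptyDf = Seq.empty[String].toDF("text")
    new Pipeline()
      .setStages(Array[PipelineStage](documentAssembler, tokenizer))
      .fit(emptyDf)
  }

  // Converts the CSV at csvPath using the defaults:
  // repartition = 32, tokenOutputCol = "token", nerLabelCol = "label".
  def convert(spark: SparkSession, csvPath: String): DataFrame = {
    val converter = new Annotation2Training(spark)
    converter.convertCsv2NerDF(csvPath, buildBasePipeline(spark))
  }
}
```

Calling printSchema() on the returned DataFrame is a quick way to confirm which columns are present before passing it to an NER training stage.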

  11. def convertJson2NerDF(inputPath: String, pipelineModel: PipelineModel, repartition: Int = 32, tokenOutputCol: String = "token", nerLabelCol: String = "label"): DataFrame

    Converts a JSON file with annotation results to a DataFrame suitable for NER training.

    inputPath

    Path to the input JSON file. The file will be read with the spark.read.json method with multiLine option set to true.

    pipelineModel

    A pre-trained Spark NLP PipelineModel that includes at least a DocumentAssembler and a Tokenizer. It can also include other stages such as SentenceDetector, DocumentSplitter, or WordEmbeddings.

    repartition

    Number of partitions to use when reading the input file (default is 32).

    tokenOutputCol

    The name of the column containing token annotations (default is "token").

    nerLabelCol

    The name of the output column for NER labels (default is "label").

    returns

    A DataFrame to train NER models.
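
A hedged sketch of convertJson2NerDF with the optional parameters overridden by name, using a pipeline that also contains a SentenceDetector. The Annotation2Training import path, the "text" input column, the partition count of 8, the "ner_label" column name, and the names JsonToNerExample, buildSentencePipeline, and convert are assumptions chosen for illustration.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel, PipelineStage}
import org.apache.spark.sql.{DataFrame, SparkSession}
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.annotators.sbd.pragmatic.SentenceDetector
// ASSUMPTION: the package of Annotation2Training is not given on this page;
// replace this import with the one that matches your licensed library version.
import com.johnsnowlabs.nlp.training.Annotation2Training

object JsonToNerExample {

  // DocumentAssembler -> SentenceDetector -> Tokenizer, so tokens are
  // produced per sentence rather than per whole document.
  def buildSentencePipeline(spark: SparkSession): PipelineModel = {
    import spark.implicits._

    val documentAssembler = new DocumentAssembler()
      .setInputCol("text") // ASSUMPTION: raw text column is named "text"
      .setOutputCol("document")

    val sentenceDetector = new SentenceDetector()
      .setInputCols("document")
      .setOutputCol("sentence")

    val tokenizer = new Tokenizer()
      .setInputCols("sentence")
      .setOutputCol("token")

    val emptyDf = Seq.empty[String].toDF("text")
    new Pipeline()
      .setStages(Array[PipelineStage](documentAssembler, sentenceDetector, tokenizer))
      .fit(emptyDf)
  }

  // Overrides the documented defaults with named arguments.
  def convert(spark: SparkSession, jsonPath: String): DataFrame = {
    val converter = new Annotation2Training(spark)
    converter.convertJson2NerDF(
      inputPath = jsonPath,
      pipelineModel = buildSentencePipeline(spark),
      repartition = 8,            // illustrative value for a small file
      tokenOutputCol = "token",   // must match the Tokenizer output column
      nerLabelCol = "ner_label"   // illustrative name for the label column
    )
  }
}
```

Named arguments keep the call readable and make it clear that tokenOutputCol has to agree with the output column configured on the Tokenizer stage.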

  12. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  13. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  14. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  15. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  16. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  17. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  18. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  19. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  20. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  21. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  22. def toString(): String
    Definition Classes
    AnyRef → Any
  23. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  24. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  25. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
