
com.johnsnowlabs.nlp

FeaturesAssembler

class FeaturesAssembler extends Transformer with DefaultParamsWritable with HasOutputAnnotatorType with HasOutputAnnotationCol with HasStorageRef with CheckLicense

The FeaturesAssembler is used to collect features from different columns. It can collect features from single-value columns (anything that can be cast to a float; if the cast fails, the value is set to 0), array columns, or Spark NLP annotations (if the annotation is an embedding, it takes the embedding; otherwise it tries to cast the result field). The output of the transformer is a FEATURE_VECTOR annotation (the numeric vector is in the embeddings field).
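
Example

A minimal usage sketch (column names and data are illustrative; assumes the licensed Spark NLP for Healthcare jar is on the classpath):

  import com.johnsnowlabs.nlp.FeaturesAssembler
  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("features-assembler-example")
    .master("local[*]")
    .getOrCreate()

  // Single-value columns: anything castable to float; a failed cast yields 0.
  val df = spark.createDataFrame(Seq(
    (1.0, 0.5, "not-a-number"),
    (2.0, 1.5, "3.25")
  )).toDF("age", "score", "raw")

  val assembler = new FeaturesAssembler()
    .setInputCols(Array("age", "score", "raw"))
    .setOutputCol("feature_vector")

  // FeaturesAssembler is a Transformer, so it can be applied directly (no fit()).
  val assembled = assembler.transform(df)
  assembled.select("feature_vector").show(truncate = false)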

Linear Supertypes
CheckLicense, HasStorageRef, ParamsAndFeaturesWritable, HasFeatures, HasOutputAnnotationCol, HasOutputAnnotatorType, DefaultParamsWritable, MLWritable, Transformer, PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Parameters

  1. val inputCols: StringArrayParam

    Input columns containing features

Annotator types

Required input and expected output annotator types

  1. val outputAnnotatorType: AnnotatorType

    Output annotator type: FEATURE_VECTOR

    Definition Classes
    FeaturesAssembler → HasOutputAnnotatorType
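
    The assembled numbers can be read back from the embeddings field of each annotation. A short sketch, reusing the assembled DataFrame and "feature_vector" column from the example above:

      assembled
        .selectExpr("explode(feature_vector) as ann")
        .selectExpr("ann.annotatorType", "ann.embeddings")
        .show(truncate = false)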

Members

Type members

  1. type AnnotatorType = String
    Definition Classes
    HasOutputAnnotatorType

Value members

  1. def checkValidEnvironment(spark: Option[SparkSession]): Unit
    Definition Classes
    CheckLicense
  2. def checkValidScope(scope: String): Unit
    Definition Classes
    CheckLicense
  3. def checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  4. def checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  5. final def clear(param: Param[_]): FeaturesAssembler.this.type
    Definition Classes
    Params
  6. def copy(extra: ParamMap): Transformer
    Definition Classes
    FeaturesAssembler → Transformer → PipelineStage → Params
  7. def createDatabaseConnection(database: Name): RocksDBConnection
    Definition Classes
    HasStorageRef
  8. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  9. def explainParams(): String
    Definition Classes
    Params
  10. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  11. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  12. val features: ArrayBuffer[Feature[_, _, _]]
    Definition Classes
    HasFeatures
  13. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  14. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  15. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  16. final def getOutputCol: String
    Definition Classes
    HasOutputAnnotationCol
  17. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  18. def getStorageRef: String
    Definition Classes
    HasStorageRef
  19. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  20. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  21. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  22. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  23. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  24. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  25. final def set[T](param: Param[T], value: T): FeaturesAssembler.this.type
    Definition Classes
    Params
  26. final def setOutputCol(value: String): FeaturesAssembler.this.type
    Definition Classes
    HasOutputAnnotationCol
  27. def setStorageRef(value: String): FeaturesAssembler.this.type
    Definition Classes
    HasStorageRef
  28. val storageRef: Param[String]
    Definition Classes
    HasStorageRef
  29. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  30. def transform(dataset: Dataset[_]): DataFrame
    Definition Classes
    FeaturesAssembler → Transformer
  31. def transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" )
  32. def transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" ) @varargs()
  33. final def transformSchema(schema: StructType): StructType

    Requirement for pipeline transformation validation; it is called on fit().

    Definition Classes
    FeaturesAssembler → PipelineStage
  34. val uid: String
    Definition Classes
    FeaturesAssembler → Identifiable
  35. def validateStorageRef(dataset: Dataset[_], inputCols: Array[String], annotatorType: String): Unit
    Definition Classes
    HasStorageRef
  36. def write: MLWriter
    Definition Classes
    ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable

Parameter setters

  1. def setInputCols(value: Array[String]): FeaturesAssembler.this.type

    Input columns containing features

Parameter getters

  1. def getInputCols: Array[String]

    Input columns containing features
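
Setters return the transformer itself, so calls can be chained; the getters read back the current values. A minimal sketch (column names are illustrative):

  val assembler = new FeaturesAssembler()
    .setInputCols(Array("sentence_embeddings"))
    .setOutputCol("features")

  assembler.getInputCols  // Array("sentence_embeddings")
  assembler.getOutputCol  // "features"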