com.johnsnowlabs.nlp.annotators.splitter

DocumentSplitterParams

trait DocumentSplitterParams extends Params

A trait that contains all the params that InternalDocumentSplitter has.

See also

InternalDocumentSplitter
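
To make the interplay between maxLength and sentenceAwareness more concrete, here is a minimal plain-Scala sketch. It is not the library's implementation: the object SentenceAwareSketch and its pack helper are hypothetical, and the greedy packing rule (join whole sentences with a single space, never exceed maxLength characters unless a single sentence is already longer) is an assumption used only to show how a split can stop before maxLength is reached.

```scala
object SentenceAwareSketch {
  // Hypothetical helper, not part of the library: greedily packs whole
  // sentences into chunks of at most `maxLength` characters.
  def pack(sentences: Seq[String], maxLength: Int): Seq[String] = {
    require(maxLength > 0, "maxLength must be positive")
    val (done, current) = sentences.foldLeft((Vector.empty[String], "")) {
      case ((chunks, cur), sentence) =>
        // Candidate chunk if the next sentence were added to the current one.
        val candidate = if (cur.isEmpty) sentence else cur + " " + sentence
        if (candidate.length <= maxLength) (chunks, candidate)
        else if (cur.isEmpty) (chunks :+ sentence, "") // oversized sentence is kept whole
        else (chunks :+ cur, sentence)                 // close the chunk before maxLength
    }
    // Flush the last open chunk, if any.
    if (current.isEmpty) done else done :+ current
  }
}
```

With the sentences "One two.", "Three." and "Four five six." and a maxLength of 20, the first chunk closes at 15 characters because adding the third sentence would exceed the limit.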

Linear Supertypes
Params, Serializable, Serializable, Identifiable, AnyRef, Any

Abstract Value Members

  1. abstract def copy(extra: ParamMap): Params
    Definition Classes
    Params
  2. abstract val uid: String
    Definition Classes
    Identifiable

Concrete Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T
    Attributes
    protected
    Definition Classes
    Params
  4. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  5. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  6. val caseSensitive: BooleanParam

    Whether regex matching is case sensitive (Default: false)

  7. final def clear(param: Param[_]): DocumentSplitterParams.this.type
    Definition Classes
    Params
  8. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  9. def copyValues[T <: Params](to: T, extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  10. val customBoundsStrategy: Param[String]

    The custom bounds strategy for text parsing using regular expressions.

  11. final def defaultCopy[T <: Params](extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  12. val enableSentenceIncrement: BooleanParam

    Controls whether the sentence index is incremented in the metadata of the annotator. When set to true, the annotator increments the sentence index in the metadata for each split document. Default: false

  13. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  14. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  15. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  16. def explainParams(): String
    Definition Classes
    Params
  17. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  18. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  19. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  20. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  21. def getCaseSensitive: Boolean

    Gets whether regex matching is case sensitive (Default: false)

  22. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  23. def getCustomBoundsStrategy: String

    Gets customBoundsStrategy param

  24. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  25. def getEnableSentenceIncrement: Boolean

    Gets whether the sentence index should be incremented in the metadata of the annotator.

  26. def getMaxLength: Int

    Gets maxLength param

  27. def getMetaDataFields: Array[String]

    Gets metaDataFields param

  28. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  29. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  30. def getSentenceAwareness: Boolean

    Gets sentenceAwareness param

  31. def getSplitMode: String

    Gets splitMode param

  32. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  33. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  34. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  35. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  36. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  37. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  38. val maxLength: IntParam

    The maximum length for text parsing based on the specified mode.

  39. val metaDataFields: StringArrayParam

    Names of the columns whose data is added to the metadata of the split documents. Set this to the column names to read. Default: Array.empty

  40. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  41. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  42. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  43. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  44. val sentenceAwareness: BooleanParam

    Whether to split the document with sentence awareness when possible. If true, splitting can stop before maxLength is reached, and you should supply sentences from inputCols. Default: false.

  45. final def set(paramPair: ParamPair[_]): DocumentSplitterParams.this.type
    Attributes
    protected
    Definition Classes
    Params
  46. final def set(param: String, value: Any): DocumentSplitterParams.this.type
    Attributes
    protected
    Definition Classes
    Params
  47. final def set[T](param: Param[T], value: T): DocumentSplitterParams.this.type
    Definition Classes
    Params
  48. def setCaseSensitive(value: Boolean): DocumentSplitterParams.this.type

    Sets whether regex matching is case sensitive (Default: false)

  49. def setCustomBoundsStrategy(value: String): DocumentSplitterParams.this.type

    Sets the custom bounds strategy for text parsing using regular expressions. Default: "prepend".

    value

    The custom bounds strategy to set. It must be one of the following values:

    • "none": No custom bounds are applied.
    • "prepend": Custom bounds are prepended to the split documents.
    • "append": Custom bounds are appended to the split documents.
  50. final def setDefault(paramPairs: ParamPair[_]*): DocumentSplitterParams.this.type
    Attributes
    protected
    Definition Classes
    Params
  51. final def setDefault[T](param: Param[T], value: T): DocumentSplitterParams.this.type
    Attributes
    protected[org.apache.spark.ml]
    Definition Classes
    Params
  52. def setEnableSentenceIncrement(value: Boolean): DocumentSplitterParams.this.type

    Sets whether the sentence index is incremented in the metadata of the annotator. When set to true, the annotator increments the sentence index in the metadata for each split document. Default: false

  53. def setMaxLength(value: Int): DocumentSplitterParams.this.type

    Sets the maximum length for text parsing based on the specified mode.

  54. def setMetaDataFields(value: Array[String]): DocumentSplitterParams.this.type

    Sets the names of the columns whose data is added to the metadata of the split documents. Set this to the column names to read. Default: Array.empty

  55. def setSentenceAwareness(value: Boolean): DocumentSplitterParams.this.type

    Sets whether to split the document with sentence awareness when possible. If true, splitting can stop before maxLength is reached, and you should supply sentences from inputCols. Default: false.

  56. def setSplitMode(value: String): DocumentSplitterParams.this.type

    Sets the split mode to determine how text should be segmented. Default: 'regex'

    value

    The split mode to be set. It should be one of the following values:

    • "char": Split text based on individual characters.
    • "token": Split text based on tokens. You should supply tokens from inputCols.
    • "sentence": Split text based on sentences. You should supply sentences from inputCols.
    • "recursive": Split text recursively using a specific algorithm.
    • "regex": Split text based on a regular expression pattern.
  57. val splitMode: Param[String]

    The split mode to determine how text should be segmented. Default: 'regex'

  58. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  59. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  60. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  61. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  62. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
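
The three values accepted by setCustomBoundsStrategy ("none", "prepend", "append") can be illustrated with a small plain-Scala sketch. This is not the library's code: CustomBoundsSketch and its split helper are hypothetical, and the sketch assumes that "prepend" attaches each matched bound to the start of the chunk that follows it, "append" attaches it to the end of the chunk that precedes it, and "none" drops it.

```scala
import scala.util.matching.Regex

object CustomBoundsSketch {
  // Hypothetical helper, not part of the library: splits `text` on `bound`
  // and attaches each matched delimiter according to `strategy`.
  def split(text: String, bound: Regex, strategy: String): Seq[String] = {
    val matches = bound.findAllMatchIn(text).toList
    // Text segments that lie between consecutive delimiter matches.
    val starts = 0 +: matches.map(_.end)
    val ends = matches.map(_.start) :+ text.length
    val segments = starts.zip(ends).map { case (s, e) => text.substring(s, e) }
    val delims = matches.map(_.matched)

    val chunks = strategy match {
      case "none" =>
        segments // delimiters are dropped
      case "append" =>
        // Delimiter i closes segment i.
        segments.zipAll(delims, "", "").map { case (seg, d) => seg + d }
      case "prepend" =>
        // Delimiter i opens segment i + 1.
        segments.zipAll("" +: delims, "", "").map { case (seg, d) => d + seg }
      case other =>
        throw new IllegalArgumentException(s"Unknown strategy: $other")
    }
    chunks.filter(_.nonEmpty)
  }
}
```

For example, splitting "Intro## A## B" on the pattern "## " keeps the marker at the head of each following chunk under "prepend" and at the tail of each preceding chunk under "append".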
