class ChunkKeyPhraseExtraction extends BertSentenceEmbeddings with CheckLicense

Extracts key phrases from texts.

ChunkKeyPhraseExtraction uses BertSentenceEmbeddings to determine the most relevant key phrases describing a text. Two approaches are available:

  • Using the cosine similarity between the embedding of each chunk and the embedding of the corresponding sentence/document.
  • Using the Maximal Marginal Relevance (MMR) algorithm (enabled with the setDivergence method) to determine the most relevant key phrases.

If the selectMostDifferent parameter is set, the model returns the key phrases that are the most different from each other, which avoids near-duplicate key phrases.

The model compares the chunks against the corresponding sentences/documents and selects the chunks that are most representative of the broader text context (i.e., the document or the sentence they belong to). This makes it possible, for example, to get a brief understanding of a document by selecting its most relevant phrases.

The input to the model consists of chunk annotations and sentence or document annotations. The input chunks can be generated in various ways:

  • Using NGramGenerator, which produces n-gram chunks from the text to be ranked (can be used to identify new entities).
  • Using YakeKeywordExtractor, which allows ranking the keywords extracted with the YAKE algorithm.
  • Using TextMatcher, which allows ranking the chunks matched by that annotator.
  • Using NerConverter, which allows ranking named entities (which entities are the most relevant in the sentence/document).

The model operates either at sentence level (selecting the most descriptive chunks from the sentence they belong to) or at document level. In the latter case, the key phrases are selected to represent all the input document annotations.
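
The cosine-similarity approach described above can be illustrated with a small plain-Scala sketch. It is not the library's implementation: the object and method names (CosineRankingSketch, cosine, rankBySimilarity) are hypothetical, and the embeddings are assumed to be already computed as float arrays.

```scala
object CosineRankingSketch {
  // Cosine similarity between two equal-length vectors; 0.0 if either has zero norm.
  def cosine(a: Array[Float], b: Array[Float]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x.toDouble * y }.sum
    val normA = math.sqrt(a.map(x => x.toDouble * x).sum)
    val normB = math.sqrt(b.map(x => x.toDouble * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // Returns (chunk index, similarity to the document) for the topN most similar chunks,
  // ordered from most to least similar.
  def rankBySimilarity(
      document: Array[Float],
      chunks: Seq[Array[Float]],
      topN: Int): Seq[(Int, Double)] =
    chunks.zipWithIndex
      .map { case (chunk, i) => (i, cosine(document, chunk)) }
      .sortBy { case (_, similarity) => -similarity }
      .take(topN)
}
```

Calling rankBySimilarity with a document embedding and the chunk embeddings yields the indices of the chunks closest to the document, which is the ranking used when no divergence is requested.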

This model is a subclass of BertSentenceEmbeddings and shares all parameters with it. It can load any pretrained BertSentenceEmbeddings model. Available models can be found at Models Hub.

val embeddings = ChunkKeyPhraseExtraction.pretrained()
  .setInputCols("sentence", "chunk")
  .setOutputCol("key_phrase_chunks")

The default model is "sbert_jsl_medium_uncased", if no name is provided.

Sources:

The use of MMR, diversity-based reranking for reordering documents and producing summaries

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Example

import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.{NGramGenerator, StopWordsCleaner, Tokenizer}
// Package path of the licensed annotator; adjust it if your library version differs.
import com.johnsnowlabs.nlp.annotators.chunker.ChunkKeyPhraseExtraction
import org.apache.spark.ml.Pipeline

 val documentAssembler = new DocumentAssembler()
   .setInputCol("text")
   .setOutputCol("document")

 val tokenizer = new Tokenizer()
   .setInputCols("document")
   .setOutputCol("tokens")

 val stopWordsCleaner = StopWordsCleaner.pretrained()
   .setInputCols("tokens")
   .setOutputCol("clean_tokens")
   .setCaseSensitive(false)

 val nGrams = new NGramGenerator()
   .setInputCols(Array("clean_tokens"))
   .setOutputCol("ngrams")
   .setN(3)


 val chunkKeyPhraseExtractor = ChunkKeyPhraseExtraction
   .pretrained()
   .setTopN(2)
   .setDivergence(0.7f)
   .setInputCols(Array("document", "ngrams"))
   .setOutputCol("key_phrases")

 val pipeline = new Pipeline()
   .setStages(Array(
     documentAssembler,
     tokenizer,
     stopWordsCleaner,
     nGrams,
     chunkKeyPhraseExtractor))

val sampleText = "Her Diabetes has become type 2 in the last year with her Diabetes." +
   " He complains of swelling in his right forearm."

// Fit and run the pipeline on the sample text.
val testDataset = Seq(sampleText).toDS.toDF("text")
val result = pipeline.fit(testDataset).transform(testDataset)

 result
   .selectExpr("explode(key_phrases) AS key_phrase")
   .selectExpr(
     "key_phrase.result",
     "key_phrase.metadata.DocumentSimilarity",
     "key_phrase.metadata.MMRScore")
   .show(truncate=false)

+--------------------------+-------------------+------------------+
|result                    |DocumentSimilarity |MMRScore          |
+--------------------------+-------------------+------------------+
|complains swelling forearm|0.6325718954229369 |0.1897715761677257|
|type 2 year               |0.40181028931546364|-0.189501077108947|
+--------------------------+-------------------+------------------+
See also

BertEmbeddings for token-level embeddings

BertSentenceEmbeddings for sentence-level embeddings

Annotators Main Page for a list of transformer based embeddings

Linear Supertypes
CheckLicense, BertSentenceEmbeddings, HasEngine, HasCaseSensitiveProperties, HasStorageRef, HasEmbeddingsProperties, HasProtectedParams, WriteOnnxModel, WriteTensorflowModel, HasBatchedAnnotate[BertSentenceEmbeddings], AnnotatorModel[BertSentenceEmbeddings], CanBeLazy, RawAnnotator[BertSentenceEmbeddings], HasOutputAnnotationCol, HasInputAnnotationCols, HasOutputAnnotatorType, ParamsAndFeaturesWritable, HasFeatures, DefaultParamsWritable, MLWritable, Model[BertSentenceEmbeddings], Transformer, PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new ChunkKeyPhraseExtraction()
  2. new ChunkKeyPhraseExtraction(uid: String)

Type Members

  1. type AnnotationContent = Seq[Row]
    Attributes
    protected
    Definition Classes
    AnnotatorModel
  2. type AnnotatorType = String
    Definition Classes
    HasOutputAnnotatorType
  3. implicit class ProtectedParam[T] extends Param[T]
    Definition Classes
    HasProtectedParams

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T
    Attributes
    protected
    Definition Classes
    Params
  4. def $$[T](feature: StructFeature[T]): T
    Attributes
    protected
    Definition Classes
    HasFeatures
  5. def $$[K, V](feature: MapFeature[K, V]): Map[K, V]
    Attributes
    protected
    Definition Classes
    HasFeatures
  6. def $$[T](feature: SetFeature[T]): Set[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  7. def $$[T](feature: ArrayFeature[T]): Array[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  8. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  9. def _transform(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): DataFrame
    Attributes
    protected
    Definition Classes
    AnnotatorModel
  10. def afterAnnotate(dataset: DataFrame): DataFrame
    Attributes
    protected
    Definition Classes
    BertSentenceEmbeddings → AnnotatorModel
  11. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  12. def batchAnnotate(batchedAnnotations: Seq[Array[Annotation]]): Seq[Seq[Annotation]]
    Definition Classes
    ChunkKeyPhraseExtraction → BertSentenceEmbeddings → HasBatchedAnnotate
  13. def batchProcess(rows: Iterator[_]): Iterator[Row]
    Definition Classes
    HasBatchedAnnotate
  14. val batchSize: IntParam
    Definition Classes
    HasBatchedAnnotate
  15. def beforeAnnotate(dataset: Dataset[_]): Dataset[_]
    Attributes
    protected
    Definition Classes
    AnnotatorModel
  16. val caseSensitive: BooleanParam
    Definition Classes
    HasCaseSensitiveProperties
  17. final def checkSchema(schema: StructType, inputAnnotatorType: String): Boolean
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  18. def checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String]): Unit
    Definition Classes
    CheckLicense
  19. def checkValidScope(scope: String): Unit
    Definition Classes
    CheckLicense
  20. def checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  21. def checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  22. final def clear(param: Param[_]): ChunkKeyPhraseExtraction.this.type
    Definition Classes
    Params
  23. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  24. val concatenateSentences: BooleanParam

    A flag indicating whether to concatenate all input document/sentence annotations before computing their embedding. This parameter is only used if documentLevelProcessing is true. If concatenateSentences is set to true, the model concatenates the document/sentence input annotations and computes a single embedding. If it is false, the model computes the embedding of each sentence separately and then averages the resulting embedding vectors.
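
The averaging branch (concatenateSentences set to false) can be pictured as an element-wise mean over the per-sentence embeddings. The sketch below is only an illustration under that assumption; AverageEmbeddingsSketch and average are hypothetical names, not part of the library.

```scala
object AverageEmbeddingsSketch {
  // Element-wise mean of equal-length embedding vectors.
  def average(embeddings: Seq[Array[Float]]): Array[Float] = {
    require(embeddings.nonEmpty, "need at least one embedding")
    val dim = embeddings.head.length
    require(embeddings.forall(_.length == dim), "all embeddings must share a dimension")
    val sums = new Array[Float](dim)
    // Accumulate each component across all sentence embeddings.
    for (e <- embeddings; i <- 0 until dim) sums(i) += e(i)
    sums.map(_ / embeddings.length)
  }
}
```

With concatenateSentences set to true, no such averaging is needed because a single embedding is computed for the concatenated text.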

  25. val configProtoBytes: IntArrayParam
    Definition Classes
    BertSentenceEmbeddings
  26. def copy(extra: ParamMap): BertSentenceEmbeddings
    Definition Classes
    RawAnnotator → Model → Transformer → PipelineStage → Params
  27. def copyValues[T <: Params](to: T, extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  28. def createDatabaseConnection(database: Name): RocksDBConnection
    Definition Classes
    HasStorageRef
  29. final def defaultCopy[T <: Params](extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  30. val dimension: ProtectedParam[Int]
    Definition Classes
    HasEmbeddingsProperties
  31. val divergence: FloatParam

    The divergence value determines how different from each other the extracted key phrases are. The possible values are within the interval [0, 1]. The higher the value, the more divergence is enforced. A value of 0 means the key phrases are not compared to each other (no divergence is ensured) and their relevance is determined solely by their similarity to the document. This parameter should not be used if setSelectMostDifferent is true, because the two parameters aim to achieve the same goal in different ways. The default value is 0, meaning that there is no constraint on the order of the extracted key phrases. The divergence is calculated using the Maximal Marginal Relevance measure.
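
The trade-off that divergence controls can be illustrated with a small greedy MMR sketch in plain Scala. It is not the library's implementation: the scoring formula (1 - divergence) * documentSimilarity - divergence * maxSimilarityToSelected and the names MmrSketch, orderByMmr and cosine are assumptions made for illustration. The formula is at least consistent with the first row of the example output on this page, where 0.3 × 0.6326 ≈ 0.1898 for a divergence of 0.7.

```scala
object MmrSketch {
  // Cosine similarity between two equal-length vectors; 0.0 if either has zero norm.
  def cosine(a: Array[Float], b: Array[Float]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x.toDouble * y }.sum
    val normA = math.sqrt(a.map(x => x.toDouble * x).sum)
    val normB = math.sqrt(b.map(x => x.toDouble * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // Greedy MMR (assumed formula): at each step pick the remaining chunk with the highest
  // (1 - divergence) * sim(chunk, document) - divergence * max sim(chunk, already selected).
  // Returns (chunk index, document similarity, MMR score) in selection order.
  def orderByMmr(
      document: Array[Float],
      keyPhrases: Seq[Array[Float]],
      numResults: Int,
      divergence: Double): Seq[(Int, Double, Double)] = {
    require(divergence >= 0.0 && divergence <= 1.0, "divergence must be in [0, 1]")
    val phrases = keyPhrases.toVector
    val docSim = phrases.map(cosine(document, _))
    var selected = Vector.empty[(Int, Double, Double)]
    var remaining = phrases.indices.toSet
    while (selected.size < numResults && remaining.nonEmpty) {
      val scored = remaining.toSeq.sorted.map { i =>
        // Redundancy is the closest match among the chunks picked so far.
        val redundancy =
          if (selected.isEmpty) 0.0
          else selected.map { case (j, _, _) => cosine(phrases(i), phrases(j)) }.max
        (i, docSim(i), (1.0 - divergence) * docSim(i) - divergence * redundancy)
      }
      val best = scored.maxBy(_._3)
      selected = selected :+ best
      remaining = remaining - best._1
    }
    selected
  }
}
```

With a divergence of 0 the redundancy term vanishes and the order is the plain document-similarity ranking; with higher values a chunk that nearly duplicates an already selected one is pushed down.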

  32. val documentLevelProcessing: BooleanParam

    A flag indicating whether to extract key phrases from the document level, i.e. from all the sentences available at a given row, rather than from the particular sentences the chunks refer to.

  33. val dropPunctuation: BooleanParam

    This parameter indicates whether to remove punctuation marks from the input chunks. Chunks coming from NER models are not affected.

  34. val engine: Param[String]
    Definition Classes
    HasEngine
  35. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  36. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  37. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  38. def explainParams(): String
    Definition Classes
    Params
  39. def extraValidate(structType: StructType): Boolean
    Attributes
    protected
    Definition Classes
    RawAnnotator
  40. def extraValidateMsg: String
    Attributes
    protected
    Definition Classes
    RawAnnotator
  41. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  42. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  43. val features: ArrayBuffer[Feature[_, _, _]]
    Definition Classes
    HasFeatures
  44. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  45. def get[T](feature: StructFeature[T]): Option[T]
    Attributes
    protected
    Definition Classes
    HasFeatures
  46. def get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  47. def get[T](feature: SetFeature[T]): Option[Set[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  48. def get[T](feature: ArrayFeature[T]): Option[Array[T]]
    Attributes
    protected
    Definition Classes
    HasFeatures
  49. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  50. def getBatchSize: Int
    Definition Classes
    HasBatchedAnnotate
  51. def getCaseSensitive: Boolean
    Definition Classes
    HasCaseSensitiveProperties
  52. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  53. def getCombinationOfMostDifferentVectors(vectors: Seq[Array[Float]]): List[Int]

    Selects a combination of vectors such that the sum of their pair-wise cosines is minimized, i.e. the vectors are as different from each other as possible.

    vectors: a set of float vectors

    returns: a list of vector indices

    Attributes
    protected
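
One way to picture this selection is an exhaustive search over index combinations. The sketch below is an assumption about the idea, not the library's code: MostDifferentSketch and mostDifferent are hypothetical names, and the explicit k parameter is added for illustration (the documented method takes only the vectors).

```scala
object MostDifferentSketch {
  // Cosine similarity between two equal-length vectors; 0.0 if either has zero norm.
  def cosine(a: Array[Float], b: Array[Float]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x.toDouble * y }.sum
    val normA = math.sqrt(a.map(x => x.toDouble * x).sum)
    val normB = math.sqrt(b.map(x => x.toDouble * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // Tries every k-element combination of indices and returns the one whose sum of
  // pair-wise cosine similarities is smallest. The cost grows combinatorially, so this
  // is only practical for small candidate sets.
  def mostDifferent(vectors: Seq[Array[Float]], k: Int): List[Int] = {
    require(k >= 1 && k <= vectors.length, "k must be between 1 and the number of vectors")
    val vs = vectors.toVector
    vs.indices.toList
      .combinations(k)
      .minBy { combo =>
        combo.combinations(2).map(pair => cosine(vs(pair(0)), vs(pair(1)))).sum
      }
  }
}
```

Pre-selecting a small candidate set first, as selectMostDifferent does with topN * 2 key phrases, keeps the number of combinations manageable.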
  54. def getConcatenateSentences: Boolean

    Check whether the input sentences/documents are concatenated before their embedding is computed

  55. def getConfigProtoBytes: Option[Array[Byte]]
    Definition Classes
    BertSentenceEmbeddings
  56. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  57. def getDimension: Int
    Definition Classes
    HasEmbeddingsProperties
  58. def getDivergence: Float

    Get the level of divergence of the extracted key phrases.

  59. def getDocumentLevelProcessing: Boolean

    Check whether the key phrases are extracted at the document or the sentence level

  60. def getDropPunctuation: Boolean

    Check whether the punctuation marks are removed from input chunks.

  61. def getEngine: String
    Definition Classes
    HasEngine
  62. def getInputCols: Array[String]
    Definition Classes
    HasInputAnnotationCols
  63. def getIsLong: Boolean
    Definition Classes
    BertSentenceEmbeddings
  64. def getLazyAnnotator: Boolean
    Definition Classes
    CanBeLazy
  65. def getMaxSentenceLength: Int
    Definition Classes
    BertSentenceEmbeddings
  66. def getModelIfNotSet: Bert
    Definition Classes
    BertSentenceEmbeddings
  67. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  68. final def getOutputCol: String
    Definition Classes
    HasOutputAnnotationCol
  69. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  70. def getSelectMostDifferent: Boolean

    Check whether the model returns the top N key phrases which are most different from each other

  71. def getSignatures: Option[Map[String, String]]
    Definition Classes
    BertSentenceEmbeddings
  72. def getStorageRef: String
    Definition Classes
    HasStorageRef
  73. def getTopN: Int

    Get the number of key phrases extracted

  74. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  75. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  76. def hasParent: Boolean
    Definition Classes
    Model
  77. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  78. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  79. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  80. val inputAnnotatorTypes: Array[AnnotatorType]

    Input annotator types: DOCUMENT,CHUNK

    Definition Classes
    ChunkKeyPhraseExtraction → BertSentenceEmbeddings → HasInputAnnotationCols
  81. final val inputCols: StringArrayParam
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  82. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  83. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  84. val isLong: ProtectedParam[Boolean]
    Definition Classes
    BertSentenceEmbeddings
  85. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  86. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  87. val lazyAnnotator: BooleanParam
    Definition Classes
    CanBeLazy
  88. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  89. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  90. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  91. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  92. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  93. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  94. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  95. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  96. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  97. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  98. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  99. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  100. val maxSentenceLength: IntParam
    Definition Classes
    BertSentenceEmbeddings
  101. def msgHelper(schema: StructType): String
    Attributes
    protected
    Definition Classes
    HasInputAnnotationCols
  102. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  103. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  104. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  105. def onWrite(path: String, spark: SparkSession): Unit
    Definition Classes
    BertSentenceEmbeddings → ParamsAndFeaturesWritable
  106. val optionalInputAnnotatorTypes: Array[String]
    Definition Classes
    HasInputAnnotationCols
  107. def orderKeyPhrasesByMMR(document: Array[Float], keyPhrases: Seq[Array[Float]], numResults: Int): Seq[(Int, Double, Double)]

    Takes a document embedding and a sequence of chunk embeddings and selects a number of chunks with the highest MMR scores.

    document: document/sentence embedding

    keyPhrases: chunk embeddings

    numResults: number of chunk indices to return

    returns: a sequence of tuples (chunk index, document similarity, MMR score)

    Attributes
    protected
  108. val outputAnnotatorType: AnnotatorType

    Output annotator types: CHUNK

    Definition Classes
    ChunkKeyPhraseExtraction → BertSentenceEmbeddings → HasOutputAnnotatorType
  109. final val outputCol: Param[String]
    Attributes
    protected
    Definition Classes
    HasOutputAnnotationCol
  110. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  111. var parent: Estimator[BertSentenceEmbeddings]
    Definition Classes
    Model
  112. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  113. val selectMostDifferent: BooleanParam

    Pre-select topN * 2 key phrases and out of those select the topN that are the most different from each other. This parameter should not be used in conjunction with divergence as they aim to achieve the same goal, but in different ways.

  114. def sentenceEndTokenId: Int
    Definition Classes
    BertSentenceEmbeddings
  115. def sentenceStartTokenId: Int
    Definition Classes
    BertSentenceEmbeddings
  116. def set[T](param: ProtectedParam[T], value: T): ChunkKeyPhraseExtraction.this.type
    Definition Classes
    HasProtectedParams
  117. def set[T](feature: StructFeature[T], value: T): ChunkKeyPhraseExtraction.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  118. def set[K, V](feature: MapFeature[K, V], value: Map[K, V]): ChunkKeyPhraseExtraction.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  119. def set[T](feature: SetFeature[T], value: Set[T]): ChunkKeyPhraseExtraction.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  120. def set[T](feature: ArrayFeature[T], value: Array[T]): ChunkKeyPhraseExtraction.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  121. final def set(paramPair: ParamPair[_]): ChunkKeyPhraseExtraction.this.type
    Attributes
    protected
    Definition Classes
    Params
  122. final def set(param: String, value: Any): ChunkKeyPhraseExtraction.this.type
    Attributes
    protected
    Definition Classes
    Params
  123. final def set[T](param: Param[T], value: T): ChunkKeyPhraseExtraction.this.type
    Definition Classes
    Params
  124. def setBatchSize(size: Int): ChunkKeyPhraseExtraction.this.type
    Definition Classes
    HasBatchedAnnotate
  125. def setCaseSensitive(value: Boolean): ChunkKeyPhraseExtraction.this.type
    Definition Classes
    BertSentenceEmbeddings → HasCaseSensitiveProperties
  126. def setConcatenateSentences(value: Boolean): ChunkKeyPhraseExtraction.this.type

    Concatenate the input sentence/document annotations before computing their embedding. Default value is 'true'.

  127. def setConfigProtoBytes(bytes: Array[Int]): ChunkKeyPhraseExtraction.this.type
    Definition Classes
    BertSentenceEmbeddings
  128. def setDefault[T](feature: StructFeature[T], value: () ⇒ T): ChunkKeyPhraseExtraction.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  129. def setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): ChunkKeyPhraseExtraction.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  130. def setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): ChunkKeyPhraseExtraction.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  131. def setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): ChunkKeyPhraseExtraction.this.type
    Attributes
    protected
    Definition Classes
    HasFeatures
  132. final def setDefault(paramPairs: ParamPair[_]*): ChunkKeyPhraseExtraction.this.type
    Attributes
    protected
    Definition Classes
    Params
  133. final def setDefault[T](param: Param[T], value: T): ChunkKeyPhraseExtraction.this.type
    Attributes
    protected
    Definition Classes
    Params
  134. def setDimension(value: Int): ChunkKeyPhraseExtraction.this.type
    Definition Classes
    BertSentenceEmbeddings → HasEmbeddingsProperties
  135. def setDivergence(value: Float): ChunkKeyPhraseExtraction.this.type

    Set the level of divergence of the extracted key phrases. The value should be in the interval [0, 1].

  136. def setDocumentLevelProcessing(value: Boolean): ChunkKeyPhraseExtraction.this.type

    Extract key phrases from the whole document (true) or from the particular sentences which the chunks refer to (false). Default value is 'true'.

  137. def setDropPunctuation(value: Boolean): ChunkKeyPhraseExtraction.this.type

    Remove punctuation marks from input chunks. Default value is 'true'.

  138. final def setInputCols(value: String*): ChunkKeyPhraseExtraction.this.type
    Definition Classes
    HasInputAnnotationCols
  139. def setInputCols(value: Array[String]): ChunkKeyPhraseExtraction.this.type
    Definition Classes
    HasInputAnnotationCols
  140. def setIsLong(value: Boolean): ChunkKeyPhraseExtraction.this.type
    Definition Classes
    BertSentenceEmbeddings
  141. def setLazyAnnotator(value: Boolean): ChunkKeyPhraseExtraction.this.type
    Definition Classes
    CanBeLazy
  142. def setMaxSentenceLength(value: Int): ChunkKeyPhraseExtraction.this.type
    Definition Classes
    BertSentenceEmbeddings
  143. def setModelIfNotSet(spark: SparkSession, tensorflowWrapper: Option[TensorflowWrapper], onnxWrapper: Option[OnnxWrapper]): ChunkKeyPhraseExtraction.this.type
    Definition Classes
    BertSentenceEmbeddings
  144. final def setOutputCol(value: String): ChunkKeyPhraseExtraction.this.type
    Definition Classes
    HasOutputAnnotationCol
  145. def setParent(parent: Estimator[BertSentenceEmbeddings]): BertSentenceEmbeddings
    Definition Classes
    Model
  146. def setSelectMostDifferent(value: Boolean): ChunkKeyPhraseExtraction.this.type

    Let the model return the top N key phrases which are the most different from each other

  147. def setSignatures(value: Map[String, String]): ChunkKeyPhraseExtraction.this.type
    Definition Classes
    BertSentenceEmbeddings
  148. def setStorageRef(value: String): ChunkKeyPhraseExtraction.this.type
    Definition Classes
    HasStorageRef
  149. def setTopN(value: Int): ChunkKeyPhraseExtraction.this.type

    Set the number of key phrases to extract

  150. def setVocabulary(value: Map[String, Int]): ChunkKeyPhraseExtraction.this.type
    Definition Classes
    BertSentenceEmbeddings
  151. val signatures: MapFeature[String, String]
    Definition Classes
    BertSentenceEmbeddings
  152. val storageRef: Param[String]
    Definition Classes
    HasStorageRef
  153. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  154. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  155. def tokenize(sentences: Seq[Sentence]): Seq[WordpieceTokenizedSentence]
    Definition Classes
    BertSentenceEmbeddings
  156. val topN: IntParam

    Number of key phrases to extract, ordered by their score

  157. final def transform(dataset: Dataset[_]): DataFrame
    Definition Classes
    AnnotatorModel → Transformer
  158. def transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" )
  159. def transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" ) @varargs()
  160. final def transformSchema(schema: StructType): StructType
    Definition Classes
    RawAnnotator → PipelineStage
  161. def transformSchema(schema: StructType, logging: Boolean): StructType
    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  162. val uid: String
    Definition Classes
    ChunkKeyPhraseExtraction → BertSentenceEmbeddings → Identifiable
  163. def validate(schema: StructType): Boolean
    Attributes
    protected
    Definition Classes
    RawAnnotator
  164. def validateStorageRef(dataset: Dataset[_], inputCols: Array[String], annotatorType: String): Unit
    Definition Classes
    HasStorageRef
  165. val vocabulary: MapFeature[String, Int]
    Definition Classes
    BertSentenceEmbeddings
  166. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  167. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  168. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  169. def wrapColumnMetadata(col: Column): Column
    Attributes
    protected
    Definition Classes
    RawAnnotator
  170. def wrapEmbeddingsMetadata(col: Column, embeddingsDim: Int, embeddingsRef: Option[String]): Column
    Attributes
    protected
    Definition Classes
    HasEmbeddingsProperties
  171. def wrapSentenceEmbeddingsMetadata(col: Column, embeddingsDim: Int, embeddingsRef: Option[String]): Column
    Attributes
    protected
    Definition Classes
    HasEmbeddingsProperties
  172. def write: MLWriter
    Definition Classes
    ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable
  173. def writeOnnxModel(path: String, spark: SparkSession, onnxWrapper: OnnxWrapper, suffix: String, fileName: String): Unit
    Definition Classes
    WriteOnnxModel
  174. def writeOnnxModels(path: String, spark: SparkSession, onnxWrappersWithNames: Seq[(OnnxWrapper, String)], suffix: String, dataFileSuffix: String): Unit
    Definition Classes
    WriteOnnxModel
  175. def writeTensorflowHub(path: String, tfPath: String, spark: SparkSession, suffix: String): Unit
    Definition Classes
    WriteTensorflowModel
  176. def writeTensorflowModel(path: String, spark: SparkSession, tensorflow: TensorflowWrapper, suffix: String, filename: String, configProtoBytes: Option[Array[Byte]]): Unit
    Definition Classes
    WriteTensorflowModel
  177. def writeTensorflowModelV2(path: String, spark: SparkSession, tensorflow: TensorflowWrapper, suffix: String, filename: String, configProtoBytes: Option[Array[Byte]], savedSignatures: Option[Map[String, String]]): Unit
    Definition Classes
    WriteTensorflowModel
