ChunkKeyPhraseExtraction

Companion object ChunkKeyPhraseExtraction

class ChunkKeyPhraseExtraction extends BertSentenceEmbeddings with CheckLicense

Extracts key phrases from texts.

ChunkKeyPhraseExtraction uses BertSentenceEmbeddings to determine the most relevant key phrases describing a text, using two approaches:

By using cosine similarities between the embedding representation of the chunks and the embedding representation of the corresponding sentences/documents.
By using the Maximal Marginal Relevance (MMR) algorithm (set with the setDivergence method) to determine the most relevant key phrases. If the selectMostDifferent parameter is set, the model returns the key phrases that are the most different from each other (avoiding key phrases that are too similar).

The model compares the chunks against the corresponding sentences/documents and selects the chunks which are most representative of the broader text context (i.e., the document or the sentence they belong to). This makes it possible, for example, to get a brief understanding of a document by selecting the most relevant phrases.

The input to the model consists of chunk annotations and sentence or document annotations. The input chunks can be generated in various ways:

Using NGramGenerator, which produces ranked n-gram chunks from the text (can be used to identify new entities).
Using YakeKeywordExtractor, which ranks the keywords extracted with the YAKE algorithm.
Using TextMatcher, which ranks the desired chunks from the annotator.
Using NerConverter, which extracts ranked named entities (showing which entities are the most relevant in the sentence/document).

The model operates either at sentence level (selecting the most descriptive chunks from the sentence they belong to) or at document level. In the latter case, the key phrases are selected to represent all the input document annotations.
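The first approach, ranking chunks by cosine similarity to the document or sentence embedding, can be sketched in plain Scala as follows. This is an illustration only, not the library's implementation: the object CosineRanking and its method names are hypothetical, and the embeddings are assumed to be already computed as float arrays.

```scala
// Hypothetical helper, not part of Spark NLP: ranks chunk embeddings by
// their cosine similarity to a document (or sentence) embedding.
object CosineRanking {
  /** Cosine similarity between two equal-length vectors; 0.0 if either has zero norm. */
  def cosine(a: Array[Float], b: Array[Float]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x.toDouble * y }.sum
    val normA = math.sqrt(a.map(x => x.toDouble * x).sum)
    val normB = math.sqrt(b.map(x => x.toDouble * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  /** Returns (chunk index, similarity) pairs for the topN chunks closest to the document. */
  def rankBySimilarity(
      document: Array[Float],
      chunks: Seq[Array[Float]],
      topN: Int): Seq[(Int, Double)] =
    chunks.zipWithIndex
      .map { case (chunk, i) => (i, cosine(document, chunk)) }
      .sortBy { case (_, sim) => -sim }
      .take(topN)
}
```

Calling CosineRanking.rankBySimilarity(documentEmbedding, chunkEmbeddings, 2) would return the indices of the two chunks closest to the document, with their similarities.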

This model is a subclass of BertSentenceEmbeddings and shares all parameters with it. It can load any pretrained BertSentenceEmbeddings model. Available models can be found at Models Hub.

val embeddings = ChunkKeyPhraseExtraction.pretrained()
  .setInputCols("sentence", "chunk")
  .setOutputCol("key_phrase_chunks")

The default model is "sbert_jsl_medium_uncased", if no name is provided.
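To load a specific model instead of the default, the name can be passed explicitly. The fragment below is a sketch under two assumptions that this page does not confirm: that the three-argument pretrained(name, lang, remoteLoc) overload common to Spark NLP annotators is available here, and that licensed models are served from "clinical/models".

```scala
// Assumptions: the pretrained(name, lang, remoteLoc) overload and the
// "clinical/models" location. Check both against the Models Hub entry you pick.
val keyPhraseExtractor = ChunkKeyPhraseExtraction
  .pretrained("sbert_jsl_medium_uncased", "en", "clinical/models")
  .setInputCols("sentence", "chunk")
  .setOutputCol("key_phrase_chunks")
```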

Sources:

The use of MMR, diversity-based reranking for reordering documents and producing summaries

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Example

import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.{NGramGenerator, StopWordsCleaner, Tokenizer}
// Package path assumed for the licensed annotator; adjust it to your Spark NLP for Healthcare version.
import com.johnsnowlabs.nlp.annotators.chunker.ChunkKeyPhraseExtraction
import org.apache.spark.ml.Pipeline

 val documentAssembler = new DocumentAssembler()
   .setInputCol("text")
   .setOutputCol("document")

 val tokenizer = new Tokenizer()
   .setInputCols("document")
   .setOutputCol("tokens")

 val stopWordsCleaner = StopWordsCleaner.pretrained()
   .setInputCols("tokens")
   .setOutputCol("clean_tokens")
   .setCaseSensitive(false)

 val nGrams = new NGramGenerator()
   .setInputCols(Array("clean_tokens"))
   .setOutputCol("ngrams")
   .setN(3)


 val chunkKeyPhraseExtractor = ChunkKeyPhraseExtraction
   .pretrained()
   .setTopN(2)
   .setDivergence(0.7f)
   .setInputCols(Array("document", "ngrams"))
   .setOutputCol("key_phrases")

 val pipeline = new Pipeline()
   .setStages(Array(
     documentAssembler,
     tokenizer,
     stopWordsCleaner,
     nGrams,
     chunkKeyPhraseExtractor))

val sampleText = "Her Diabetes has become type 2 in the last year with her Diabetes." +
   " He complains of swelling in his right forearm."

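// Fit on an empty dataset (every stage is pretrained or rule-based), then transform the sample text.
val emptyDataset = Seq("").toDS.toDF("text")
val testDataset = Seq(sampleText).toDS.toDF("text")
val result = pipeline.fit(emptyDataset).transform(testDataset)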

 result
   .selectExpr("explode(key_phrases) AS key_phrase")
   .selectExpr(
     "key_phrase.result",
     "key_phrase.metadata.DocumentSimilarity",
     "key_phrase.metadata.MMRScore")
   .show(truncate=false)

+--------------------------+-------------------+------------------+
|result                    |DocumentSimilarity |MMRScore          |
+--------------------------+-------------------+------------------+
|complains swelling forearm|0.6325718954229369 |0.1897715761677257|
|type 2 year               |0.40181028931546364|-0.189501077108947|
+--------------------------+-------------------+------------------+

See also: BertEmbeddings for token-level embeddings
BertSentenceEmbeddings for sentence-level embeddings
Annotators Main Page for a list of transformer based embeddings

Linear Supertypes

CheckLicense, BertSentenceEmbeddings, HasEngine, HasCaseSensitiveProperties, HasStorageRef, HasEmbeddingsProperties, HasProtectedParams, WriteOnnxModel, WriteOpenvinoModel, WriteTensorflowModel, HasBatchedAnnotate[BertSentenceEmbeddings], AnnotatorModel[BertSentenceEmbeddings], CanBeLazy, RawAnnotator[BertSentenceEmbeddings], HasOutputAnnotationCol, HasInputAnnotationCols, HasOutputAnnotatorType, ParamsAndFeaturesWritable, HasFeatures, DefaultParamsWritable, MLWritable, Model[BertSentenceEmbeddings], Transformer, PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

new ChunkKeyPhraseExtraction()
new ChunkKeyPhraseExtraction(uid: String)

Type Members

type AnnotationContent = Seq[Row]

Attributes
protected
Definition Classes
AnnotatorModel
type AnnotatorType = String

Definition Classes
HasOutputAnnotatorType
implicit class ProtectedParam[T] extends Param[T]

Definition Classes
HasProtectedParams

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def $[T](param: Param[T]): T

Attributes
protected
Definition Classes
Params
def $$[T](feature: StructFeature[T]): T

Attributes
protected
Definition Classes
HasFeatures
def $$[K, V](feature: MapFeature[K, V]): Map[K, V]

Attributes
protected
Definition Classes
HasFeatures
def $$[T](feature: SetFeature[T]): Set[T]

Attributes
protected
Definition Classes
HasFeatures
def $$[T](feature: ArrayFeature[T]): Array[T]

Attributes
protected
Definition Classes
HasFeatures
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def _transform(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): DataFrame

Attributes
protected
Definition Classes
AnnotatorModel
def afterAnnotate(dataset: DataFrame): DataFrame

Attributes
protected
Definition Classes
BertSentenceEmbeddings → AnnotatorModel
final def asInstanceOf[T0]: T0

Definition Classes
Any
def batchAnnotate(batchedAnnotations: Seq[Array[Annotation]]): Seq[Seq[Annotation]]

Definition Classes
ChunkKeyPhraseExtraction → BertSentenceEmbeddings → HasBatchedAnnotate
def batchProcess(rows: Iterator[_]): Iterator[Row]

Definition Classes
HasBatchedAnnotate
val batchSize: IntParam

Definition Classes
HasBatchedAnnotate
def beforeAnnotate(dataset: Dataset[_]): Dataset[_]

Attributes
protected
Definition Classes
AnnotatorModel
val caseSensitive: BooleanParam

Definition Classes
HasCaseSensitiveProperties
final def checkSchema(schema: StructType, inputAnnotatorType: String): Boolean

Attributes
protected
Definition Classes
HasInputAnnotationCols
def checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String]): Unit

Definition Classes
CheckLicense
def checkValidScope(scope: String): Unit

Definition Classes
CheckLicense
def checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean): Unit

Definition Classes
CheckLicense
def checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean): Unit

Definition Classes
CheckLicense
final def clear(param: Param[_]): ChunkKeyPhraseExtraction.this.type

Definition Classes
Params
def clone(): AnyRef

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
val concatenateSentences: BooleanParam
A flag indicating whether to concatenate all input document/sentence annotations before computing their embedding. This parameter is only used if documentLevelProcessing is true. If concatenateSentences is set to true, the model concatenates the document/sentence input annotations and computes a single embedding. If it is false, the model computes the embedding of each sentence separately and then averages the resulting embedding vectors.
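The averaging branch (concatenateSentences set to false) amounts to an element-wise mean over the sentence embeddings. A minimal sketch, assuming the embeddings are already available as float arrays of one dimension; the object SentenceAveraging is hypothetical and not part of the library:

```scala
// Hypothetical helper illustrating the "average the sentence embeddings" branch.
object SentenceAveraging {
  /** Element-wise mean of sentence embeddings that all share one dimension. */
  def averageEmbeddings(sentences: Seq[Array[Float]]): Array[Float] = {
    require(sentences.nonEmpty, "need at least one sentence embedding")
    val dim = sentences.head.length
    require(sentences.forall(_.length == dim), "embeddings must share one dimension")
    val sums = new Array[Float](dim)
    // Accumulate each component across all sentences.
    for (vector <- sentences; i <- 0 until dim) sums(i) += vector(i)
    sums.map(_ / sentences.size)
  }
}
```

With concatenateSentences set to true there is no such averaging step: the sentence texts are joined first and a single embedding is computed for the joined text.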
val configProtoBytes: IntArrayParam

Definition Classes
BertSentenceEmbeddings
def copy(extra: ParamMap): BertSentenceEmbeddings

Definition Classes
RawAnnotator → Model → Transformer → PipelineStage → Params
def copyValues[T <: Params](to: T, extra: ParamMap): T

Attributes
protected
Definition Classes
Params
def createDatabaseConnection(database: Name): RocksDBConnection

Definition Classes
HasStorageRef
final def defaultCopy[T <: Params](extra: ParamMap): T

Attributes
protected
Definition Classes
Params
val dimension: ProtectedParam[Int]

Definition Classes
HasEmbeddingsProperties
val divergence: FloatParam
The divergence value determines how different from each other the extracted key phrases are. The possible values are within the interval [0, 1]. The higher the value, the more divergence is enforced. A value of 0 means the key phrases are not compared to each other (no divergence is ensured) and their relevance is determined solely by their similarity to the document. This parameter should not be used if selectMostDifferent is true, since the two parameters aim to achieve the same goal in different ways. The default value is 0, meaning that there is no constraint on the order of the extracted key phrases. The divergence is calculated using the Maximal Marginal Relevance measure.
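For orientation, the MMR criterion from the first paper listed under Sources can be written with the divergence value d in place of the paper's 1 − λ. Treating d this way is an assumption about this implementation, not something stated on this page, although it agrees with the sample output in the Example: with divergence 0.7, the first key phrase has MMRScore ≈ 0.3 × DocumentSimilarity (0.3 × 0.6326 ≈ 0.1898).

```latex
\mathrm{MMR}(c_i) = (1 - d)\,\cos(c_i, D) \;-\; d \max_{c_j \in S} \cos(c_i, c_j)
```

Here c_i is a candidate chunk embedding, D is the document or sentence embedding, and S is the set of key phrases already selected; the second term is taken as 0 while S is empty.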
val documentLevelProcessing: BooleanParam
A flag indicating whether to extract key phrases at the document level, i.e. from all the sentences available in a given row, rather than from the particular sentences the chunks refer to.
val dropPunctuation: BooleanParam
This parameter indicates whether to remove punctuation marks from the input chunks. Chunks coming from NER models are not affected.
val engine: Param[String]

Definition Classes
HasEngine
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def explainParam(param: Param[_]): String

Definition Classes
Params
def explainParams(): String

Definition Classes
Params
def extraValidate(structType: StructType): Boolean

Attributes
protected
Definition Classes
RawAnnotator
def extraValidateMsg: String

Attributes
protected
Definition Classes
RawAnnotator
final def extractParamMap(): ParamMap

Definition Classes
Params
final def extractParamMap(extra: ParamMap): ParamMap

Definition Classes
Params
val features: ArrayBuffer[Feature[_, _, _]]

Definition Classes
HasFeatures
def finalize(): Unit

Attributes
protected[lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def get[T](feature: StructFeature[T]): Option[T]

Attributes
protected
Definition Classes
HasFeatures
def get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]

Attributes
protected
Definition Classes
HasFeatures
def get[T](feature: SetFeature[T]): Option[Set[T]]

Attributes
protected
Definition Classes
HasFeatures
def get[T](feature: ArrayFeature[T]): Option[Array[T]]

Attributes
protected
Definition Classes
HasFeatures
final def get[T](param: Param[T]): Option[T]

Definition Classes
Params
def getBatchSize: Int

Definition Classes
HasBatchedAnnotate
def getCaseSensitive: Boolean

Definition Classes
HasCaseSensitiveProperties
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
Annotations
@native()
def getCombinationOfMostDifferentVectors(vectors: Seq[Array[Float]]): List[Int]
Selects a combination of vectors such that the sum of their pair-wise cosines is minimized, i.e. the vectors are as different from each other as possible.
vectors
a set of float vectors
returns
a list of vector indices

Attributes
protected
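The selection described for getCombinationOfMostDifferentVectors can be illustrated with a brute-force search over index combinations. This is a sketch, not the library's code: the object MostDifferent is hypothetical, and the explicit parameter n (how many vectors to keep) is an assumption, since the documented signature takes only the vectors.

```scala
// Hypothetical brute-force illustration; exhaustive search is only practical
// for small candidate sets such as the topN * 2 pre-selection.
object MostDifferent {
  /** Cosine similarity between two equal-length vectors; 0.0 if either has zero norm. */
  def cosine(a: Array[Float], b: Array[Float]): Double = {
    val dot = a.zip(b).map { case (x, y) => x.toDouble * y }.sum
    val normA = math.sqrt(a.map(x => x.toDouble * x).sum)
    val normB = math.sqrt(b.map(x => x.toDouble * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  /** Indices of the n vectors whose summed pair-wise cosine similarity is smallest. */
  def mostDifferentCombination(vectors: Seq[Array[Float]], n: Int): List[Int] = {
    require(n >= 1 && n <= vectors.size, "n must be between 1 and the number of vectors")
    vectors.indices.combinations(n).minBy { combo =>
      // Sum the cosine over every unordered pair inside this combination.
      combo.combinations(2).map { pair => cosine(vectors(pair(0)), vectors(pair(1))) }.sum
    }.toList
  }
}
```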
def getConcatenateSentences: Boolean
Check whether the input sentences/documents are concatenated before their embedding is computed
def getConfigProtoBytes: Option[Array[Byte]]

Definition Classes
BertSentenceEmbeddings
final def getDefault[T](param: Param[T]): Option[T]

Definition Classes
Params
def getDimension: Int

Definition Classes
HasEmbeddingsProperties
def getDivergence: Float
Get the level of divergence of the extracted key phrases.
def getDocumentLevelProcessing: Boolean
Check whether the key phrases are extracted at the document or the sentence level
def getDropPunctuation: Boolean
Check whether the punctuation marks are removed from input chunks.
def getEngine: String

Definition Classes
HasEngine
def getInputCols: Array[String]

Definition Classes
HasInputAnnotationCols
def getIsLong: Boolean

Definition Classes
BertSentenceEmbeddings
def getLazyAnnotator: Boolean

Definition Classes
CanBeLazy
def getMaxSentenceLength: Int

Definition Classes
BertSentenceEmbeddings
def getModelIfNotSet: Bert

Definition Classes
BertSentenceEmbeddings
final def getOrDefault[T](param: Param[T]): T

Definition Classes
Params
final def getOutputCol: String

Definition Classes
HasOutputAnnotationCol
def getParam(paramName: String): Param[Any]

Definition Classes
Params
def getSelectMostDifferent: Boolean
Check whether the model returns the top N key phrases which are most different from each other
def getSignatures: Option[Map[String, String]]

Definition Classes
BertSentenceEmbeddings
def getStorageRef: String

Definition Classes
HasStorageRef
def getTopN: Int
Get the number of key phrases extracted
final def hasDefault[T](param: Param[T]): Boolean

Definition Classes
Params
def hasParam(paramName: String): Boolean

Definition Classes
Params
def hasParent: Boolean

Definition Classes
Model
def hashCode(): Int

Definition Classes
AnyRef → Any
Annotations
@native()
def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean

Attributes
protected
Definition Classes
Logging
def initializeLogIfNecessary(isInterpreter: Boolean): Unit

Attributes
protected
Definition Classes
Logging
val inputAnnotatorTypes: Array[AnnotatorType]
Input annotator types: DOCUMENT,CHUNK

Definition Classes
ChunkKeyPhraseExtraction → BertSentenceEmbeddings → HasInputAnnotationCols
final val inputCols: StringArrayParam

Attributes
protected
Definition Classes
HasInputAnnotationCols
final def isDefined(param: Param[_]): Boolean

Definition Classes
Params
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
val isLong: ProtectedParam[Boolean]

Definition Classes
BertSentenceEmbeddings
final def isSet(param: Param[_]): Boolean

Definition Classes
Params
def isTraceEnabled(): Boolean

Attributes
protected
Definition Classes
Logging
val lazyAnnotator: BooleanParam

Definition Classes
CanBeLazy
def log: Logger

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logDebug(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logError(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logInfo(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logName: String

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logTrace(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String, throwable: Throwable): Unit

Attributes
protected
Definition Classes
Logging
def logWarning(msg: ⇒ String): Unit

Attributes
protected
Definition Classes
Logging
val maxSentenceLength: IntParam

Definition Classes
BertSentenceEmbeddings
def msgHelper(schema: StructType): String

Attributes
protected
Definition Classes
HasInputAnnotationCols
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
Annotations
@native()
final def notifyAll(): Unit

Definition Classes
AnyRef
Annotations
@native()
def onWrite(path: String, spark: SparkSession): Unit

Definition Classes
BertSentenceEmbeddings → ParamsAndFeaturesWritable
val optionalInputAnnotatorTypes: Array[String]

Definition Classes
HasInputAnnotationCols
def orderKeyPhrasesByMMR(document: Array[Float], keyPhrases: Seq[Array[Float]], numResults: Int): Seq[(Int, Double, Double)]
Takes a document embedding and a sequence of chunk embeddings and selects a number of chunks with the highest MMR scores
document
document/sentence embedding
keyPhrases
chunk embeddings
numResults
number of chunk indices to return
returns
a sequence of tuples (chunk index, document similarity, MMR score)

Attributes
protected
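A greedy MMR ordering that returns the same tuple shape as orderKeyPhrasesByMMR can be sketched as below. It is an illustration under stated assumptions: the object MmrOrdering is hypothetical, divergence is passed as an explicit argument rather than read from the annotator's parameter, and the score (1 − divergence) × document similarity − divergence × redundancy is assumed rather than taken from the library source.

```scala
// Hypothetical sketch of greedy MMR selection; not the library's implementation.
object MmrOrdering {
  /** Cosine similarity between two equal-length vectors; 0.0 if either has zero norm. */
  def cosine(a: Array[Float], b: Array[Float]): Double = {
    val dot = a.zip(b).map { case (x, y) => x.toDouble * y }.sum
    val normA = math.sqrt(a.map(x => x.toDouble * x).sum)
    val normB = math.sqrt(b.map(x => x.toDouble * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  /** Returns (chunk index, document similarity, MMR score) tuples in selection order. */
  def orderByMmr(
      document: Array[Float],
      keyPhrases: Seq[Array[Float]],
      numResults: Int,
      divergence: Double): Seq[(Int, Double, Double)] = {
    val docSim = keyPhrases.map(cosine(document, _)).toIndexedSeq
    var selected = Vector.empty[(Int, Double, Double)]
    var remaining = keyPhrases.indices.toSet
    while (selected.size < numResults && remaining.nonEmpty) {
      val scored = remaining.toSeq.sorted.map { i =>
        // Redundancy: highest similarity to any key phrase already selected.
        val redundancy =
          if (selected.isEmpty) 0.0
          else selected.map { case (j, _, _) => cosine(keyPhrases(i), keyPhrases(j)) }.max
        (i, docSim(i), (1 - divergence) * docSim(i) - divergence * redundancy)
      }
      val best = scored.maxBy(_._3)
      selected :+= best
      remaining -= best._1
    }
    selected
  }
}
```

With divergence 0 the ordering reduces to plain document similarity; raising it penalizes candidates that resemble key phrases already chosen.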
val outputAnnotatorType: AnnotatorType
Output annotator types: CHUNK

Definition Classes
ChunkKeyPhraseExtraction → BertSentenceEmbeddings → HasOutputAnnotatorType
final val outputCol: Param[String]

Attributes
protected
Definition Classes
HasOutputAnnotationCol
lazy val params: Array[Param[_]]

Definition Classes
Params
var parent: Estimator[BertSentenceEmbeddings]

Definition Classes
Model
def save(path: String): Unit

Definition Classes
MLWritable
Annotations
@Since( "1.6.0" ) @throws( ... )
val selectMostDifferent: BooleanParam
Pre-select topN * 2 key phrases and out of those select the topN that are the most different from each other. This parameter should not be used in conjunction with divergence as they aim to achieve the same goal, but in different ways.
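A configuration that relies on selectMostDifferent instead of divergence might look like the fragment below. It uses only setters listed on this page; the column names follow the Example, the variable name is illustrative, and the fragment assumes the upstream stages from that example.

```scala
// Diversity through pre-selection: divergence is left at its default of 0,
// so the two mechanisms are not combined.
val diverseExtractor = ChunkKeyPhraseExtraction
  .pretrained()
  .setTopN(3)
  .setSelectMostDifferent(true)
  .setInputCols(Array("document", "ngrams"))
  .setOutputCol("key_phrases")
```

With topN set to 3, the description above implies that 6 candidates are pre-selected and the 3 most mutually different ones are returned.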
def sentenceEndTokenId: Int

Definition Classes
BertSentenceEmbeddings
def sentenceStartTokenId: Int

Definition Classes
BertSentenceEmbeddings
def set[T](param: ProtectedParam[T], value: T): ChunkKeyPhraseExtraction.this.type

Definition Classes
HasProtectedParams
def set[T](feature: StructFeature[T], value: T): ChunkKeyPhraseExtraction.this.type

Attributes
protected
Definition Classes
HasFeatures
def set[K, V](feature: MapFeature[K, V], value: Map[K, V]): ChunkKeyPhraseExtraction.this.type

Attributes
protected
Definition Classes
HasFeatures
def set[T](feature: SetFeature[T], value: Set[T]): ChunkKeyPhraseExtraction.this.type

Attributes
protected
Definition Classes
HasFeatures
def set[T](feature: ArrayFeature[T], value: Array[T]): ChunkKeyPhraseExtraction.this.type

Attributes
protected
Definition Classes
HasFeatures
final def set(paramPair: ParamPair[_]): ChunkKeyPhraseExtraction.this.type

Attributes
protected
Definition Classes
Params
final def set(param: String, value: Any): ChunkKeyPhraseExtraction.this.type

Attributes
protected
Definition Classes
Params
final def set[T](param: Param[T], value: T): ChunkKeyPhraseExtraction.this.type

Definition Classes
Params
def setBatchSize(size: Int): ChunkKeyPhraseExtraction.this.type

Definition Classes
HasBatchedAnnotate
def setCaseSensitive(value: Boolean): ChunkKeyPhraseExtraction.this.type

Definition Classes
BertSentenceEmbeddings → HasCaseSensitiveProperties
def setConcatenateSentences(value: Boolean): ChunkKeyPhraseExtraction.this.type
Concatenate the input sentence/document annotations before computing their embedding. Default value is 'true'.
def setConfigProtoBytes(bytes: Array[Int]): ChunkKeyPhraseExtraction.this.type

Definition Classes
BertSentenceEmbeddings
def setDefault[T](feature: StructFeature[T], value: () ⇒ T): ChunkKeyPhraseExtraction.this.type

Attributes
protected
Definition Classes
HasFeatures
def setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): ChunkKeyPhraseExtraction.this.type

Attributes
protected
Definition Classes
HasFeatures
def setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): ChunkKeyPhraseExtraction.this.type

Attributes
protected
Definition Classes
HasFeatures
def setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): ChunkKeyPhraseExtraction.this.type

Attributes
protected
Definition Classes
HasFeatures
final def setDefault(paramPairs: ParamPair[_]*): ChunkKeyPhraseExtraction.this.type

Attributes
protected
Definition Classes
Params
final def setDefault[T](param: Param[T], value: T): ChunkKeyPhraseExtraction.this.type

Attributes
protected[org.apache.spark.ml]
Definition Classes
Params
def setDimension(value: Int): ChunkKeyPhraseExtraction.this.type

Definition Classes
BertSentenceEmbeddings → HasEmbeddingsProperties
def setDivergence(value: Float): ChunkKeyPhraseExtraction.this.type
Set the level of divergence of the extracted key phrases. The value should be in the interval [0, 1].
def setDocumentLevelProcessing(value: Boolean): ChunkKeyPhraseExtraction.this.type
Extract key phrases from the whole document (true) or from the particular sentences which the chunks refer to (false). Default value is 'true'.
def setDropPunctuation(value: Boolean): ChunkKeyPhraseExtraction.this.type
Remove punctuation marks from input chunks. Default value is 'true'.
final def setInputCols(value: String*): ChunkKeyPhraseExtraction.this.type

Definition Classes
HasInputAnnotationCols
def setInputCols(value: Array[String]): ChunkKeyPhraseExtraction.this.type

Definition Classes
HasInputAnnotationCols
def setIsLong(value: Boolean): ChunkKeyPhraseExtraction.this.type

Definition Classes
BertSentenceEmbeddings
def setLazyAnnotator(value: Boolean): ChunkKeyPhraseExtraction.this.type

Definition Classes
CanBeLazy
def setMaxSentenceLength(value: Int): ChunkKeyPhraseExtraction.this.type

Definition Classes
BertSentenceEmbeddings
def setModelIfNotSet(spark: SparkSession, tensorflowWrapper: Option[TensorflowWrapper], onnxWrapper: Option[OnnxWrapper], openvinoWrapper: Option[OpenvinoWrapper]): ChunkKeyPhraseExtraction.this.type

Definition Classes
BertSentenceEmbeddings
final def setOutputCol(value: String): ChunkKeyPhraseExtraction.this.type

Definition Classes
HasOutputAnnotationCol
def setParent(parent: Estimator[BertSentenceEmbeddings]): BertSentenceEmbeddings

Definition Classes
Model
def setSelectMostDifferent(value: Boolean): ChunkKeyPhraseExtraction.this.type
Let the model return the top N key phrases which are the most different from each other
def setSignatures(value: Map[String, String]): ChunkKeyPhraseExtraction.this.type

Definition Classes
BertSentenceEmbeddings
def setStorageRef(value: String): ChunkKeyPhraseExtraction.this.type

Definition Classes
HasStorageRef
def setTopN(value: Int): ChunkKeyPhraseExtraction.this.type
Set the number of key phrases to extract
def setVocabulary(value: Map[String, Int]): ChunkKeyPhraseExtraction.this.type

Definition Classes
BertSentenceEmbeddings
val signatures: MapFeature[String, String]

Definition Classes
BertSentenceEmbeddings
val storageRef: Param[String]

Definition Classes
HasStorageRef
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
Identifiable → AnyRef → Any
def tokenize(sentences: Seq[Sentence]): Seq[WordpieceTokenizedSentence]

Definition Classes
BertSentenceEmbeddings
val topN: IntParam
Number of key phrases to extract, ordered by their score
final def transform(dataset: Dataset[_]): DataFrame

Definition Classes
AnnotatorModel → Transformer
def transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame

Definition Classes
Transformer
Annotations
@Since( "2.0.0" )
def transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame

Definition Classes
Transformer
Annotations
@Since( "2.0.0" ) @varargs()
final def transformSchema(schema: StructType): StructType

Definition Classes
RawAnnotator → PipelineStage
def transformSchema(schema: StructType, logging: Boolean): StructType

Attributes
protected
Definition Classes
PipelineStage
Annotations
@DeveloperApi()
val uid: String

Definition Classes
ChunkKeyPhraseExtraction → BertSentenceEmbeddings → Identifiable
def validate(schema: StructType): Boolean

Attributes
protected
Definition Classes
RawAnnotator
def validateStorageRef(dataset: Dataset[_], inputCols: Array[String], annotatorType: String): Unit

Definition Classes
HasStorageRef
val vocabulary: MapFeature[String, Int]

Definition Classes
BertSentenceEmbeddings
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... ) @native()
def wrapColumnMetadata(col: Column): Column

Attributes
protected
Definition Classes
RawAnnotator
def wrapEmbeddingsMetadata(col: Column, embeddingsDim: Int, embeddingsRef: Option[String]): Column

Attributes
protected
Definition Classes
HasEmbeddingsProperties
def wrapSentenceEmbeddingsMetadata(col: Column, embeddingsDim: Int, embeddingsRef: Option[String]): Column

Attributes
protected
Definition Classes
HasEmbeddingsProperties
def write: MLWriter

Definition Classes
ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable
def writeOnnxModel(path: String, spark: SparkSession, onnxWrapper: OnnxWrapper, suffix: String, fileName: String): Unit

Definition Classes
WriteOnnxModel
def writeOnnxModels(path: String, spark: SparkSession, onnxWrappersWithNames: Seq[(OnnxWrapper, String)], suffix: String): Unit

Definition Classes
WriteOnnxModel
def writeOpenvinoModel(path: String, spark: SparkSession, openvinoWrapper: OpenvinoWrapper, suffix: String, fileName: String): Unit

Definition Classes
WriteOpenvinoModel
def writeOpenvinoModels(path: String, spark: SparkSession, ovWrappersWithNames: Seq[(OpenvinoWrapper, String)], suffix: String): Unit

Definition Classes
WriteOpenvinoModel
def writeTensorflowHub(path: String, tfPath: String, spark: SparkSession, suffix: String): Unit

Definition Classes
WriteTensorflowModel
def writeTensorflowModel(path: String, spark: SparkSession, tensorflow: TensorflowWrapper, suffix: String, filename: String, configProtoBytes: Option[Array[Byte]]): Unit

Definition Classes
WriteTensorflowModel
def writeTensorflowModelV2(path: String, spark: SparkSession, tensorflow: TensorflowWrapper, suffix: String, filename: String, configProtoBytes: Option[Array[Byte]], savedSignatures: Option[Map[String, String]]): Unit

Definition Classes
WriteTensorflowModel
