com.johnsnowlabs.nlp.annotators.chunker
ChunkKeyPhraseExtraction
Companion object ChunkKeyPhraseExtraction
class ChunkKeyPhraseExtraction extends BertSentenceEmbeddings with CheckLicense
Extracts key phrases from texts.
ChunkKeyPhraseExtraction uses BertSentenceEmbeddings to determine the most relevant key phrases describing a text, using two approaches:
- By using cosine similarities between the embedding representation of the chunks and the embedding representation of the corresponding sentences/documents.
- By using the Maximal Marginal Relevance (MMR) algorithm (set with the setDivergence method) to determine the most relevant key phrases.
If the selectMostDifferent parameter is set, the model returns the key phrases that are the most different from each other, avoiding key phrases that are too similar.
The model compares the chunks against the corresponding sentences/documents and selects the chunks which are most representative of the broader text context (i.e., the document or the sentence they belong to). This makes it possible, for example, to get a brief understanding of a document by selecting its most relevant phrases.
The input to the model consists of chunk annotations and sentence or document annotations. The input chunks can be generated in various ways:
- Using NGramGenerator, which produces ranked n-gram chunks from the text (can be used to identify new entities).
- Using YakeKeywordExtractor, which ranks the keywords extracted with the YAKE algorithm.
- Using TextMatcher, which ranks the desired chunks from the annotator.
- Using NerConverter, which extracts ranked named entities (which entities are the most relevant in the sentence/document).
The model operates either at sentence level (selecting the most descriptive chunks from the sentence they belong to) or at document level. In the latter case, the key phrases are selected to represent all the input document annotations.
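The first approach, ranking chunks by cosine similarity to their sentence or document, can be sketched in plain Scala without Spark. This is a minimal illustration, not the library's implementation: the object name CosineRankingSketch, the helper topNBySimilarity, and the toy two-dimensional vectors are all hypothetical, and real embeddings would come from the underlying BERT sentence model.

```scala
// Hypothetical sketch: rank chunks by cosine similarity to a context embedding.
object CosineRankingSketch {
  // Cosine similarity of two equally sized vectors; 0.0 if either has zero norm.
  def cosine(a: Array[Float], b: Array[Float]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x.toDouble * y }.sum
    val normA = math.sqrt(a.map(x => x.toDouble * x).sum)
    val normB = math.sqrt(b.map(x => x.toDouble * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // Return the topN (chunk text, similarity) pairs, most similar first.
  def topNBySimilarity(
      context: Array[Float],
      chunks: Seq[(String, Array[Float])],
      topN: Int): Seq[(String, Double)] =
    chunks
      .map { case (text, embedding) => (text, cosine(context, embedding)) }
      .sortBy { case (_, similarity) => -similarity }
      .take(topN)
}
```

With divergence left at 0, the description above suggests that relevance depends only on this kind of similarity score.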
This model is a subclass of BertSentenceEmbeddings and shares all parameters with it. It can load any pretrained BertSentenceEmbeddings model. Available models can be found at Models Hub.
val embeddings = ChunkKeyPhraseExtraction.pretrained()
  .setInputCols("sentence", "chunk")
  .setOutputCol("key_phrase_chunks")
The default model is "sbert_jsl_medium_uncased" if no name is provided.
Sources:
The use of MMR, diversity-based reranking for reordering documents and producing summaries
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotator.{NGramGenerator, StopWordsCleaner, Tokenizer}
import com.johnsnowlabs.nlp.annotators.chunker.ChunkKeyPhraseExtraction
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("tokens")

val stopWordsCleaner = StopWordsCleaner.pretrained()
  .setInputCols("tokens")
  .setOutputCol("clean_tokens")
  .setCaseSensitive(false)

// Candidate chunks: n-grams built from the cleaned tokens
val nGrams = new NGramGenerator()
  .setInputCols(Array("clean_tokens"))
  .setOutputCol("ngrams")
  .setN(3)

// Keep the two best key phrases, with MMR divergence 0.7
val chunkKeyPhraseExtractor = ChunkKeyPhraseExtraction
  .pretrained()
  .setTopN(2)
  .setDivergence(0.7f)
  .setInputCols(Array("document", "ngrams"))
  .setOutputCol("key_phrases")

val pipeline = new Pipeline()
  .setStages(Array(
    documentAssembler,
    tokenizer,
    stopWordsCleaner,
    nGrams,
    chunkKeyPhraseExtractor))

val sampleText = "Her Diabetes has become type 2 in the last year with her Diabetes." +
  " He complains of swelling in his right forearm."

// The dataset must contain the sample text, and the pipeline is fitted on it
val testDataset = Seq(sampleText).toDS.toDF("text")
val result = pipeline.fit(testDataset).transform(testDataset)

result
  .selectExpr("explode(key_phrases) AS key_phrase")
  .selectExpr(
    "key_phrase.result",
    "key_phrase.metadata.DocumentSimilarity",
    "key_phrase.metadata.MMRScore")
  .show(truncate=false)

+--------------------------+-------------------+------------------+
|result                    |DocumentSimilarity |MMRScore          |
+--------------------------+-------------------+------------------+
|complains swelling forearm|0.6325718954229369 |0.1897715761677257|
|type 2 year               |0.40181028931546364|-0.189501077108947|
+--------------------------+-------------------+------------------+
- See also
BertEmbeddings for token-level embeddings
BertSentenceEmbeddings for sentence-level embeddings
Annotators Main Page for a list of transformer based embeddings
- By Inheritance
- ChunkKeyPhraseExtraction
- CheckLicense
- BertSentenceEmbeddings
- HasEngine
- HasCaseSensitiveProperties
- HasStorageRef
- HasEmbeddingsProperties
- HasProtectedParams
- WriteOnnxModel
- WriteOpenvinoModel
- WriteTensorflowModel
- HasBatchedAnnotate
- AnnotatorModel
- CanBeLazy
- RawAnnotator
- HasOutputAnnotationCol
- HasInputAnnotationCols
- HasOutputAnnotatorType
- ParamsAndFeaturesWritable
- HasFeatures
- DefaultParamsWritable
- MLWritable
- Model
- Transformer
- PipelineStage
- Logging
- Params
- Serializable
- Serializable
- Identifiable
- AnyRef
- Any
Type Members
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
$[T](param: Param[T]): T
- Attributes
- protected
- Definition Classes
- Params
-
def
$$[T](feature: StructFeature[T]): T
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
$$[K, V](feature: MapFeature[K, V]): Map[K, V]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
$$[T](feature: SetFeature[T]): Set[T]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
$$[T](feature: ArrayFeature[T]): Array[T]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
_transform(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): DataFrame
- Attributes
- protected
- Definition Classes
- AnnotatorModel
-
def
afterAnnotate(dataset: DataFrame): DataFrame
- Attributes
- protected
- Definition Classes
- BertSentenceEmbeddings → AnnotatorModel
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
batchAnnotate(batchedAnnotations: Seq[Array[Annotation]]): Seq[Seq[Annotation]]
- Definition Classes
- ChunkKeyPhraseExtraction → BertSentenceEmbeddings → HasBatchedAnnotate
-
def
batchProcess(rows: Iterator[_]): Iterator[Row]
- Definition Classes
- HasBatchedAnnotate
-
val
batchSize: IntParam
- Definition Classes
- HasBatchedAnnotate
-
def
beforeAnnotate(dataset: Dataset[_]): Dataset[_]
- Attributes
- protected
- Definition Classes
- AnnotatorModel
-
val
caseSensitive: BooleanParam
- Definition Classes
- HasCaseSensitiveProperties
-
final
def
checkSchema(schema: StructType, inputAnnotatorType: String): Boolean
- Attributes
- protected
- Definition Classes
- HasInputAnnotationCols
-
def
checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String]): Unit
- Definition Classes
- CheckLicense
-
def
checkValidScope(scope: String): Unit
- Definition Classes
- CheckLicense
-
def
checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean): Unit
- Definition Classes
- CheckLicense
-
def
checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean): Unit
- Definition Classes
- CheckLicense
-
final
def
clear(param: Param[_]): ChunkKeyPhraseExtraction.this.type
- Definition Classes
- Params
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
val
concatenateSentences: BooleanParam
A flag indicating whether to concatenate all input document/sentence annotations before computing their embedding. This parameter is only used if documentLevelProcessing is true. If concatenateSentences is set to true, the model concatenates the document/sentence input annotations and computes a single embedding. If it is false, the model computes the embedding of each sentence separately and then averages the resulting embedding vectors.
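The averaging branch (concatenateSentences set to false) amounts to an element-wise mean over the sentence vectors. The sketch below illustrates only that arithmetic. The object and method names are hypothetical, and the concatenation branch is not shown because it requires running the embedding model on the joined text.

```scala
// Hypothetical sketch: element-wise mean of per-sentence embedding vectors.
object AverageEmbeddingSketch {
  def averageEmbeddings(sentenceEmbeddings: Seq[Array[Float]]): Array[Float] = {
    require(sentenceEmbeddings.nonEmpty, "at least one sentence embedding is required")
    val dim = sentenceEmbeddings.head.length
    require(sentenceEmbeddings.forall(_.length == dim), "all embeddings must share one dimension")
    // Accumulate the per-dimension sums, then divide by the number of sentences.
    val sums = new Array[Float](dim)
    for (vec <- sentenceEmbeddings; i <- 0 until dim) sums(i) += vec(i)
    sums.map(_ / sentenceEmbeddings.size)
  }
}
```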
-
val
configProtoBytes: IntArrayParam
- Definition Classes
- BertSentenceEmbeddings
-
def
copy(extra: ParamMap): BertSentenceEmbeddings
- Definition Classes
- RawAnnotator → Model → Transformer → PipelineStage → Params
-
def
copyValues[T <: Params](to: T, extra: ParamMap): T
- Attributes
- protected
- Definition Classes
- Params
-
def
createDatabaseConnection(database: Name): RocksDBConnection
- Definition Classes
- HasStorageRef
-
final
def
defaultCopy[T <: Params](extra: ParamMap): T
- Attributes
- protected
- Definition Classes
- Params
-
val
dimension: ProtectedParam[Int]
- Definition Classes
- HasEmbeddingsProperties
-
val
divergence: FloatParam
The divergence value determines how different from each other the extracted key phrases are. The possible values are within the interval [0, 1]. The higher the value is, the more divergence is enforced. A value of 0 means the key phrases are not compared to each other (no divergence is ensured) and their relevance is determined solely by their similarity to the document. This parameter should not be used if setSelectMostDifferent is true, because the two parameters aim to achieve the same goal in different ways. The default value is 0, meaning that there is no constraint on the order of the extracted key phrases. The divergence is calculated using the Maximal Marginal Relevance measure.
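A greedy MMR selection can be sketched as follows. The scoring formula used here, (1 - divergence) * documentSimilarity - divergence * maxSimilarityToAlreadySelected, is an assumption and not taken from the library source; it is, however, consistent with the first row of the example output on this page, where 0.3 * 0.6326 gives roughly 0.1898. The object and method names are hypothetical.

```scala
// Hypothetical sketch of greedy Maximal Marginal Relevance (MMR) selection.
object MmrSketch {
  // Cosine similarity of two equally sized vectors; 0.0 if either has zero norm.
  def cosine(a: Array[Float], b: Array[Float]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x.toDouble * y }.sum
    val normA = math.sqrt(a.map(x => x.toDouble * x).sum)
    val normB = math.sqrt(b.map(x => x.toDouble * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // Returns (chunk index, document similarity, MMR score) in selection order.
  def orderByMmr(
      document: Array[Float],
      keyPhrases: Seq[Array[Float]],
      numResults: Int,
      divergence: Double): Seq[(Int, Double, Double)] = {
    val docSim = keyPhrases.map(cosine(document, _))
    var selected = Vector.empty[(Int, Double, Double)]
    var remaining = keyPhrases.indices.toSet
    while (selected.size < numResults && remaining.nonEmpty) {
      val scored = remaining.toSeq.sorted.map { i =>
        // Redundancy: highest similarity to any phrase that was already picked.
        val redundancy =
          if (selected.isEmpty) 0.0
          else selected.map { case (j, _, _) => cosine(keyPhrases(i), keyPhrases(j)) }.max
        (i, docSim(i), (1 - divergence) * docSim(i) - divergence * redundancy)
      }
      val best = scored.maxBy(_._3)
      selected :+= best
      remaining -= best._1
    }
    selected
  }
}
```

Under this assumed formula, a divergence of 0 reduces to plain ranking by document similarity, while higher values penalize candidates that resemble phrases already selected.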
-
val
documentLevelProcessing: BooleanParam
A flag indicating whether to extract key phrases from the document level, i.e. from all the sentences available at a given row, rather than from the particular sentences the chunks refer to.
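A configuration fragment for sentence-level extraction might look like the sketch below. It uses only setters listed on this page, but the column names "sentence", "chunk" and "key_phrases" are assumptions about the surrounding pipeline, and running it requires the licensed library and a pretrained model.

```scala
import com.johnsnowlabs.nlp.annotators.chunker.ChunkKeyPhraseExtraction

// Assumed column names; adapt them to the upstream annotators in your pipeline.
val sentenceLevelExtractor = ChunkKeyPhraseExtraction
  .pretrained()
  .setInputCols("sentence", "chunk")
  .setOutputCol("key_phrases")
  .setDocumentLevelProcessing(false) // compare each chunk with its own sentence
  .setTopN(3)
```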
-
val
dropPunctuation: BooleanParam
This parameter indicates whether to remove punctuation marks from the input chunks. Chunks coming from NER models are not affected.
-
val
engine: Param[String]
- Definition Classes
- HasEngine
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
explainParam(param: Param[_]): String
- Definition Classes
- Params
-
def
explainParams(): String
- Definition Classes
- Params
-
def
extraValidate(structType: StructType): Boolean
- Attributes
- protected
- Definition Classes
- RawAnnotator
-
def
extraValidateMsg: String
- Attributes
- protected
- Definition Classes
- RawAnnotator
-
final
def
extractParamMap(): ParamMap
- Definition Classes
- Params
-
final
def
extractParamMap(extra: ParamMap): ParamMap
- Definition Classes
- Params
-
val
features: ArrayBuffer[Feature[_, _, _]]
- Definition Classes
- HasFeatures
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
get[T](feature: StructFeature[T]): Option[T]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
get[T](feature: SetFeature[T]): Option[Set[T]]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
get[T](feature: ArrayFeature[T]): Option[Array[T]]
- Attributes
- protected
- Definition Classes
- HasFeatures
-
final
def
get[T](param: Param[T]): Option[T]
- Definition Classes
- Params
-
def
getBatchSize: Int
- Definition Classes
- HasBatchedAnnotate
-
def
getCaseSensitive: Boolean
- Definition Classes
- HasCaseSensitiveProperties
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
getCombinationOfMostDifferentVectors(vectors: Seq[Array[Float]]): List[Int]
Selects a combination of vectors such that the sum of their pair-wise cosines is minimized, i.e. the vectors are as different from each other as possible.
- vectors
a set of float vectors
- returns
a list of vector indices
- Attributes
- protected
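One way to picture this selection is an exhaustive search over index combinations, keeping the one with the smallest sum of pair-wise cosines. The sketch below is illustrative only: the explicit size parameter k, the object name, and the brute-force strategy are assumptions, since the signature documented here takes just the vectors.

```scala
// Hypothetical brute-force sketch: pick k vectors with minimal summed pair-wise cosine.
object MostDifferentSketch {
  // Cosine similarity of two equally sized vectors; 0.0 if either has zero norm.
  def cosine(a: Array[Float], b: Array[Float]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x.toDouble * y }.sum
    val normA = math.sqrt(a.map(x => x.toDouble * x).sum)
    val normB = math.sqrt(b.map(x => x.toDouble * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  def mostDifferentCombination(vectors: Seq[Array[Float]], k: Int): List[Int] = {
    require(k >= 1 && k <= vectors.size, "k must be between 1 and the number of vectors")
    vectors.indices
      .combinations(k)
      .minBy { combo =>
        // Sum of cosines over every unordered pair inside this combination.
        combo.combinations(2).map(pair => cosine(vectors(pair(0)), vectors(pair(1)))).sum
      }
      .toList
  }
}
```

The number of combinations grows quickly with the candidate count, so a search like this is only practical for small pools, such as the topN * 2 pre-selection described for selectMostDifferent.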
-
def
getConcatenateSentences: Boolean
Check whether the input sentences/documents are concatenated before their embedding is computed
-
def
getConfigProtoBytes: Option[Array[Byte]]
- Definition Classes
- BertSentenceEmbeddings
-
final
def
getDefault[T](param: Param[T]): Option[T]
- Definition Classes
- Params
-
def
getDimension: Int
- Definition Classes
- HasEmbeddingsProperties
-
def
getDivergence: Float
Get the level of divergence of the extracted key phrases.
-
def
getDocumentLevelProcessing: Boolean
Check whether the key phrases are extracted at the document or the sentence level
-
def
getDropPunctuation: Boolean
Check whether the punctuation marks are removed from input chunks.
-
def
getEngine: String
- Definition Classes
- HasEngine
-
def
getInputCols: Array[String]
- Definition Classes
- HasInputAnnotationCols
-
def
getIsLong: Boolean
- Definition Classes
- BertSentenceEmbeddings
-
def
getLazyAnnotator: Boolean
- Definition Classes
- CanBeLazy
-
def
getMaxSentenceLength: Int
- Definition Classes
- BertSentenceEmbeddings
-
def
getModelIfNotSet: Bert
- Definition Classes
- BertSentenceEmbeddings
-
final
def
getOrDefault[T](param: Param[T]): T
- Definition Classes
- Params
-
final
def
getOutputCol: String
- Definition Classes
- HasOutputAnnotationCol
-
def
getParam(paramName: String): Param[Any]
- Definition Classes
- Params
-
def
getSelectMostDifferent: Boolean
Check whether the model returns the top N key phrases which are most different from each other
-
def
getSignatures: Option[Map[String, String]]
- Definition Classes
- BertSentenceEmbeddings
-
def
getStorageRef: String
- Definition Classes
- HasStorageRef
-
def
getTopN: Int
Get the number of key phrases extracted
-
final
def
hasDefault[T](param: Param[T]): Boolean
- Definition Classes
- Params
-
def
hasParam(paramName: String): Boolean
- Definition Classes
- Params
-
def
hasParent: Boolean
- Definition Classes
- Model
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
val
inputAnnotatorTypes: Array[AnnotatorType]
Input annotator types: DOCUMENT,CHUNK
- Definition Classes
- ChunkKeyPhraseExtraction → BertSentenceEmbeddings → HasInputAnnotationCols
-
final
val
inputCols: StringArrayParam
- Attributes
- protected
- Definition Classes
- HasInputAnnotationCols
-
final
def
isDefined(param: Param[_]): Boolean
- Definition Classes
- Params
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
val
isLong: ProtectedParam[Boolean]
- Definition Classes
- BertSentenceEmbeddings
-
final
def
isSet(param: Param[_]): Boolean
- Definition Classes
- Params
-
def
isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
val
lazyAnnotator: BooleanParam
- Definition Classes
- CanBeLazy
-
def
log: Logger
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logName: String
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
val
maxSentenceLength: IntParam
- Definition Classes
- BertSentenceEmbeddings
-
def
msgHelper(schema: StructType): String
- Attributes
- protected
- Definition Classes
- HasInputAnnotationCols
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
onWrite(path: String, spark: SparkSession): Unit
- Definition Classes
- BertSentenceEmbeddings → ParamsAndFeaturesWritable
-
val
optionalInputAnnotatorTypes: Array[String]
- Definition Classes
- HasInputAnnotationCols
-
def
orderKeyPhrasesByMMR(document: Array[Float], keyPhrases: Seq[Array[Float]], numResults: Int): Seq[(Int, Double, Double)]
Takes a document embedding and a sequence of chunk embeddings and selects a number of chunks with the highest MMR scores
- document
document/sentence embedding
- keyPhrases
chunk embeddings
- numResults
number of chunk indices to return
- returns
a sequence of tuples (chunk index, document similarity, MMR score)
- Attributes
- protected
-
val
outputAnnotatorType: AnnotatorType
Output annotator types: CHUNK
- Definition Classes
- ChunkKeyPhraseExtraction → BertSentenceEmbeddings → HasOutputAnnotatorType
-
final
val
outputCol: Param[String]
- Attributes
- protected
- Definition Classes
- HasOutputAnnotationCol
-
lazy val
params: Array[Param[_]]
- Definition Classes
- Params
-
var
parent: Estimator[BertSentenceEmbeddings]
- Definition Classes
- Model
-
def
save(path: String): Unit
- Definition Classes
- MLWritable
- Annotations
- @Since( "1.6.0" ) @throws( ... )
-
val
selectMostDifferent: BooleanParam
Pre-select topN * 2 key phrases and out of those select the topN that are the most different from each other. This parameter should not be used in conjunction with divergence, as they aim to achieve the same goal, but in different ways.
-
def
sentenceEndTokenId: Int
- Definition Classes
- BertSentenceEmbeddings
-
def
sentenceStartTokenId: Int
- Definition Classes
- BertSentenceEmbeddings
-
def
set[T](param: ProtectedParam[T], value: T): ChunkKeyPhraseExtraction.this.type
- Definition Classes
- HasProtectedParams
-
def
set[T](feature: StructFeature[T], value: T): ChunkKeyPhraseExtraction.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
set[K, V](feature: MapFeature[K, V], value: Map[K, V]): ChunkKeyPhraseExtraction.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
set[T](feature: SetFeature[T], value: Set[T]): ChunkKeyPhraseExtraction.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
set[T](feature: ArrayFeature[T], value: Array[T]): ChunkKeyPhraseExtraction.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
final
def
set(paramPair: ParamPair[_]): ChunkKeyPhraseExtraction.this.type
- Attributes
- protected
- Definition Classes
- Params
-
final
def
set(param: String, value: Any): ChunkKeyPhraseExtraction.this.type
- Attributes
- protected
- Definition Classes
- Params
-
final
def
set[T](param: Param[T], value: T): ChunkKeyPhraseExtraction.this.type
- Definition Classes
- Params
-
def
setBatchSize(size: Int): ChunkKeyPhraseExtraction.this.type
- Definition Classes
- HasBatchedAnnotate
-
def
setCaseSensitive(value: Boolean): ChunkKeyPhraseExtraction.this.type
- Definition Classes
- BertSentenceEmbeddings → HasCaseSensitiveProperties
-
def
setConcatenateSentences(value: Boolean): ChunkKeyPhraseExtraction.this.type
Concatenate the input sentence/document annotations before computing their embedding. Default value is 'true'.
-
def
setConfigProtoBytes(bytes: Array[Int]): ChunkKeyPhraseExtraction.this.type
- Definition Classes
- BertSentenceEmbeddings
-
def
setDefault[T](feature: StructFeature[T], value: () ⇒ T): ChunkKeyPhraseExtraction.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): ChunkKeyPhraseExtraction.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): ChunkKeyPhraseExtraction.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
def
setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): ChunkKeyPhraseExtraction.this.type
- Attributes
- protected
- Definition Classes
- HasFeatures
-
final
def
setDefault(paramPairs: ParamPair[_]*): ChunkKeyPhraseExtraction.this.type
- Attributes
- protected
- Definition Classes
- Params
-
final
def
setDefault[T](param: Param[T], value: T): ChunkKeyPhraseExtraction.this.type
- Attributes
- protected[org.apache.spark.ml]
- Definition Classes
- Params
-
def
setDimension(value: Int): ChunkKeyPhraseExtraction.this.type
- Definition Classes
- BertSentenceEmbeddings → HasEmbeddingsProperties
-
def
setDivergence(value: Float): ChunkKeyPhraseExtraction.this.type
Set the level of divergence of the extracted key phrases. The value should be in the interval [0, 1].
-
def
setDocumentLevelProcessing(value: Boolean): ChunkKeyPhraseExtraction.this.type
Extract key phrases from the whole document (true) or from the particular sentences which the chunks refer to (false). Default value is 'true'.
-
def
setDropPunctuation(value: Boolean): ChunkKeyPhraseExtraction.this.type
Remove punctuation marks from input chunks. Default value is 'true'.
-
final
def
setInputCols(value: String*): ChunkKeyPhraseExtraction.this.type
- Definition Classes
- HasInputAnnotationCols
-
def
setInputCols(value: Array[String]): ChunkKeyPhraseExtraction.this.type
- Definition Classes
- HasInputAnnotationCols
-
def
setIsLong(value: Boolean): ChunkKeyPhraseExtraction.this.type
- Definition Classes
- BertSentenceEmbeddings
-
def
setLazyAnnotator(value: Boolean): ChunkKeyPhraseExtraction.this.type
- Definition Classes
- CanBeLazy
-
def
setMaxSentenceLength(value: Int): ChunkKeyPhraseExtraction.this.type
- Definition Classes
- BertSentenceEmbeddings
-
def
setModelIfNotSet(spark: SparkSession, tensorflowWrapper: Option[TensorflowWrapper], onnxWrapper: Option[OnnxWrapper], openvinoWrapper: Option[OpenvinoWrapper]): ChunkKeyPhraseExtraction.this.type
- Definition Classes
- BertSentenceEmbeddings
-
final
def
setOutputCol(value: String): ChunkKeyPhraseExtraction.this.type
- Definition Classes
- HasOutputAnnotationCol
-
def
setParent(parent: Estimator[BertSentenceEmbeddings]): BertSentenceEmbeddings
- Definition Classes
- Model
-
def
setSelectMostDifferent(value: Boolean): ChunkKeyPhraseExtraction.this.type
Let the model return the top N key phrases which are the most different from each other
-
def
setSignatures(value: Map[String, String]): ChunkKeyPhraseExtraction.this.type
- Definition Classes
- BertSentenceEmbeddings
-
def
setStorageRef(value: String): ChunkKeyPhraseExtraction.this.type
- Definition Classes
- HasStorageRef
-
def
setTopN(value: Int): ChunkKeyPhraseExtraction.this.type
Set the number of key phrases to extract
-
def
setVocabulary(value: Map[String, Int]): ChunkKeyPhraseExtraction.this.type
- Definition Classes
- BertSentenceEmbeddings
-
val
signatures: MapFeature[String, String]
- Definition Classes
- BertSentenceEmbeddings
-
val
storageRef: Param[String]
- Definition Classes
- HasStorageRef
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- Identifiable → AnyRef → Any
-
def
tokenize(sentences: Seq[Sentence]): Seq[WordpieceTokenizedSentence]
- Definition Classes
- BertSentenceEmbeddings
-
val
topN: IntParam
Number of key phrases to extract, ordered by their score
-
final
def
transform(dataset: Dataset[_]): DataFrame
- Definition Classes
- AnnotatorModel → Transformer
-
def
transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame
- Definition Classes
- Transformer
- Annotations
- @Since( "2.0.0" )
-
def
transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame
- Definition Classes
- Transformer
- Annotations
- @Since( "2.0.0" ) @varargs()
-
final
def
transformSchema(schema: StructType): StructType
- Definition Classes
- RawAnnotator → PipelineStage
-
def
transformSchema(schema: StructType, logging: Boolean): StructType
- Attributes
- protected
- Definition Classes
- PipelineStage
- Annotations
- @DeveloperApi()
-
val
uid: String
- Definition Classes
- ChunkKeyPhraseExtraction → BertSentenceEmbeddings → Identifiable
-
def
validate(schema: StructType): Boolean
- Attributes
- protected
- Definition Classes
- RawAnnotator
-
def
validateStorageRef(dataset: Dataset[_], inputCols: Array[String], annotatorType: String): Unit
- Definition Classes
- HasStorageRef
-
val
vocabulary: MapFeature[String, Int]
- Definition Classes
- BertSentenceEmbeddings
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
wrapColumnMetadata(col: Column): Column
- Attributes
- protected
- Definition Classes
- RawAnnotator
-
def
wrapEmbeddingsMetadata(col: Column, embeddingsDim: Int, embeddingsRef: Option[String]): Column
- Attributes
- protected
- Definition Classes
- HasEmbeddingsProperties
-
def
wrapSentenceEmbeddingsMetadata(col: Column, embeddingsDim: Int, embeddingsRef: Option[String]): Column
- Attributes
- protected
- Definition Classes
- HasEmbeddingsProperties
-
def
write: MLWriter
- Definition Classes
- ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable
-
def
writeOnnxModel(path: String, spark: SparkSession, onnxWrapper: OnnxWrapper, suffix: String, fileName: String): Unit
- Definition Classes
- WriteOnnxModel
-
def
writeOnnxModels(path: String, spark: SparkSession, onnxWrappersWithNames: Seq[(OnnxWrapper, String)], suffix: String): Unit
- Definition Classes
- WriteOnnxModel
-
def
writeOpenvinoModel(path: String, spark: SparkSession, openvinoWrapper: OpenvinoWrapper, suffix: String, fileName: String): Unit
- Definition Classes
- WriteOpenvinoModel
-
def
writeOpenvinoModels(path: String, spark: SparkSession, ovWrappersWithNames: Seq[(OpenvinoWrapper, String)], suffix: String): Unit
- Definition Classes
- WriteOpenvinoModel
-
def
writeTensorflowHub(path: String, tfPath: String, spark: SparkSession, suffix: String): Unit
- Definition Classes
- WriteTensorflowModel
-
def
writeTensorflowModel(path: String, spark: SparkSession, tensorflow: TensorflowWrapper, suffix: String, filename: String, configProtoBytes: Option[Array[Byte]]): Unit
- Definition Classes
- WriteTensorflowModel
-
def
writeTensorflowModelV2(path: String, spark: SparkSession, tensorflow: TensorflowWrapper, suffix: String, filename: String, configProtoBytes: Option[Array[Byte]], savedSignatures: Option[Map[String, String]]): Unit
- Definition Classes
- WriteTensorflowModel