com.johnsnowlabs.nlp.annotators.chunker
ChunkKeyPhraseExtraction
Companion object ChunkKeyPhraseExtraction
class ChunkKeyPhraseExtraction extends BertSentenceEmbeddings with CheckLicense
Extracts key phrases from texts.
ChunkKeyPhraseExtraction uses BertSentenceEmbeddings to determine the most relevant key phrases describing a text, using one of two approaches:
- Cosine similarity between the embedding of each chunk and the embedding of the corresponding sentence/document.
- The Maximal Marginal Relevance (MMR) algorithm (set with the setDivergence method) to determine the most relevant key phrases. If the selectMostDifferent parameter is set, the model returns the key phrases that are the most different from each other (avoiding key phrases that are too similar).
The model compares the chunks against the corresponding sentences/documents and selects the chunks that are most representative of the broader text context (i.e., the document or the sentence they belong to). This makes it possible, for example, to get a brief understanding of a document by selecting its most relevant phrases.
The input to the model consists of chunk annotations and sentence or document annotations. The input chunks can be generated in various ways:
- Using NGramGenerator, which makes it possible to obtain ranked n-gram chunks from the text (can be used to identify new entities).
- Using YakeKeywordExtractor, to rank the keywords extracted with the YAKE algorithm.
- Using TextMatcher, to rank the chunks matched by that annotator.
- Using NerConverter, to extract ranked named entities (i.e., which entities are the most relevant in the sentence/document).
The model operates either at sentence level (selecting the most descriptive chunks from the sentence they belong to) or at document level. In the latter case, the key phrases are selected to represent all the input document annotations.
This model is a subclass of BertSentenceEmbeddings and shares all parameters with it. It can load any pretrained BertSentenceEmbeddings model. Available models can be found at Models Hub.
val embeddings = ChunkKeyPhraseExtraction.pretrained()
  .setInputCols("sentence", "chunk")
  .setOutputCol("key_phrase_chunks")
The default model is "sbert_jsl_medium_uncased", if no name is provided.
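The first approach, ranking chunks by their cosine similarity to the sentence/document embedding, can be sketched in plain Scala. This is a minimal illustration and not the library's implementation: the object CosineRankingSketch and the functions cosine and rankBySimilarity are hypothetical names, and the embeddings are assumed to be already available as Array[Float] vectors.

```scala
object CosineRankingSketch {
  // Cosine similarity between two equal-length vectors; 0.0 if either has zero norm.
  def cosine(a: Array[Float], b: Array[Float]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x.toDouble * y }.sum
    val normA = math.sqrt(a.map(x => x.toDouble * x).sum)
    val normB = math.sqrt(b.map(x => x.toDouble * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // Returns (chunk index, similarity) for the topN chunks most similar to the document.
  // Hypothetical helper: the real annotator works on annotations, not raw arrays.
  def rankBySimilarity(
      document: Array[Float],
      chunks: Seq[Array[Float]],
      topN: Int): Seq[(Int, Double)] =
    chunks.zipWithIndex
      .map { case (chunk, i) => (i, cosine(document, chunk)) }
      .sortBy { case (_, sim) => -sim }
      .take(topN)
}
```

With a document vector (1, 0) and chunks (0, 1), (1, 0) and (1, 1), calling CosineRankingSketch.rankBySimilarity with topN = 2 returns the chunk indices 1 and 2, in that order.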
Sources :
The use of MMR, diversity-based reranking for reordering documents and producing summaries
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.{NGramGenerator, StopWordsCleaner, Tokenizer}
import com.johnsnowlabs.nlp.annotators.chunker.ChunkKeyPhraseExtraction
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("tokens")

val stopWordsCleaner = StopWordsCleaner.pretrained()
  .setInputCols("tokens")
  .setOutputCol("clean_tokens")
  .setCaseSensitive(false)

// Candidate chunks: n-grams built from the stop-word-free tokens
val nGrams = new NGramGenerator()
  .setInputCols(Array("clean_tokens"))
  .setOutputCol("ngrams")
  .setN(3)

// Keep the 2 best key phrases, enforcing divergence through MMR
val chunkKeyPhraseExtractor = ChunkKeyPhraseExtraction
  .pretrained()
  .setTopN(2)
  .setDivergence(0.7f)
  .setInputCols(Array("document", "ngrams"))
  .setOutputCol("key_phrases")

val pipeline = new Pipeline()
  .setStages(Array(
    documentAssembler, tokenizer, stopWordsCleaner, nGrams, chunkKeyPhraseExtractor))

val sampleText = "Her Diabetes has become type 2 in the last year with her Diabetes." +
  " He complains of swelling in his right forearm."

// One-row DataFrame holding the sample text
val testDataset = Seq(sampleText).toDS.toDF("text")
val result = pipeline.fit(testDataset).transform(testDataset)

result
  .selectExpr("explode(key_phrases) AS key_phrase")
  .selectExpr(
    "key_phrase.result",
    "key_phrase.metadata.DocumentSimilarity",
    "key_phrase.metadata.MMRScore")
  .show(truncate=false)

+--------------------------+-------------------+------------------+
|result                    |DocumentSimilarity |MMRScore          |
+--------------------------+-------------------+------------------+
|complains swelling forearm|0.6325718954229369 |0.1897715761677257|
|type 2 year               |0.40181028931546364|-0.189501077108947|
+--------------------------+-------------------+------------------+
See also
- BertEmbeddings for token-level embeddings
- BertSentenceEmbeddings for sentence-level embeddings
- Annotators Main Page for a list of transformer based embeddings
Inherited
- ChunkKeyPhraseExtraction
- CheckLicense
- BertSentenceEmbeddings
- HasEngine
- HasCaseSensitiveProperties
- HasStorageRef
- HasEmbeddingsProperties
- HasProtectedParams
- WriteOnnxModel
- WriteOpenvinoModel
- WriteTensorflowModel
- HasBatchedAnnotate
- AnnotatorModel
- CanBeLazy
- RawAnnotator
- HasOutputAnnotationCol
- HasInputAnnotationCols
- HasOutputAnnotatorType
- ParamsAndFeaturesWritable
- HasFeatures
- DefaultParamsWritable
- MLWritable
- Model
- Transformer
- PipelineStage
- Logging
- Params
- Serializable
- Serializable
- Identifiable
- AnyRef
- Any
Type Members
Value Members
- final def !=(arg0: Any): Boolean
  Definition Classes: AnyRef → Any
- final def ##(): Int
  Definition Classes: AnyRef → Any
- final def $[T](param: Param[T]): T
  Attributes: protected
  Definition Classes: Params
- def $$[T](feature: StructFeature[T]): T
  Attributes: protected
  Definition Classes: HasFeatures
- def $$[K, V](feature: MapFeature[K, V]): Map[K, V]
  Attributes: protected
  Definition Classes: HasFeatures
- def $$[T](feature: SetFeature[T]): Set[T]
  Attributes: protected
  Definition Classes: HasFeatures
- def $$[T](feature: ArrayFeature[T]): Array[T]
  Attributes: protected
  Definition Classes: HasFeatures
- final def ==(arg0: Any): Boolean
  Definition Classes: AnyRef → Any
- def _transform(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): DataFrame
  Attributes: protected
  Definition Classes: AnnotatorModel
- def afterAnnotate(dataset: DataFrame): DataFrame
  Attributes: protected
  Definition Classes: BertSentenceEmbeddings → AnnotatorModel
- final def asInstanceOf[T0]: T0
  Definition Classes: Any
- def batchAnnotate(batchedAnnotations: Seq[Array[Annotation]]): Seq[Seq[Annotation]]
  Definition Classes: ChunkKeyPhraseExtraction → BertSentenceEmbeddings → HasBatchedAnnotate
- def batchProcess(rows: Iterator[_]): Iterator[Row]
  Definition Classes: HasBatchedAnnotate
- val batchSize: IntParam
  Definition Classes: HasBatchedAnnotate
- def beforeAnnotate(dataset: Dataset[_]): Dataset[_]
  Attributes: protected
  Definition Classes: AnnotatorModel
- val caseSensitive: BooleanParam
  Definition Classes: HasCaseSensitiveProperties
- final def checkSchema(schema: StructType, inputAnnotatorType: String): Boolean
  Attributes: protected
  Definition Classes: HasInputAnnotationCols
- def checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String]): Unit
  Definition Classes: CheckLicense
- def checkValidScope(scope: String): Unit
  Definition Classes: CheckLicense
- def checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean): Unit
  Definition Classes: CheckLicense
- def checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean): Unit
  Definition Classes: CheckLicense
- final def clear(param: Param[_]): ChunkKeyPhraseExtraction.this.type
  Definition Classes: Params
- def clone(): AnyRef
  Attributes: protected[lang]
  Definition Classes: AnyRef
  Annotations: @throws( ... ) @native()
- val concatenateSentences: BooleanParam
  A flag indicating whether to concatenate all input document/sentence annotations before computing their embedding. This parameter is only used if documentLevelProcessing is true. If concatenateSentences is set to true, the model concatenates the document/sentence input annotations and computes a single embedding. If it is false, the model computes the embedding of each sentence separately and then averages the resulting embedding vectors.
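The averaging branch (concatenateSentences set to false) amounts to an element-wise mean over the per-sentence embedding vectors. The sketch below only illustrates that step under stated assumptions: SentenceAveragingSketch and averageEmbeddings are hypothetical names, the embeddings are assumed to be precomputed Array[Float] vectors of equal dimension, and the concatenation branch is not shown because it requires the embedding model itself.

```scala
object SentenceAveragingSketch {
  // Element-wise mean of equal-length embedding vectors.
  // Hypothetical helper, not part of the library's API.
  def averageEmbeddings(embeddings: Seq[Array[Float]]): Array[Float] = {
    require(embeddings.nonEmpty, "need at least one embedding")
    val dim = embeddings.head.length
    require(embeddings.forall(_.length == dim), "all embeddings must share one dimension")
    // Accumulate in Double to limit rounding error, then convert back to Float.
    val sums = new Array[Double](dim)
    for (e <- embeddings; i <- 0 until dim) sums(i) += e(i)
    sums.map(s => (s / embeddings.size).toFloat)
  }
}
```

For two sentence vectors (1, 2) and (3, 4), the averaged document vector is (2, 3).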
- val configProtoBytes: IntArrayParam
  Definition Classes: BertSentenceEmbeddings
- def copy(extra: ParamMap): BertSentenceEmbeddings
  Definition Classes: RawAnnotator → Model → Transformer → PipelineStage → Params
- def copyValues[T <: Params](to: T, extra: ParamMap): T
  Attributes: protected
  Definition Classes: Params
- def createDatabaseConnection(database: Name): RocksDBConnection
  Definition Classes: HasStorageRef
- final def defaultCopy[T <: Params](extra: ParamMap): T
  Attributes: protected
  Definition Classes: Params
- val dimension: ProtectedParam[Int]
  Definition Classes: HasEmbeddingsProperties
- val divergence: FloatParam
  The divergence value determines how different from each other the extracted key phrases are. The possible values are within the interval [0, 1]. The higher the value, the more divergence is enforced. A value of 0 means the key phrases are not compared to each other (no divergence is ensured) and their relevance is determined solely by their similarity to the document. This parameter should not be used if setSelectMostDifferent is true, since the two parameters aim to achieve the same goal in different ways. The default value is 0, meaning that there is no constraint on the order of the extracted key phrases. The divergence is calculated using the Maximal Marginal Relevance measure.
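A common way to write the MMR score with a divergence weight is shown below. The exact weighting used by this annotator is an assumption here, not something the page states. The symbols are introduced for illustration only: \delta is the divergence, c a candidate chunk embedding, d the document (or sentence) embedding, and S the set of key phrases already selected. The maximum is taken as 0 while S is empty.

```latex
\mathrm{MMR}(c) \;=\; (1 - \delta)\,\cos(c, d) \;-\; \delta \,\max_{s \in S} \cos(c, s)
```

This form is at least consistent with the example output on this page: with divergence 0.7 and a DocumentSimilarity of about 0.6326 for the first selected phrase, (1 - 0.7) × 0.6326 ≈ 0.1898, which matches the reported MMRScore. With \delta = 0 the score reduces to the document similarity alone, as the parameter description says.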
- val documentLevelProcessing: BooleanParam
  A flag indicating whether to extract key phrases at the document level, i.e., from all the sentences available in a given row, rather than from the particular sentences the chunks refer to.
- val dropPunctuation: BooleanParam
  Indicates whether to remove punctuation marks from the input chunks. Chunks coming from NER models are not affected.
- val engine: Param[String]
  Definition Classes: HasEngine
- final def eq(arg0: AnyRef): Boolean
  Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
  Definition Classes: AnyRef → Any
- def explainParam(param: Param[_]): String
  Definition Classes: Params
- def explainParams(): String
  Definition Classes: Params
- def extraValidate(structType: StructType): Boolean
  Attributes: protected
  Definition Classes: RawAnnotator
- def extraValidateMsg: String
  Attributes: protected
  Definition Classes: RawAnnotator
- final def extractParamMap(): ParamMap
  Definition Classes: Params
- final def extractParamMap(extra: ParamMap): ParamMap
  Definition Classes: Params
- val features: ArrayBuffer[Feature[_, _, _]]
  Definition Classes: HasFeatures
- def finalize(): Unit
  Attributes: protected[lang]
  Definition Classes: AnyRef
  Annotations: @throws( classOf[java.lang.Throwable] )
- def get[T](feature: StructFeature[T]): Option[T]
  Attributes: protected
  Definition Classes: HasFeatures
- def get[K, V](feature: MapFeature[K, V]): Option[Map[K, V]]
  Attributes: protected
  Definition Classes: HasFeatures
- def get[T](feature: SetFeature[T]): Option[Set[T]]
  Attributes: protected
  Definition Classes: HasFeatures
- def get[T](feature: ArrayFeature[T]): Option[Array[T]]
  Attributes: protected
  Definition Classes: HasFeatures
- final def get[T](param: Param[T]): Option[T]
  Definition Classes: Params
- def getBatchSize: Int
  Definition Classes: HasBatchedAnnotate
- def getCaseSensitive: Boolean
  Definition Classes: HasCaseSensitiveProperties
- final def getClass(): Class[_]
  Definition Classes: AnyRef → Any
  Annotations: @native()
- def getCombinationOfMostDifferentVectors(vectors: Seq[Array[Float]]): List[Int]
  Selects a combination of vectors such that the sum of their pairwise cosines is minimized, i.e., the vectors are as different from each other as possible.
  vectors: a set of float vectors
  returns: a list of vector indices
  Attributes: protected
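The selection criterion described above (minimize the sum of pairwise cosines) can be illustrated with a brute-force search over all index combinations. This is a sketch under stated assumptions, not the library's code: MostDifferentSketch, cosine and mostDifferent are hypothetical names, and the explicit parameter k (the size of the combination) is an assumption, since the documented method takes only the vectors.

```scala
object MostDifferentSketch {
  // Cosine similarity between two equal-length vectors; 0.0 if either has zero norm.
  def cosine(a: Array[Float], b: Array[Float]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x.toDouble * y }.sum
    val normA = math.sqrt(a.map(x => x.toDouble * x).sum)
    val normB = math.sqrt(b.map(x => x.toDouble * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // Brute force: try every k-subset of indices and keep the one whose
  // summed pairwise cosine similarity is smallest.
  // The cost grows combinatorially, so this only suits small candidate sets.
  def mostDifferent(vectors: Seq[Array[Float]], k: Int): List[Int] = {
    require(k >= 1 && k <= vectors.size, "k must be between 1 and the number of vectors")
    vectors.indices.combinations(k).minBy { combo =>
      combo.combinations(2).map { pair => cosine(vectors(pair(0)), vectors(pair(1))) }.sum
    }.toList
  }
}
```

Since selectMostDifferent pre-selects only topN * 2 candidates, an exhaustive search of this kind stays small for typical topN values.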
 
- def getConcatenateSentences: Boolean
  Checks whether the input sentences/documents are concatenated before their embedding is computed.
- def getConfigProtoBytes: Option[Array[Byte]]
  Definition Classes: BertSentenceEmbeddings
- final def getDefault[T](param: Param[T]): Option[T]
  Definition Classes: Params
- def getDimension: Int
  Definition Classes: HasEmbeddingsProperties
- def getDivergence: Float
  Get the level of divergence of the extracted key phrases.
- def getDocumentLevelProcessing: Boolean
  Check whether the key phrases are extracted at the document or the sentence level.
- def getDropPunctuation: Boolean
  Check whether the punctuation marks are removed from input chunks.
- def getEngine: String
  Definition Classes: HasEngine
- def getInputCols: Array[String]
  Definition Classes: HasInputAnnotationCols
- def getIsLong: Boolean
  Definition Classes: BertSentenceEmbeddings
- def getLazyAnnotator: Boolean
  Definition Classes: CanBeLazy
- def getMaxSentenceLength: Int
  Definition Classes: BertSentenceEmbeddings
- def getModelIfNotSet: Bert
  Definition Classes: BertSentenceEmbeddings
- final def getOrDefault[T](param: Param[T]): T
  Definition Classes: Params
- final def getOutputCol: String
  Definition Classes: HasOutputAnnotationCol
- def getParam(paramName: String): Param[Any]
  Definition Classes: Params
- def getSelectMostDifferent: Boolean
  Check whether the model returns the top N key phrases which are most different from each other.
- def getSignatures: Option[Map[String, String]]
  Definition Classes: BertSentenceEmbeddings
- def getStorageRef: String
  Definition Classes: HasStorageRef
- def getTopN: Int
  Get the number of key phrases extracted.
- final def hasDefault[T](param: Param[T]): Boolean
  Definition Classes: Params
- def hasParam(paramName: String): Boolean
  Definition Classes: Params
- def hasParent: Boolean
  Definition Classes: Model
- def hashCode(): Int
  Definition Classes: AnyRef → Any
  Annotations: @native()
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
  Attributes: protected
  Definition Classes: Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
  Attributes: protected
  Definition Classes: Logging
- val inputAnnotatorTypes: Array[AnnotatorType]
  Input annotator types: DOCUMENT, CHUNK
  Definition Classes: ChunkKeyPhraseExtraction → BertSentenceEmbeddings → HasInputAnnotationCols
- final val inputCols: StringArrayParam
  Attributes: protected
  Definition Classes: HasInputAnnotationCols
- final def isDefined(param: Param[_]): Boolean
  Definition Classes: Params
- final def isInstanceOf[T0]: Boolean
  Definition Classes: Any
- val isLong: ProtectedParam[Boolean]
  Definition Classes: BertSentenceEmbeddings
- final def isSet(param: Param[_]): Boolean
  Definition Classes: Params
- def isTraceEnabled(): Boolean
  Attributes: protected
  Definition Classes: Logging
- val lazyAnnotator: BooleanParam
  Definition Classes: CanBeLazy
- def log: Logger
  Attributes: protected
  Definition Classes: Logging
- def logDebug(msg: ⇒ String, throwable: Throwable): Unit
  Attributes: protected
  Definition Classes: Logging
- def logDebug(msg: ⇒ String): Unit
  Attributes: protected
  Definition Classes: Logging
- def logError(msg: ⇒ String, throwable: Throwable): Unit
  Attributes: protected
  Definition Classes: Logging
- def logError(msg: ⇒ String): Unit
  Attributes: protected
  Definition Classes: Logging
- def logInfo(msg: ⇒ String, throwable: Throwable): Unit
  Attributes: protected
  Definition Classes: Logging
- def logInfo(msg: ⇒ String): Unit
  Attributes: protected
  Definition Classes: Logging
- def logName: String
  Attributes: protected
  Definition Classes: Logging
- def logTrace(msg: ⇒ String, throwable: Throwable): Unit
  Attributes: protected
  Definition Classes: Logging
- def logTrace(msg: ⇒ String): Unit
  Attributes: protected
  Definition Classes: Logging
- def logWarning(msg: ⇒ String, throwable: Throwable): Unit
  Attributes: protected
  Definition Classes: Logging
- def logWarning(msg: ⇒ String): Unit
  Attributes: protected
  Definition Classes: Logging
- val maxSentenceLength: IntParam
  Definition Classes: BertSentenceEmbeddings
- def msgHelper(schema: StructType): String
  Attributes: protected
  Definition Classes: HasInputAnnotationCols
- final def ne(arg0: AnyRef): Boolean
  Definition Classes: AnyRef
- final def notify(): Unit
  Definition Classes: AnyRef
  Annotations: @native()
- final def notifyAll(): Unit
  Definition Classes: AnyRef
  Annotations: @native()
- def onWrite(path: String, spark: SparkSession): Unit
  Definition Classes: BertSentenceEmbeddings → ParamsAndFeaturesWritable
- val optionalInputAnnotatorTypes: Array[String]
  Definition Classes: HasInputAnnotationCols
- def orderKeyPhrasesByMMR(document: Array[Float], keyPhrases: Seq[Array[Float]], numResults: Int): Seq[(Int, Double, Double)]
  Takes a document embedding and a sequence of chunk embeddings and selects a number of chunks with the highest MMR scores.
  document: document/sentence embedding
  keyPhrases: chunk embeddings
  numResults: number of chunk indices to return
  returns: a sequence of tuples (chunk index, document similarity, MMR score)
  Attributes: protected
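A greedy MMR selection with the same input and output shape as the method above might look like the following. This is a sketch, not the library's implementation: MmrSketch, cosine and orderByMmr are hypothetical names, the explicit divergence argument stands in for the annotator's divergence parameter, and the scoring weights (1 - divergence for document similarity, divergence for redundancy) are an assumption.

```scala
object MmrSketch {
  // Cosine similarity between two equal-length vectors; 0.0 if either has zero norm.
  def cosine(a: Array[Float], b: Array[Float]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x.toDouble * y }.sum
    val normA = math.sqrt(a.map(x => x.toDouble * x).sum)
    val normB = math.sqrt(b.map(x => x.toDouble * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // Greedy MMR: at each step pick the unselected chunk with the highest
  //   (1 - divergence) * cos(chunk, document) - divergence * max cos(chunk, selected)
  // Returns (chunk index, document similarity, MMR score) in selection order.
  def orderByMmr(
      document: Array[Float],
      keyPhrases: Seq[Array[Float]],
      numResults: Int,
      divergence: Double): Seq[(Int, Double, Double)] = {
    val phrases = keyPhrases.toVector
    val docSims = phrases.map(cosine(document, _))
    var selected = Vector.empty[(Int, Double, Double)]
    var remaining = phrases.indices.toSet
    while (selected.size < numResults && remaining.nonEmpty) {
      val scored = remaining.toSeq.sorted.map { i =>
        // Redundancy is the highest similarity to anything already selected.
        val redundancy =
          if (selected.isEmpty) 0.0
          else selected.map { case (j, _, _) => cosine(phrases(i), phrases(j)) }.max
        (i, docSims(i), (1 - divergence) * docSims(i) - divergence * redundancy)
      }
      val best = scored.maxBy(_._3)
      selected :+= best
      remaining -= best._1
    }
    selected
  }
}
```

With divergence 0 the result is simply the chunks ordered by document similarity. Raising the divergence penalizes candidates that resemble key phrases already chosen, so near-duplicates drop down the ranking.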
 
- val outputAnnotatorType: AnnotatorType
  Output annotator types: CHUNK
  Definition Classes: ChunkKeyPhraseExtraction → BertSentenceEmbeddings → HasOutputAnnotatorType
- final val outputCol: Param[String]
  Attributes: protected
  Definition Classes: HasOutputAnnotationCol
- lazy val params: Array[Param[_]]
  Definition Classes: Params
- var parent: Estimator[BertSentenceEmbeddings]
  Definition Classes: Model
- def save(path: String): Unit
  Definition Classes: MLWritable
  Annotations: @Since( "1.6.0" ) @throws( ... )
- val selectMostDifferent: BooleanParam
  Pre-select topN * 2 key phrases and out of those select the topN that are the most different from each other. This parameter should not be used in conjunction with divergence as they aim to achieve the same goal, but in different ways.
- def sentenceEndTokenId: Int
  Definition Classes: BertSentenceEmbeddings
- def sentenceStartTokenId: Int
  Definition Classes: BertSentenceEmbeddings
- def set[T](param: ProtectedParam[T], value: T): ChunkKeyPhraseExtraction.this.type
  Definition Classes: HasProtectedParams
- def set[T](feature: StructFeature[T], value: T): ChunkKeyPhraseExtraction.this.type
  Attributes: protected
  Definition Classes: HasFeatures
- def set[K, V](feature: MapFeature[K, V], value: Map[K, V]): ChunkKeyPhraseExtraction.this.type
  Attributes: protected
  Definition Classes: HasFeatures
- def set[T](feature: SetFeature[T], value: Set[T]): ChunkKeyPhraseExtraction.this.type
  Attributes: protected
  Definition Classes: HasFeatures
- def set[T](feature: ArrayFeature[T], value: Array[T]): ChunkKeyPhraseExtraction.this.type
  Attributes: protected
  Definition Classes: HasFeatures
- final def set(paramPair: ParamPair[_]): ChunkKeyPhraseExtraction.this.type
  Attributes: protected
  Definition Classes: Params
- final def set(param: String, value: Any): ChunkKeyPhraseExtraction.this.type
  Attributes: protected
  Definition Classes: Params
- final def set[T](param: Param[T], value: T): ChunkKeyPhraseExtraction.this.type
  Definition Classes: Params
- def setBatchSize(size: Int): ChunkKeyPhraseExtraction.this.type
  Definition Classes: HasBatchedAnnotate
- 
      
      
      
        
      
    
      
        
        def
      
      
        setCaseSensitive(value: Boolean): ChunkKeyPhraseExtraction.this.type
      
      
      - Definition Classes
- BertSentenceEmbeddings → HasCaseSensitiveProperties
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        setConcatenateSentences(value: Boolean): ChunkKeyPhraseExtraction.this.type
      
      
      Concatenate the input sentence/document annotations before computing their embedding. Default value is 'true'.
- 
      
      
      
        
      
    
      
        
        def
      
      
        setConfigProtoBytes(bytes: Array[Int]): ChunkKeyPhraseExtraction.this.type
      
      
      - Definition Classes
- BertSentenceEmbeddings
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        setDefault[T](feature: StructFeature[T], value: () ⇒ T): ChunkKeyPhraseExtraction.this.type
      
      
      - Attributes
- protected
- Definition Classes
- HasFeatures
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        setDefault[K, V](feature: MapFeature[K, V], value: () ⇒ Map[K, V]): ChunkKeyPhraseExtraction.this.type
      
      
      - Attributes
- protected
- Definition Classes
- HasFeatures
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        setDefault[T](feature: SetFeature[T], value: () ⇒ Set[T]): ChunkKeyPhraseExtraction.this.type
      
      
      - Attributes
- protected
- Definition Classes
- HasFeatures
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        setDefault[T](feature: ArrayFeature[T], value: () ⇒ Array[T]): ChunkKeyPhraseExtraction.this.type
      
      
      - Attributes
- protected
- Definition Classes
- HasFeatures
 
- 
      
      
      
        
      
    
      
        final 
        def
      
      
        setDefault(paramPairs: ParamPair[_]*): ChunkKeyPhraseExtraction.this.type
      
      
      - Attributes
- protected
- Definition Classes
- Params
 
- 
      
      
      
        
      
    
      
        final 
        def
      
      
        setDefault[T](param: Param[T], value: T): ChunkKeyPhraseExtraction.this.type
      
      
      - Attributes
- protected[org.apache.spark.ml]
- Definition Classes
- Params
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        setDimension(value: Int): ChunkKeyPhraseExtraction.this.type
      
      
      - Definition Classes
- BertSentenceEmbeddings → HasEmbeddingsProperties
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        setDivergence(value: Float): ChunkKeyPhraseExtraction.this.type
      
      
      Set the level of divergence of the extracted key phrases. The value should be in the interval [0, 1].
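To make the effect of the divergence value more concrete, here is a minimal, self-contained sketch of greedy Maximal Marginal Relevance. It is not taken from the library: the object name MmrSketch, the plain Array[Double] embeddings, and the way divergence is used as the weight between relevance and redundancy are assumptions chosen for illustration only.

```scala
object MmrSketch {
  // Cosine similarity between two equal-length vectors; 0 if either has zero norm.
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val norm = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
    if (norm == 0.0) 0.0 else dot / norm
  }

  // Greedy MMR. At each step, pick the candidate that maximizes
  //   (1 - divergence) * sim(candidate, document)
  //     - divergence * max over selected of sim(candidate, selected)
  // (assumed weighting; divergence = 0 reduces to plain relevance ranking).
  def select(
      chunks: Seq[(String, Array[Double])],
      document: Array[Double],
      topN: Int,
      divergence: Double): Seq[String] = {
    require(divergence >= 0.0 && divergence <= 1.0, "divergence must be in [0, 1]")
    val relevance = chunks.map { case (_, emb) => cosine(emb, document) }.toVector
    var selected = Vector.empty[Int]
    var remaining = chunks.indices.toVector
    while (selected.size < topN && remaining.nonEmpty) {
      val best = remaining.maxBy { i =>
        val redundancy =
          if (selected.isEmpty) 0.0
          else selected.map(j => cosine(chunks(i)._2, chunks(j)._2)).max
        (1.0 - divergence) * relevance(i) - divergence * redundancy
      }
      selected :+= best
      remaining = remaining.filterNot(_ == best)
    }
    selected.map(i => chunks(i)._1)
  }
}
```

In this sketch, a higher divergence penalizes candidates that resemble phrases already chosen, so near-duplicate phrases are less likely to appear together in the result.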
- 
      
      
      
        
      
    
      
        
        def
      
      
        setDocumentLevelProcessing(value: Boolean): ChunkKeyPhraseExtraction.this.type
      
      
      Extract key phrases from the whole document (true) or from the particular sentences that the chunks refer to (false). Default value is 'true'.
- 
      
      
      
        
      
    
      
        
        def
      
      
        setDropPunctuation(value: Boolean): ChunkKeyPhraseExtraction.this.type
      
      
      Remove punctuation marks from input chunks. Default value is 'true'.
- 
      
      
      
        
      
    
      
        final 
        def
      
      
        setInputCols(value: String*): ChunkKeyPhraseExtraction.this.type
      
      
      - Definition Classes
- HasInputAnnotationCols
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        setInputCols(value: Array[String]): ChunkKeyPhraseExtraction.this.type
      
      
      - Definition Classes
- HasInputAnnotationCols
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        setIsLong(value: Boolean): ChunkKeyPhraseExtraction.this.type
      
      
      - Definition Classes
- BertSentenceEmbeddings
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        setLazyAnnotator(value: Boolean): ChunkKeyPhraseExtraction.this.type
      
      
      - Definition Classes
- CanBeLazy
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        setMaxSentenceLength(value: Int): ChunkKeyPhraseExtraction.this.type
      
      
      - Definition Classes
- BertSentenceEmbeddings
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        setModelIfNotSet(spark: SparkSession, tensorflowWrapper: Option[TensorflowWrapper], onnxWrapper: Option[OnnxWrapper], openvinoWrapper: Option[OpenvinoWrapper]): ChunkKeyPhraseExtraction.this.type
      
      
      - Definition Classes
- BertSentenceEmbeddings
 
- 
      
      
      
        
      
    
      
        final 
        def
      
      
        setOutputCol(value: String): ChunkKeyPhraseExtraction.this.type
      
      
      - Definition Classes
- HasOutputAnnotationCol
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        setParent(parent: Estimator[BertSentenceEmbeddings]): BertSentenceEmbeddings
      
      
      - Definition Classes
- Model
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        setSelectMostDifferent(value: Boolean): ChunkKeyPhraseExtraction.this.type
      
      
      Let the model return the top N key phrases that are the most different from each other.
- 
      
      
      
        
      
    
      
        
        def
      
      
        setSignatures(value: Map[String, String]): ChunkKeyPhraseExtraction.this.type
      
      
      - Definition Classes
- BertSentenceEmbeddings
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        setStorageRef(value: String): ChunkKeyPhraseExtraction.this.type
      
      
      - Definition Classes
- HasStorageRef
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        setTopN(value: Int): ChunkKeyPhraseExtraction.this.type
      
      
      Set the number of key phrases to extract.
- 
      
      
      
        
      
    
      
        
        def
      
      
        setVocabulary(value: Map[String, Int]): ChunkKeyPhraseExtraction.this.type
      
      
      - Definition Classes
- BertSentenceEmbeddings
 
- 
      
      
      
        
      
    
      
        
        val
      
      
        signatures: MapFeature[String, String]
      
      
      - Definition Classes
- BertSentenceEmbeddings
 
- 
      
      
      
        
      
    
      
        
        val
      
      
        storageRef: Param[String]
      
      
      - Definition Classes
- HasStorageRef
 
- 
      
      
      
        
      
    
      
        final 
        def
      
      
        synchronized[T0](arg0: ⇒ T0): T0
      
      
      - Definition Classes
- AnyRef
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        toString(): String
      
      
      - Definition Classes
- Identifiable → AnyRef → Any
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        tokenize(sentences: Seq[Sentence]): Seq[WordpieceTokenizedSentence]
      
      
      - Definition Classes
- BertSentenceEmbeddings
 
- 
      
      
      
        
      
    
      
        
        val
      
      
        topN: IntParam
      
      
      Number of key phrases to extract, ordered by their score.
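As a rough illustration of what "ordered by their score" can mean when the score is a cosine similarity between a chunk embedding and the document embedding, the sketch below ranks candidate phrases and keeps the topN best. The object name CosineRankingSketch and the plain Array[Double] embeddings are hypothetical, and the code does not reproduce the annotator's internal scoring.

```scala
object CosineRankingSketch {
  // Cosine similarity between two equal-length vectors; 0 if either has zero norm.
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val norm = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
    if (norm == 0.0) 0.0 else dot / norm
  }

  // Score every chunk against the document embedding and keep the topN
  // highest-scoring chunks, best first.
  def topNByScore(
      chunks: Seq[(String, Array[Double])],
      document: Array[Double],
      topN: Int): Seq[(String, Double)] =
    chunks
      .map { case (text, emb) => (text, cosine(emb, document)) }
      .sortBy { case (_, score) => -score }
      .take(topN)
}
```

Calling CosineRankingSketch.topNByScore(chunks, documentEmbedding, 2) would return the two phrases closest to the document embedding together with their scores.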
- 
      
      
      
        
      
    
      
        final 
        def
      
      
        transform(dataset: Dataset[_]): DataFrame
      
      
      - Definition Classes
- AnnotatorModel → Transformer
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame
      
      
      - Definition Classes
- Transformer
- Annotations
- @Since( "2.0.0" )
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame
      
      
      - Definition Classes
- Transformer
- Annotations
- @Since( "2.0.0" ) @varargs()
 
- 
      
      
      
        
      
    
      
        final 
        def
      
      
        transformSchema(schema: StructType): StructType
      
      
      - Definition Classes
- RawAnnotator → PipelineStage
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        transformSchema(schema: StructType, logging: Boolean): StructType
      
      
      - Attributes
- protected
- Definition Classes
- PipelineStage
- Annotations
- @DeveloperApi()
 
- 
      
      
      
        
      
    
      
        
        val
      
      
        uid: String
      
      
      - Definition Classes
- ChunkKeyPhraseExtraction → BertSentenceEmbeddings → Identifiable
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        validate(schema: StructType): Boolean
      
      
      - Attributes
- protected
- Definition Classes
- RawAnnotator
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        validateStorageRef(dataset: Dataset[_], inputCols: Array[String], annotatorType: String): Unit
      
      
      - Definition Classes
- HasStorageRef
 
- 
      
      
      
        
      
    
      
        
        val
      
      
        vocabulary: MapFeature[String, Int]
      
      
      - Definition Classes
- BertSentenceEmbeddings
 
- 
      
      
      
        
      
    
      
        final 
        def
      
      
        wait(): Unit
      
      
      - Definition Classes
- AnyRef
- Annotations
- @throws( ... )
 
- 
      
      
      
        
      
    
      
        final 
        def
      
      
        wait(arg0: Long, arg1: Int): Unit
      
      
      - Definition Classes
- AnyRef
- Annotations
- @throws( ... )
 
- 
      
      
      
        
      
    
      
        final 
        def
      
      
        wait(arg0: Long): Unit
      
      
      - Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        wrapColumnMetadata(col: Column): Column
      
      
      - Attributes
- protected
- Definition Classes
- RawAnnotator
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        wrapEmbeddingsMetadata(col: Column, embeddingsDim: Int, embeddingsRef: Option[String]): Column
      
      
      - Attributes
- protected
- Definition Classes
- HasEmbeddingsProperties
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        wrapSentenceEmbeddingsMetadata(col: Column, embeddingsDim: Int, embeddingsRef: Option[String]): Column
      
      
      - Attributes
- protected
- Definition Classes
- HasEmbeddingsProperties
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        write: MLWriter
      
      
      - Definition Classes
- ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        writeOnnxModel(path: String, spark: SparkSession, onnxWrapper: OnnxWrapper, suffix: String, fileName: String): Unit
      
      
      - Definition Classes
- WriteOnnxModel
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        writeOnnxModels(path: String, spark: SparkSession, onnxWrappersWithNames: Seq[(OnnxWrapper, String)], suffix: String): Unit
      
      
      - Definition Classes
- WriteOnnxModel
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        writeOpenvinoModel(path: String, spark: SparkSession, openvinoWrapper: OpenvinoWrapper, suffix: String, fileName: String): Unit
      
      
      - Definition Classes
- WriteOpenvinoModel
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        writeOpenvinoModels(path: String, spark: SparkSession, ovWrappersWithNames: Seq[(OpenvinoWrapper, String)], suffix: String): Unit
      
      
      - Definition Classes
- WriteOpenvinoModel
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        writeTensorflowHub(path: String, tfPath: String, spark: SparkSession, suffix: String): Unit
      
      
      - Definition Classes
- WriteTensorflowModel
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        writeTensorflowModel(path: String, spark: SparkSession, tensorflow: TensorflowWrapper, suffix: String, filename: String, configProtoBytes: Option[Array[Byte]]): Unit
      
      
      - Definition Classes
- WriteTensorflowModel
 
- 
      
      
      
        
      
    
      
        
        def
      
      
        writeTensorflowModelV2(path: String, spark: SparkSession, tensorflow: TensorflowWrapper, suffix: String, filename: String, configProtoBytes: Option[Array[Byte]], savedSignatures: Option[Map[String, String]]): Unit
      
      
      - Definition Classes
- WriteTensorflowModel