package embeddings

Type Members

  1. class AverageEmbeddings extends AnnotatorModel[AverageEmbeddings] with HasSimpleAnnotate[AverageEmbeddings] with HasStorageRef with HasEmbeddingsProperties with CheckLicense

    Merge embeddings.

  2. class BertSentenceChunkEmbeddings extends BertSentenceEmbeddings with CheckLicense

    BERT Sentence embeddings for chunk annotations which take into account the context of the sentence the chunk appeared in.

    This is an extension of BertSentenceEmbeddings that combines the embedding of a chunk with the embedding of the surrounding sentence. For each input chunk annotation, it finds the corresponding sentence, computes the BERT sentence embedding of both the chunk and the sentence, and averages them. The resulting embeddings are useful when one needs a numerical representation of a text chunk that is sensitive to the context in which it appears.
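
    The averaging step can be sketched in plain Scala, without Spark. This is a minimal illustration, not the library's implementation: ChunkSentenceAverage is a hypothetical helper, the vectors are toy values, and it assumes the chunk and sentence embeddings are weighted equally.

```scala
// Illustrative sketch only: ChunkSentenceAverage is a hypothetical helper,
// not part of the Spark NLP API.
object ChunkSentenceAverage {

  // Element-wise mean of two embeddings of the same dimension.
  // Assumption: chunk and sentence are weighted equally.
  def average(chunk: Array[Float], sentence: Array[Float]): Array[Float] = {
    require(chunk.length == sentence.length, "embeddings must have the same dimension")
    chunk.zip(sentence).map { case (c, s) => (c + s) / 2.0f }
  }

  def main(args: Array[String]): Unit = {
    // Toy 3-dimensional vectors standing in for real BERT sentence embeddings.
    val chunkEmbedding = Array(0.5f, -1.0f, 2.0f)
    val sentenceEmbedding = Array(1.5f, 0.0f, -2.0f)

    val contextual = average(chunkEmbedding, sentenceEmbedding)
    println(contextual.mkString(", ")) // prints: 1.0, -0.5, 0.0
  }
}
```

    Because the sentence vector enters every average, the same chunk text appearing in two different sentences would generally receive two different embeddings.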

    This model is a subclass of BertSentenceEmbeddings and shares all of its parameters. It can load any pretrained BertSentenceEmbeddings model. Available models can be found at the Models Hub.

    Two input columns are required: chunk and sentence.

    val embeddings = BertSentenceChunkEmbeddings.pretrained()
      .setInputCols("sentence", "chunk")
      .setOutputCol("sentence_chunk_bert_embeddings")

    If no name is provided, the default model is "sent_small_bert_L2_768".

    Sources:

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base.DocumentAssembler
    import com.johnsnowlabs.nlp.annotator.{SentenceDetector, Tokenizer, BertEmbeddings, NerConverter}
    import com.johnsnowlabs.nlp.EmbeddingsFinisher
    import org.apache.spark.ml.Pipeline
    // MedicalNerModel and BertSentenceChunkEmbeddings come from the licensed
    // Spark NLP for Healthcare library and must also be imported.
    
    val documentAssembler = new DocumentAssembler()
       .setInputCol("text")
       .setOutputCol("document")
    
    val sentenceDetector = new SentenceDetector()
       .setInputCols("document")
       .setOutputCol("sentence")
    
    val tokenizer = new Tokenizer()
       .setInputCols("sentence")
       .setOutputCol("tokens")
    
    val wordEmbeddings = BertEmbeddings
       .pretrained("biobert_pubmed_base_cased")
       .setInputCols(Array("sentence", "tokens"))
       .setOutputCol("word_embeddings")
    
    val nerModel = MedicalNerModel
       .pretrained("ner_clinical_biobert", "en", "clinical/models")
       .setInputCols(Array("sentence", "tokens", "word_embeddings"))
       .setOutputCol("ner")
    
    val nerConverter = new NerConverter()
       .setInputCols("sentence", "tokens", "ner")
       .setOutputCol("ner_chunk")
    
    val sentenceChunkEmbeddings = BertSentenceChunkEmbeddings
       .pretrained("sbluebert_base_uncased_mli", "en", "clinical/models")
       .setInputCols(Array("sentence", "ner_chunk"))
       .setOutputCol("sentence_chunk_embeddings")
    
    val pipeline = new Pipeline()
         .setStages(Array(
             documentAssembler,
             sentenceDetector,
             tokenizer,
             wordEmbeddings,
             nerModel,
             nerConverter,
             sentenceChunkEmbeddings))
    
    val sampleText = "Her Diabetes has become type 2 in the last year with her Diabetes." +
       " He complains of swelling in his right forearm."
    
    // Use the sample text as input; the original referenced an undefined emptyDataset.
    val testDataset = Seq(sampleText).toDS.toDF("text")
    val result = pipeline.fit(testDataset).transform(testDataset)
    
    result
       .selectExpr("explode(sentence_chunk_embeddings) AS s")
       .selectExpr("s.result", "slice(s.embeddings, 1, 5) AS averageEmbedding")
       .show(truncate=false)
    
    +-----------------------------+-----------------------------------------------------------------+
    |                       result|                                                 averageEmbedding|
    +-----------------------------+-----------------------------------------------------------------+
    |Her Diabetes                 |[-0.31995273, -0.04710883, -0.28973156, -0.1294758, 0.12481072]  |
    |type 2                       |[-0.027161136, -0.24613449, -0.0949309, 0.1825444, -0.2252143]   |
    |her Diabetes                 |[-0.31995273, -0.04710883, -0.28973156, -0.1294758, 0.12481072]  |
    |swelling in his right forearm|[-0.45139068, 0.12400375, -0.0075617577, -0.90806055, 0.12871636]|
    +-----------------------------+-----------------------------------------------------------------+

    See also

    BertEmbeddings for token-level embeddings

    BertSentenceEmbeddings for sentence-level embeddings

    Annotators Main Page for a list of transformer based embeddings

  3. trait ReadBertSentenceChunksTensorflowModel extends ReadTensorflowModel
  4. trait ReadablePretrainedBertSentenceChunksModel extends ParamsAndFeaturesReadable[BertSentenceChunkEmbeddings] with HasPretrained[BertSentenceChunkEmbeddings]
