package com.johnsnowlabs.ml.tensorflow

Type Members

  1. class ClassifierDatasetEncoder extends Serializable
  2. case class ClassifierDatasetEncoderParams(tags: Array[String]) extends Product with Serializable
  3. case class DatasetEncoderParams(tags: List[String], chars: List[Char], emptyVector: List[Float], embeddingsDim: Int, defaultTag: String = "O") extends Product with Serializable

    tags: list of unique tags
    chars: list of unique characters
    emptyVector: list of embeddings
    embeddingsDim: dimension of embeddings
    defaultTag: the default tag
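
    As a minimal sketch of how these fields fit together, the snippet below redeclares the case class locally from the signature above, so that it runs without Spark NLP on the classpath, and builds one instance. The tag, character and dimension values are made up for illustration and do not come from any real model.

    ```scala
    // Local mirror of the documented signature; real code would import
    // com.johnsnowlabs.ml.tensorflow.DatasetEncoderParams instead.
    case class DatasetEncoderParams(
      tags: List[String],
      chars: List[Char],
      emptyVector: List[Float],
      embeddingsDim: Int,
      defaultTag: String = "O"
    )

    // Illustrative values only (assumptions for this sketch).
    val embeddingsDim = 4
    val params = DatasetEncoderParams(
      tags = List("O", "B-PER", "I-PER"),
      chars = List('a', 'b', 'c'),
      emptyVector = List.fill(embeddingsDim)(0.0f),
      embeddingsDim = embeddingsDim
    )

    println(params.defaultTag)       // defaultTag was omitted, so the default is used
    println(params.emptyVector.size) // sized to embeddingsDim in this sketch
    ```

    Because defaultTag has a default value, callers only need to pass it when their tag scheme does not use "O" for untagged tokens.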

  4. trait Logging extends AnyRef
  5. class NerBatch extends AnyRef

    Batch that contains data in TensorFlow input format.

  6. class NerDatasetEncoder extends Serializable
  7. trait ReadTensorflowModel extends AnyRef
  8. case class SentenceGrouper[T](getLength: (T) ⇒ Int, sizes: Array[Int] = Array(5, 10, 20, 50))(implicit evidence$1: ClassTag[T]) extends Product with Serializable
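
    The behaviour of SentenceGrouper is not documented here. Its signature (a length function plus an array of sizes) suggests that items are bucketed by length, and the sketch below illustrates that idea only. The helper groupByLength and its overflow bucket are hypothetical and are not part of Spark NLP.

    ```scala
    // Hypothetical helper: puts each item into the first bucket whose size
    // limit is >= the item's length. Items longer than every limit go to an
    // overflow bucket keyed by Int.MaxValue.
    def groupByLength[T](
        items: Seq[T],
        getLength: T => Int,
        sizes: Array[Int] = Array(5, 10, 20, 50)
    ): Map[Int, Seq[T]] = {
      items.groupBy { item =>
        val len = getLength(item)
        sizes.find(len <= _).getOrElse(Int.MaxValue)
      }
    }

    // Three token sequences of length 2, 8 and 60.
    val sentences: Seq[Seq[String]] = Seq(
      Seq("Hello", "world"),
      Seq.fill(8)("tok"),
      Seq.fill(60)("tok")
    )

    val buckets = groupByLength[Seq[String]](sentences, _.size)
    buckets.foreach { case (limit, group) =>
      println(s"limit=$limit -> ${group.size} sentence(s)")
    }
    ```

    Grouping inputs of similar length before batching is a common way to reduce padding; whether the library class does exactly this is an assumption.
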
  9. class TensorResources extends AnyRef

    This class is used to initialize Tensors of different types and shapes for TensorFlow operations.

  10. class TensorflowAlbert extends Serializable

    This class is used to calculate ALBERT embeddings for sequence batches of WordpieceTokenizedSentence. Input for this model must be tokenized with a SentencePieceModel.

    This TensorFlow model uses the weights provided by https://tfhub.dev/google/albert_base/3. Its output is sequence_output: representations of every token in the input sequence, with shape [batch_size, max_sequence_length, hidden_size].

    ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (Google Research, Toyota Technological Institute at Chicago). These embeddings represent the outputs generated by the ALBERT model. All official ALBERT releases by Google on TF-HUB are supported by this ALBERT wrapper:

    TF-HUB Models:

    albert_base = https://tfhub.dev/google/albert_base/3 | 768-embed-dim, 12-layer, 12-heads, 12M parameters
    albert_large = https://tfhub.dev/google/albert_large/3 | 1024-embed-dim, 24-layer, 16-heads, 18M parameters
    albert_xlarge = https://tfhub.dev/google/albert_xlarge/3 | 2048-embed-dim, 24-layer, 32-heads, 60M parameters
    albert_xxlarge = https://tfhub.dev/google/albert_xxlarge/3 | 4096-embed-dim, 12-layer, 64-heads, 235M parameters

    This model requires input tokenization with a SentencePiece model, which is provided by Spark NLP.

    For additional information see:

    https://arxiv.org/pdf/1909.11942.pdf
    https://github.com/google-research/ALBERT
    https://tfhub.dev/s?q=albert

    Tips:

    ALBERT uses repeating layers, which results in a small memory footprint. However, the computational cost remains similar to that of a BERT-like architecture with the same number of hidden layers, because it has to iterate through the same number of (repeating) layers.

  11. class TensorflowBert extends Serializable

    BERT (Bidirectional Encoder Representations from Transformers) provides dense vector representations for natural language by using a deep, pre-trained neural network with the Transformer architecture.

    See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/embeddings/BertEmbeddingsTestSpec.scala for further reference on how to use this API.

  12. class TensorflowClassifier extends Serializable with Logging
  13. class TensorflowElmo extends Serializable

    Embeddings from a language model trained on the 1 Billion Word Benchmark.

    Note that this is a very computationally expensive module compared to word embedding modules that only perform embedding lookups. The use of an accelerator is recommended.

    word_emb: the character-based word representations with shape [batch_size, max_length, 512].

    lstm_outputs1: the first LSTM hidden state with shape [batch_size, max_length, 1024].

    lstm_outputs2: the second LSTM hidden state with shape [batch_size, max_length, 1024].

    elmo: the weighted sum of the 3 layers, where the weights are trainable. This tensor has shape [batch_size, max_length, 1024].
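
    The elmo output is described as a weighted sum of three layers. A minimal sketch of such a weighted sum for a single token, on plain arrays, is shown below. It assumes that the three layer vectors have already been brought to a common dimension and that the raw weights are softmax-normalized; neither detail is stated above, and the function names are illustrative.

    ```scala
    // Softmax so that the layer weights are positive and sum to 1
    // (an assumption of this sketch).
    def softmax(ws: Array[Double]): Array[Double] = {
      val max = ws.max
      val exps = ws.map(w => math.exp(w - max))
      val total = exps.sum
      exps.map(_ / total)
    }

    // Weighted sum of equally sized layer vectors for one token.
    def weightedSum(layers: Array[Array[Double]], rawWeights: Array[Double]): Array[Double] = {
      require(layers.length == rawWeights.length, "one weight per layer")
      require(layers.map(_.length).distinct.length == 1, "layers must share a dimension")
      val weights = softmax(rawWeights)
      layers.head.indices.map { i =>
        layers.indices.map(l => weights(l) * layers(l)(i)).sum
      }.toArray
    }

    // Toy 2-dimensional layer vectors; equal raw weights give each layer 1/3.
    val layers = Array(Array(1.0, 0.0), Array(0.0, 1.0), Array(1.0, 1.0))
    val mixed = weightedSum(layers, Array(0.0, 0.0, 0.0))
    println(mixed.mkString(", "))
    ```

    In a trained model the raw weights would be learned rather than fixed, which is what "trainable" refers to.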

    See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/embeddings/ElmoEmbeddingsTestSpec.scala for further reference on how to use this API.

  14. class TensorflowLD extends Serializable

    Language identification and detection using CNN and RNN architectures in TensorFlow.

    The models are trained on large datasets such as Wikipedia and Tatoeba. The output is a language code in Wiki Code style: https://en.wikipedia.org/wiki/List_of_Wikipedias

  15. class TensorflowMarian extends Serializable

    MarianTransformer: Fast Neural Machine Translation

    MarianTransformer uses models trained by MarianNMT.

    Marian is an efficient, free Neural Machine Translation framework written in pure C++ with minimal dependencies. It is mainly being developed by the Microsoft Translator team. Many academic (most notably the University of Edinburgh and in the past the Adam Mickiewicz University in Poznań) and commercial contributors help with its development.

    It is currently the engine behind the Microsoft Translator Neural Machine Translation services and is deployed by many companies, organizations and research projects.

    Sources:

    MarianNMT: https://marian-nmt.github.io/
    Marian: Fast Neural Machine Translation in C++: https://www.aclweb.org/anthology/P18-4020/

  16. class TensorflowMultiClassifier extends Serializable with Logging
  17. class TensorflowNer extends Serializable with Logging
  18. class TensorflowSentenceDetectorDL extends Serializable with Logging
  19. class TensorflowSentiment extends Serializable with Logging
  20. class TensorflowSpell extends Logging with Serializable
  21. class TensorflowT5 extends Serializable

    This class is used to run the T5 model for sequence batches of WordpieceTokenizedSentence. Input for this model must be tokenized with a SentencePieceModel.

  22. class TensorflowUSE extends Serializable

    The Universal Sentence Encoder encodes text into high dimensional vectors that can be used for text classification, semantic similarity, clustering and other natural language tasks.

    See https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/embeddings/UniversalSentenceEncoderTestSpec.scala for further reference on how to use this API.

  23. class TensorflowWrapper extends Serializable
  24. class TensorflowXlnet extends Serializable

    XlnetEmbeddings (XLNet): Generalized Autoregressive Pretraining for Language Understanding

    Note that this is a very computationally expensive module compared to word embedding modules that only perform embedding lookups. The use of an accelerator is recommended.

    XLNet is a new unsupervised language representation learning method based on a novel generalized permutation language modeling objective. Additionally, XLNet employs Transformer-XL as the backbone model, exhibiting excellent performance for language tasks involving long context. Overall, XLNet achieves state-of-the-art (SOTA) results on various downstream language tasks including question answering, natural language inference, sentiment analysis, and document ranking.

    XLNet-Large = https://storage.googleapis.com/xlnet/released_models/cased_L-24_H-1024_A-16.zip | 24-layer, 1024-hidden, 16-heads
    XLNet-Base = https://storage.googleapis.com/xlnet/released_models/cased_L-12_H-768_A-12.zip | 12-layer, 768-hidden, 12-heads. This model is trained on full data (different from the one in the paper).

  25. case class Variables(variables: Array[Byte], index: Array[Byte]) extends Product with Serializable
  26. trait WriteTensorflowModel extends AnyRef

Value Members

  1. object NerBatch
  2. object TensorResources
  3. object TensorflowWrapper extends Serializable
