Packages

package classification

Ordering
  1. Alphabetic
Visibility
  1. Public
  2. All

Type Members

  1. class DocumentLogRegClassifierApproach extends AnnotatorApproach[DocumentLogRegClassifierModel] with CheckLicense

    Trains a model to classify documents with a Logarithmic Regression algorithm.

    Trains a model to classify documents with a Logarithmic Regression algorithm. Training data requires columns for text and their label. The result is a trained DocumentLogRegClassifierModel.

    Example

    Define pipeline stages to prepare the data

    val document_assembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val normalizer = new Normalizer()
      .setInputCols("token")
      .setOutputCol("normalized")
    
    val stopwords_cleaner = new StopWordsCleaner()
      .setInputCols("normalized")
      .setOutputCol("cleanTokens")
      .setCaseSensitive(false)
    
    val stemmer = new Stemmer()
      .setInputCols("cleanTokens")
      .setOutputCol("stem")

    Define the document classifier and fit training data to it

    val logreg = new DocumentLogRegClassifierApproach()
      .setInputCols("stem")
      .setLabelCol("category")
      .setOutputCol("prediction")
    
    val pipeline = new Pipeline().setStages(Array(
      document_assembler,
      tokenizer,
      normalizer,
      stopwords_cleaner,
      stemmer,
      logreg
    ))
    
    val model = pipeline.fit(trainingData)
    See also

    DocumentLogRegClassifierModel for instantiated models

  2. class DocumentLogRegClassifierModel extends Model[DocumentLogRegClassifierModel] with RawAnnotator[DocumentLogRegClassifierModel] with CanBeLazy with CheckLicense

    Classifies documents with a Logarithmic Regression algorithm.

    Classifies documents with a Logarithmic Regression algorithm. Currently there are no pretrained models available. Please see DocumentLogRegClassifierApproach to train your own model.

    Please check out the Models Hub for available models in the future.

  3. class DocumentMLClassifierApproach extends AnnotatorApproach[DocumentMLClassifierModel] with DocumentMLClassifierParams with CheckLicense

    Trains a model to classify documents with a Logarithmic Regression algorithm.

    Trains a model to classify documents with a Logarithmic Regression algorithm. Training data requires columns for text and their label. The result is a trained DocumentMLClassifierModel.

    Example

    Define pipeline stages to prepare the data

    val document_assembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val normalizer = new Normalizer()
      .setInputCols("token")
      .setOutputCol("normalized")
    
    val stopwords_cleaner = new StopWordsCleaner()
      .setInputCols("normalized")
      .setOutputCol("cleanTokens")
      .setCaseSensitive(false)
    
    val stemmer = new Stemmer()
      .setInputCols("cleanTokens")
      .setOutputCol("stem")

    Define the document classifier and fit training data to it

    val logreg = new DocumentMLClassifierApproach()
      .setInputCols("stem")
      .setLabelCol("category")
      .setOutputCol("prediction")
    
    val pipeline = new Pipeline().setStages(Array(
      document_assembler,
      tokenizer,
      normalizer,
      stopwords_cleaner,
      stemmer,
      logreg
    ))
    
    val model = pipeline.fit(trainingData)
    See also

    DocumentMLClassifierModel for instantiated models

  4. class DocumentMLClassifierModel extends Model[DocumentMLClassifierModel] with RawAnnotator[DocumentMLClassifierModel] with DocumentMLClassifierParams with CanBeLazy with CheckLicense

    Classifies documents with a Logarithmic Regression algorithm.

    Classifies documents with a Logarithmic Regression algorithm. Currently there are no pretrained models available. Please see DocumentMLClassifierApproach to train your own model.

    Please check out the Models Hub for available models in the future.

  5. trait DocumentMLClassifierParams extends Params
  6. class FewShotClassifierApproach extends GenericLogRegClassifierApproach
  7. class FewShotClassifierModel extends GenericLogRegClassifierModel
  8. class GenericLogRegClassifierApproach extends GenericClassifierApproach

    An implementation of multinomial logistic regression.

    An implementation of multinomial logistic regression.

    Example

    Define pipeline stages to prepare the data

     val documentAssembler = new DocumentAssembler()
       .setInputCol("text")
       .setOutputCol("document")
     val sentenceEmbeddings = BertSentenceEmbeddings
       .pretrained()
       .setInputCols(Array("document"))
       .setOutputCol("sentence_embedding")
     val featuresAssembler = new FeaturesAssembler()
       .setInputCols(Array("sentence_embedding"))
       .setOutputCol("feature_vector")
     val logRegClassifier = new GenericLogRegClassifierApproach()
       .setInputCols("feature_vector")
       .setOutputCol("prediction")
       .setLabelColumn("label")
       .setModelFile("src/test/resources/classification/log_reg_graph.pb")
       .setEpochsNumber(10)
       .setBatchSize(1)
       .setMultiClass(false)
       .setlearningRate(0.01f)
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      sentenceEmbeddings,
      featuresAssembler,
      logRegClassifier,
    ))
    
    val model = pipeline.fit(trainingData)
    See also

    DocumentLogRegClassifierModel for instantiated models

  9. class GenericLogRegClassifierModel extends GenericClassifierModel with ParamsAndFeaturesWritable

    Logistic regression classification

    Logistic regression classification

    Please check out the Models Hub for available models.

  10. class GenericSVMClassifierApproach extends GenericClassifierApproach

    An implementation of Support Vector Machine (SVM) classification .

    An implementation of Support Vector Machine (SVM) classification .

    Example

    Define pipeline stages to prepare the data

     val documentAssembler = new DocumentAssembler()
       .setInputCol("text")
       .setOutputCol("document")
     val sentenceEmbeddings = BertSentenceEmbeddings
       .pretrained()
       .setInputCols(Array("document"))
       .setOutputCol("sentence_embedding")
     val featuresAssembler = new FeaturesAssembler()
       .setInputCols(Array("sentence_embedding"))
       .setOutputCol("feature_vector")
     val svmClassifier = new GenericSVMClassifierApproach()
       .setInputCols("feature_vector")
       .setOutputCol("prediction")
       .setLabelColumn("label")
       .setModelFile("src/test/resources/classification/svm_graph.pb")
       .setEpochsNumber(10)
       .setBatchSize(1)
       .setMultiClass(false)
       .setlearningRate(0.01f)
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      sentenceEmbeddings,
      featuresAssembler,
      svmClassifier,
    ))
    
    val model = pipeline.fit(trainingData)
    See also

    DocumentLogRegClassifierModel for instantiated models

  11. class GenericSVMClassifierModel extends GenericClassifierModel with ParamsAndFeaturesWritable

    Support vector machine (SVM) classification

    Support vector machine (SVM) classification

    Please check out the Models Hub for available models.

  12. class MedicalBertForSequenceClassification extends AnnotatorModel[MedicalBertForSequenceClassification] with HasBatchedAnnotate[MedicalBertForSequenceClassification] with WriteTensorflowModel with WriteOnnxModel with HasCaseSensitiveProperties with HasEngine with CheckLicense

    MedicalBertForSequenceClassification can load Bert Models with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g.

    MedicalBertForSequenceClassification can load Bert Models with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for multi-class document classification tasks.

    Pretrained models can be loaded with pretrained of the companion object:

    val sequenceClassifier = MedicalBertForSequenceClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")

    The default model is "bert_sequence_classifier_ade", if no name is provided.

    For available pretrained models please see the Models Hub.

    Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. The Spark NLP Workshop example shows how to import them https://github.com/JohnSnowLabs/spark-nlp/discussions/5669.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base._
    import com.johnsnowlabs.nlp.annotator._
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val sequenceClassifier = MedicalBertForSequenceClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")
      .setCaseSensitive(true)
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      sequenceClassifier
    ))
    
    val data = Seq("John Lenon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.select("label.result").show(false)
    +--------------------+
    |result              |
    +--------------------+
    |[True, False]       |
    +--------------------+
    See also

    MedicalBertForSequenceClassification for sequnece-level classification

    Annotators Main Page for a list of transformer based classifiers

  13. class MedicalBertForTokenClassifier extends AnnotatorModel[MedicalBertForTokenClassifier] with HasBatchedAnnotate[MedicalBertForTokenClassifier] with WriteTensorflowModel with WriteOnnxModel with HasCaseSensitiveProperties with HasEngine with CheckLicense
  14. class MedicalDistilBertForSequenceClassification extends AnnotatorModel[MedicalDistilBertForSequenceClassification] with HasBatchedAnnotate[MedicalDistilBertForSequenceClassification] with WriteTensorflowModel with WriteOnnxModel with HasCaseSensitiveProperties with HasEngine with CheckLicense

    MedicalDistilBertForSequenceClassification can load DistilBERT Models with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g.

    MedicalDistilBertForSequenceClassification can load DistilBERT Models with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for multi-class document classification tasks.

    Pretrained models can be loaded with pretrained of the companion object:

    val sequenceClassifier = MedicalDistilBertForSequenceClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")

    The default model is "distilbert_base_sequence_classifier_imdb", if no name is provided.

    For available pretrained models please see the Models Hub.

    Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. The Spark NLP Workshop example shows how to import them https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. and the MedicalDistilBertForSequenceClassificationTestSpec.

    Example

    import spark.implicits._
    import com.johnsnowlabs.nlp.base._
    import com.johnsnowlabs.nlp.annotator._
    import org.apache.spark.ml.Pipeline
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val sequenceClassifier = MedicalDistilBertForSequenceClassification.pretrained()
      .setInputCols("token", "document")
      .setOutputCol("label")
      .setCaseSensitive(true)
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      tokenizer,
      sequenceClassifier
    ))
    
    val data = Seq("John Lenon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
    val result = pipeline.fit(data).transform(data)
    
    result.select("label.result").show(false)
    +--------------------+
    |result              |
    +--------------------+
    |[neg, neg]          |
    |[pos, pos, pos, pos]|
    +--------------------+
    See also

    MedicalDistilBertForSequenceClassification for sequence-level classification

    Annotators Main Page for a list of transformer based classifiers

  15. trait ReadBertForTokenTensorflowModel extends ReadTensorflowModel with ReadOnnxModel
  16. trait ReadDistilBertForSequenceTensorflowModel extends ReadTensorflowModel with ReadOnnxModel
  17. trait ReadMedicalBertForSequenceClassification extends ReadTensorflowModel with ReadOnnxModel
  18. trait ReadablePretrainedBertForSequenceModel extends ParamsAndFeaturesReadable[MedicalBertForSequenceClassification] with HasPretrained[MedicalBertForSequenceClassification]
  19. trait ReadablePretrainedBertForTokenModel extends ParamsAndFeaturesReadable[MedicalBertForTokenClassifier] with HasPretrained[MedicalBertForTokenClassifier]
  20. trait ReadablePretrainedDistilBertForSequenceModel extends ParamsAndFeaturesReadable[MedicalDistilBertForSequenceClassification] with HasPretrained[MedicalDistilBertForSequenceClassification]
  21. trait ReadablePretrainedDocumentLogRegClassifierModel extends ParamsAndFeaturesReadable[DocumentLogRegClassifierModel] with HasPretrained[DocumentLogRegClassifierModel]
  22. trait ReadablePretrainedDocumentMLClassifierModel extends ParamsAndFeaturesReadable[DocumentMLClassifierModel] with HasPretrained[DocumentMLClassifierModel]

Value Members

  1. object DocumentLogRegClassifierModel extends ReadablePretrainedDocumentLogRegClassifierModel with Serializable
  2. object DocumentMLClassifierModel extends ReadablePretrainedDocumentMLClassifierModel with Serializable
  3. object FewShotClassifierApproach extends FewShotClassifierApproach
  4. object FewShotClassifierModel extends ReadsGenericClassifierGraph[FewShotClassifierModel] with ReadablePretrainedGenericClassifier[FewShotClassifierModel] with Serializable
  5. object GenericLogRegClassifierApproach extends GenericLogRegClassifierApproach
  6. object GenericLogRegClassifierModel extends ReadsGenericClassifierGraph[GenericLogRegClassifierModel] with ReadablePretrainedGenericClassifier[GenericLogRegClassifierModel] with Serializable
  7. object GenericSVMClassifierApproach extends GenericSVMClassifierApproach
  8. object GenericSVMClassifierModel extends ReadsGenericClassifierGraph[GenericSVMClassifierModel] with ReadablePretrainedGenericClassifier[GenericSVMClassifierModel] with Serializable
  9. object MedicalBertForSequenceClassification extends ReadablePretrainedBertForSequenceModel with ReadMedicalBertForSequenceClassification with Serializable

    This is the companion object of MedicalBertForSequenceClassification.

    This is the companion object of MedicalBertForSequenceClassification. Please refer to that class for the documentation.

  10. object MedicalBertForTokenClassifier extends ReadablePretrainedBertForTokenModel with ReadBertForTokenTensorflowModel with Serializable

    This is the companion object of MedicalBertForTokenClassifier.

    This is the companion object of MedicalBertForTokenClassifier. Please refer to that class for the documentation.

  11. object MedicalDistilBertForSequenceClassification extends ReadablePretrainedDistilBertForSequenceModel with ReadDistilBertForSequenceTensorflowModel with Serializable

    This is the companion object of MedicalDistilBertForSequenceClassification.

    This is the companion object of MedicalDistilBertForSequenceClassification. Please refer to that class for the documentation.

Ungrouped