package classification
- Alphabetic
- Public
- All
Type Members
-
class
DocumentLogRegClassifierApproach extends AnnotatorApproach[DocumentLogRegClassifierModel] with CheckLicense
Trains a model to classify documents with a Logarithmic Regression algorithm.
Trains a model to classify documents with a Logarithmic Regression algorithm. Training data requires columns for text and their label. The result is a trained DocumentLogRegClassifierModel.
Example
Define pipeline stages to prepare the data
val document_assembler = new DocumentAssembler() .setInputCol("text") .setOutputCol("document") val tokenizer = new Tokenizer() .setInputCols("document") .setOutputCol("token") val normalizer = new Normalizer() .setInputCols("token") .setOutputCol("normalized") val stopwords_cleaner = new StopWordsCleaner() .setInputCols("normalized") .setOutputCol("cleanTokens") .setCaseSensitive(false) val stemmer = new Stemmer() .setInputCols("cleanTokens") .setOutputCol("stem")
Define the document classifier and fit training data to it
val logreg = new DocumentLogRegClassifierApproach() .setInputCols("stem") .setLabelCol("category") .setOutputCol("prediction") val pipeline = new Pipeline().setStages(Array( document_assembler, tokenizer, normalizer, stopwords_cleaner, stemmer, logreg )) val model = pipeline.fit(trainingData)
- See also
DocumentLogRegClassifierModel for instantiated models
-
class
DocumentLogRegClassifierModel extends Model[DocumentLogRegClassifierModel] with RawAnnotator[DocumentLogRegClassifierModel] with CanBeLazy with CheckLicense
Classifies documents with a Logarithmic Regression algorithm.
Classifies documents with a Logarithmic Regression algorithm. Currently there are no pretrained models available. Please see DocumentLogRegClassifierApproach to train your own model.
Please check out the Models Hub for available models in the future.
-
class
DocumentMLClassifierApproach extends AnnotatorApproach[DocumentMLClassifierModel] with DocumentMLClassifierParams with CheckLicense
Trains a model to classify documents with a Logarithmic Regression algorithm.
Trains a model to classify documents with a Logarithmic Regression algorithm. Training data requires columns for text and their label. The result is a trained DocumentMLClassifierModel.
Example
Define pipeline stages to prepare the data
val document_assembler = new DocumentAssembler() .setInputCol("text") .setOutputCol("document") val tokenizer = new Tokenizer() .setInputCols("document") .setOutputCol("token") val normalizer = new Normalizer() .setInputCols("token") .setOutputCol("normalized") val stopwords_cleaner = new StopWordsCleaner() .setInputCols("normalized") .setOutputCol("cleanTokens") .setCaseSensitive(false) val stemmer = new Stemmer() .setInputCols("cleanTokens") .setOutputCol("stem")
Define the document classifier and fit training data to it
val logreg = new DocumentMLClassifierApproach() .setInputCols("stem") .setLabelCol("category") .setOutputCol("prediction") val pipeline = new Pipeline().setStages(Array( document_assembler, tokenizer, normalizer, stopwords_cleaner, stemmer, logreg )) val model = pipeline.fit(trainingData)
- See also
DocumentMLClassifierModel for instantiated models
-
class
DocumentMLClassifierModel extends Model[DocumentMLClassifierModel] with RawAnnotator[DocumentMLClassifierModel] with DocumentMLClassifierParams with CanBeLazy with CheckLicense
Classifies documents with a Logarithmic Regression algorithm.
Classifies documents with a Logarithmic Regression algorithm. Currently there are no pretrained models available. Please see DocumentMLClassifierApproach to train your own model.
Please check out the Models Hub for available models in the future.
- trait DocumentMLClassifierParams extends Params
-
class
GenericLogRegClassifierApproach extends GenericClassifierApproach
An implementation of multinomial logistic regression.
An implementation of multinomial logistic regression.
Example
Define pipeline stages to prepare the data
val documentAssembler = new DocumentAssembler() .setInputCol("text") .setOutputCol("document") val sentenceEmbeddings = BertSentenceEmbeddings .pretrained() .setInputCols(Array("document")) .setOutputCol("sentence_embedding") val featuresAssembler = new FeaturesAssembler() .setInputCols(Array("sentence_embedding")) .setOutputCol("feature_vector") val logRegClassifier = new GenericLogRegClassifierApproach() .setInputCols("feature_vector") .setOutputCol("prediction") .setLabelColumn("label") .setModelFile("src/test/resources/classification/log_reg_graph.pb") .setEpochsNumber(10) .setBatchSize(1) .setMultiClass(false) .setlearningRate(0.01f) val pipeline = new Pipeline().setStages(Array( documentAssembler, sentenceEmbeddings, featuresAssembler, logRegClassifier, )) val model = pipeline.fit(trainingData)
- See also
DocumentLogRegClassifierModel for instantiated models
-
class
GenericLogRegClassifierModel extends GenericClassifierModel with ParamsAndFeaturesWritable
Logistic regression classification
Logistic regression classification
Please check out the Models Hub for available models.
-
class
GenericSVMClassifierApproach extends GenericClassifierApproach
An implementation of Support Vector Machine (SVM) classification .
An implementation of Support Vector Machine (SVM) classification .
Example
Define pipeline stages to prepare the data
val documentAssembler = new DocumentAssembler() .setInputCol("text") .setOutputCol("document") val sentenceEmbeddings = BertSentenceEmbeddings .pretrained() .setInputCols(Array("document")) .setOutputCol("sentence_embedding") val featuresAssembler = new FeaturesAssembler() .setInputCols(Array("sentence_embedding")) .setOutputCol("feature_vector") val svmClassifier = new GenericSVMClassifierApproach() .setInputCols("feature_vector") .setOutputCol("prediction") .setLabelColumn("label") .setModelFile("src/test/resources/classification/svm_graph.pb") .setEpochsNumber(10) .setBatchSize(1) .setMultiClass(false) .setlearningRate(0.01f) val pipeline = new Pipeline().setStages(Array( documentAssembler, sentenceEmbeddings, featuresAssembler, svmClassifier, )) val model = pipeline.fit(trainingData)
- See also
DocumentLogRegClassifierModel for instantiated models
-
class
GenericSVMClassifierModel extends GenericClassifierModel with ParamsAndFeaturesWritable
Support vector machine (SVM) classification
Support vector machine (SVM) classification
Please check out the Models Hub for available models.
-
class
MedicalBertForSequenceClassification extends AnnotatorModel[MedicalBertForSequenceClassification] with HasBatchedAnnotate[MedicalBertForSequenceClassification] with WriteTensorflowModel with HasCaseSensitiveProperties with HasEngine with CheckLicense
MedicalBertForSequenceClassification can load Bert Models with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g.
MedicalBertForSequenceClassification can load Bert Models with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for multi-class document classification tasks.
Pretrained models can be loaded with
pretrained
of the companion object:val sequenceClassifier = MedicalBertForSequenceClassification.pretrained() .setInputCols("token", "document") .setOutputCol("label")
The default model is
"bert_sequence_classifier_ade"
, if no name is provided.For available pretrained models please see the Models Hub.
Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. The Spark NLP Workshop example shows how to import them https://github.com/JohnSnowLabs/spark-nlp/discussions/5669.
Example
import spark.implicits._ import com.johnsnowlabs.nlp.base._ import com.johnsnowlabs.nlp.annotator._ import org.apache.spark.ml.Pipeline val documentAssembler = new DocumentAssembler() .setInputCol("text") .setOutputCol("document") val tokenizer = new Tokenizer() .setInputCols("document") .setOutputCol("token") val sequenceClassifier = MedicalBertForSequenceClassification.pretrained() .setInputCols("token", "document") .setOutputCol("label") .setCaseSensitive(true) val pipeline = new Pipeline().setStages(Array( documentAssembler, tokenizer, sequenceClassifier )) val data = Seq("John Lenon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text") val result = pipeline.fit(data).transform(data) result.select("label.result").show(false) +--------------------+ |result | +--------------------+ |[True, False] | +--------------------+
- See also
MedicalBertForSequenceClassification for sequnece-level classification
Annotators Main Page for a list of transformer based classifiers
- class MedicalBertForTokenClassifier extends AnnotatorModel[MedicalBertForTokenClassifier] with HasBatchedAnnotate[MedicalBertForTokenClassifier] with WriteTensorflowModel with HasCaseSensitiveProperties with HasEngine with CheckLicense
-
class
MedicalDistilBertForSequenceClassification extends AnnotatorModel[MedicalDistilBertForSequenceClassification] with HasBatchedAnnotate[MedicalDistilBertForSequenceClassification] with WriteTensorflowModel with HasCaseSensitiveProperties with HasEngine with CheckLicense
MedicalDistilBertForSequenceClassification can load DistilBERT Models with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g.
MedicalDistilBertForSequenceClassification can load DistilBERT Models with sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for multi-class document classification tasks.
Pretrained models can be loaded with
pretrained
of the companion object:val sequenceClassifier = MedicalDistilBertForSequenceClassification.pretrained() .setInputCols("token", "document") .setOutputCol("label")
The default model is
"distilbert_base_sequence_classifier_imdb"
, if no name is provided.For available pretrained models please see the Models Hub.
Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. The Spark NLP Workshop example shows how to import them https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. and the MedicalDistilBertForSequenceClassificationTestSpec.
Example
import spark.implicits._ import com.johnsnowlabs.nlp.base._ import com.johnsnowlabs.nlp.annotator._ import org.apache.spark.ml.Pipeline val documentAssembler = new DocumentAssembler() .setInputCol("text") .setOutputCol("document") val tokenizer = new Tokenizer() .setInputCols("document") .setOutputCol("token") val sequenceClassifier = MedicalDistilBertForSequenceClassification.pretrained() .setInputCols("token", "document") .setOutputCol("label") .setCaseSensitive(true) val pipeline = new Pipeline().setStages(Array( documentAssembler, tokenizer, sequenceClassifier )) val data = Seq("John Lenon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text") val result = pipeline.fit(data).transform(data) result.select("label.result").show(false) +--------------------+ |result | +--------------------+ |[neg, neg] | |[pos, pos, pos, pos]| +--------------------+
- See also
MedicalDistilBertForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
- trait ReadBertForTokenTensorflowModel extends ReadTensorflowModel
- trait ReadDistilBertForSequenceTensorflowModel extends ReadTensorflowModel
- trait ReadMedicalBertForSequenceClassification extends ReadTensorflowModel
- trait ReadablePretrainedBertForSequenceModel extends ParamsAndFeaturesReadable[MedicalBertForSequenceClassification] with HasPretrained[MedicalBertForSequenceClassification]
- trait ReadablePretrainedBertForTokenModel extends ParamsAndFeaturesReadable[MedicalBertForTokenClassifier] with HasPretrained[MedicalBertForTokenClassifier]
- trait ReadablePretrainedDistilBertForSequenceModel extends ParamsAndFeaturesReadable[MedicalDistilBertForSequenceClassification] with HasPretrained[MedicalDistilBertForSequenceClassification]
- trait ReadablePretrainedDocumentLogRegClassifierModel extends ParamsAndFeaturesReadable[DocumentLogRegClassifierModel] with HasPretrained[DocumentLogRegClassifierModel]
- trait ReadablePretrainedDocumentMLClassifierModel extends ParamsAndFeaturesReadable[DocumentMLClassifierModel] with HasPretrained[DocumentMLClassifierModel]
Value Members
- object DocumentLogRegClassifierModel extends ReadablePretrainedDocumentLogRegClassifierModel with Serializable
- object DocumentMLClassifierModel extends ReadablePretrainedDocumentMLClassifierModel with Serializable
- object GenericLogRegClassifierApproach extends GenericLogRegClassifierApproach
- object GenericLogRegClassifierModel extends ReadsGenericClassifierGraph[GenericLogRegClassifierModel] with ReadablePretrainedGenericClassifier[GenericLogRegClassifierModel] with Serializable
- object GenericSVMClassifierApproach extends GenericSVMClassifierApproach
- object GenericSVMClassifierModel extends ReadsGenericClassifierGraph[GenericSVMClassifierModel] with ReadablePretrainedGenericClassifier[GenericSVMClassifierModel] with Serializable
-
object
MedicalBertForSequenceClassification extends ReadablePretrainedBertForSequenceModel with ReadMedicalBertForSequenceClassification with Serializable
This is the companion object of MedicalBertForSequenceClassification.
This is the companion object of MedicalBertForSequenceClassification. Please refer to that class for the documentation.
-
object
MedicalBertForTokenClassifier extends ReadablePretrainedBertForTokenModel with ReadBertForTokenTensorflowModel with Serializable
This is the companion object of MedicalBertForTokenClassifier.
This is the companion object of MedicalBertForTokenClassifier. Please refer to that class for the documentation.
-
object
MedicalDistilBertForSequenceClassification extends ReadablePretrainedDistilBertForSequenceModel with ReadDistilBertForSequenceTensorflowModel with Serializable
This is the companion object of MedicalDistilBertForSequenceClassification.
This is the companion object of MedicalDistilBertForSequenceClassification. Please refer to that class for the documentation.