package classification
Type Members
- class DocumentLogRegClassifierApproach extends AnnotatorApproach[DocumentLogRegClassifierModel] with CheckLicense
Trains a model to classify documents with a Logistic Regression algorithm. Training data requires columns for the text and its label. The result is a trained DocumentLogRegClassifierModel.
Example
Define pipeline stages to prepare the data
val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val normalizer = new Normalizer()
  .setInputCols("token")
  .setOutputCol("normalized")

val stopwords_cleaner = new StopWordsCleaner()
  .setInputCols("normalized")
  .setOutputCol("cleanTokens")
  .setCaseSensitive(false)

val stemmer = new Stemmer()
  .setInputCols("cleanTokens")
  .setOutputCol("stem")
Define the document classifier and fit training data to it
val logreg = new DocumentLogRegClassifierApproach()
  .setInputCols("stem")
  .setLabelCol("category")
  .setOutputCol("prediction")

val pipeline = new Pipeline().setStages(Array(
  document_assembler,
  tokenizer,
  normalizer,
  stopwords_cleaner,
  stemmer,
  logreg
))

val model = pipeline.fit(trainingData)
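The fitted pipeline can then be applied to new documents. A minimal usage sketch, assuming trainingData contains "text" and "category" columns and reusing the model fitted above (the test sentence is illustrative only):

// Illustrative sketch: score an unseen document with the fitted pipeline
// and inspect the predicted categories.
val testData = Seq("The patient was prescribed aspirin for chest pain.").toDF("text")
val predictions = model.transform(testData)
predictions.select("prediction.result").show(false)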
- See also
DocumentLogRegClassifierModel for instantiated models
- class DocumentLogRegClassifierModel extends Model[DocumentLogRegClassifierModel] with RawAnnotator[DocumentLogRegClassifierModel] with CanBeLazy with CheckLicense
Classifies documents with a Logistic Regression algorithm. Currently there are no pretrained models available; please see DocumentLogRegClassifierApproach to train your own model.
Check the Models Hub for models that may become available in the future.
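Since no pretrained models are published, a trained model is obtained by fitting DocumentLogRegClassifierApproach. As a rough sketch (the save path is a placeholder, and the classifier is assumed to be the last stage of the fitted pipeline from the example above), a trained model can be persisted and reloaded like this:

// Sketch only: extract the trained classifier stage, save it, and reload it later.
val fitted = pipeline.fit(trainingData)
val logRegModel = fitted.stages.last.asInstanceOf[DocumentLogRegClassifierModel]
logRegModel.write.overwrite().save("/tmp/doc_logreg_model") // placeholder path

// Reload the saved model for reuse in another pipeline.
val restored = DocumentLogRegClassifierModel.load("/tmp/doc_logreg_model")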
- class MedicalBertForSequenceClassification extends AnnotatorModel[MedicalBertForSequenceClassification] with HasBatchedAnnotate[MedicalBertForSequenceClassification] with WriteTensorflowModel with HasCaseSensitiveProperties with CheckLicense
MedicalBertForSequenceClassification can load BERT models with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for multi-class document classification tasks.
Pretrained models can be loaded with pretrained of the companion object:

val sequenceClassifier = MedicalBertForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")

The default model is "bert_sequence_classifier_ade", if no name is provided. For available pretrained models please see the Models Hub.
Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. The Spark NLP Workshop example shows how to import them: https://github.com/JohnSnowLabs/spark-nlp/discussions/5669.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val sequenceClassifier = MedicalBertForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  sequenceClassifier
))

val data = Seq("John Lenon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("label.result").show(false)
+--------------------+
|result              |
+--------------------+
|[True, False]       |
+--------------------+
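A specific pretrained model can also be selected by name. The language code and the "clinical/models" remote location below follow the usual conventions for licensed models and are shown as assumptions rather than guarantees:

// Sketch: load the ADE classifier explicitly by name instead of relying on the default.
// The language and remote-location arguments are assumptions for illustration.
val adeClassifier = MedicalBertForSequenceClassification
  .pretrained("bert_sequence_classifier_ade", "en", "clinical/models")
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)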
- See also
MedicalBertForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
- class MedicalBertForTokenClassifier extends AnnotatorModel[MedicalBertForTokenClassifier] with HasBatchedAnnotate[MedicalBertForTokenClassifier] with WriteTensorflowModel with HasCaseSensitiveProperties with CheckLicense
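No description is given for MedicalBertForTokenClassifier here, but given its pretrained companion object (see ReadablePretrainedBertForTokenModel below) and by analogy with the sequence classifier above, a token-level pipeline would plausibly look like the following sketch; relying on the companion object's default pretrained model is an assumption:

// Hypothetical sketch: token-level classification (tagging individual clinical tokens).
// Reuses the documentAssembler and tokenizer from the example above.
val tokenClassifier = MedicalBertForTokenClassifier.pretrained() // assumed default model
  .setInputCols("token", "document")
  .setOutputCol("ner")

val tokenPipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  tokenClassifier
))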
- class MedicalDistilBertForSequenceClassification extends AnnotatorModel[MedicalDistilBertForSequenceClassification] with HasBatchedAnnotate[MedicalDistilBertForSequenceClassification] with WriteTensorflowModel with HasCaseSensitiveProperties with CheckLicense
MedicalDistilBertForSequenceClassification can load DistilBERT models with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for multi-class document classification tasks.
Pretrained models can be loaded with pretrained of the companion object:

val sequenceClassifier = MedicalDistilBertForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")

The default model is "distilbert_base_sequence_classifier_imdb", if no name is provided. For available pretrained models please see the Models Hub.
Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. The Spark NLP Workshop example (https://github.com/JohnSnowLabs/spark-nlp/discussions/5669) and the MedicalDistilBertForSequenceClassificationTestSpec show how to import them.
Example
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val sequenceClassifier = MedicalDistilBertForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  sequenceClassifier
))

val data = Seq("John Lenon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("label.result").show(false)
+--------------------+
|result              |
+--------------------+
|[neg, neg]          |
|[pos, pos, pos, pos]|
+--------------------+
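Because the class mixes in HasBatchedAnnotate and HasCaseSensitiveProperties (see the signature above), inference batching and case handling can be tuned on the loaded model. The values below are illustrative placeholders, not recommendations:

// Sketch: adjust batching and case sensitivity on the pretrained classifier.
val tunedClassifier = MedicalDistilBertForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(false) // lowercase input before classification
  .setBatchSize(8)         // rows processed per inference batch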
- See also
MedicalDistilBertForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
- trait ReadBertForTokenTensorflowModel extends ReadTensorflowModel
- trait ReadDistilBertForSequenceTensorflowModel extends ReadTensorflowModel
- trait ReadMedicalBertForSequenceClassification extends ReadTensorflowModel
- trait ReadablePretrainedBertForSequenceModel extends ParamsAndFeaturesReadable[MedicalBertForSequenceClassification] with HasPretrained[MedicalBertForSequenceClassification]
- trait ReadablePretrainedBertForTokenModel extends ParamsAndFeaturesReadable[MedicalBertForTokenClassifier] with HasPretrained[MedicalBertForTokenClassifier]
- trait ReadablePretrainedDistilBertForSequenceModel extends ParamsAndFeaturesReadable[MedicalDistilBertForSequenceClassification] with HasPretrained[MedicalDistilBertForSequenceClassification]
- trait ReadablePretrainedDocumentLogRegClassifierModel extends ParamsAndFeaturesReadable[DocumentLogRegClassifierModel] with HasPretrained[DocumentLogRegClassifierModel]
Value Members
- object DocumentLogRegClassifierModel extends ReadablePretrainedDocumentLogRegClassifierModel with Serializable
- object MedicalBertForSequenceClassification extends ReadablePretrainedBertForSequenceModel with ReadMedicalBertForSequenceClassification with Serializable
This is the companion object of MedicalBertForSequenceClassification. Please refer to that class for the documentation.
- object MedicalBertForTokenClassifier extends ReadablePretrainedBertForTokenModel with ReadBertForTokenTensorflowModel with Serializable
This is the companion object of MedicalBertForTokenClassifier. Please refer to that class for the documentation.
- object MedicalDistilBertForSequenceClassification extends ReadablePretrainedDistilBertForSequenceModel with ReadDistilBertForSequenceTensorflowModel with Serializable
This is the companion object of MedicalDistilBertForSequenceClassification. Please refer to that class for the documentation.