classification

package classification

Type Members

class DocumentLogRegClassifierApproach extends AnnotatorApproach[DocumentLogRegClassifierModel] with CheckLicense

Trains a model to classify documents with a logistic regression algorithm. Training data requires a text column and a label column. The result is a trained DocumentLogRegClassifierModel.

Example

Define pipeline stages to prepare the data

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val normalizer = new Normalizer()
  .setInputCols("token")
  .setOutputCol("normalized")

val stopwords_cleaner = new StopWordsCleaner()
  .setInputCols("normalized")
  .setOutputCol("cleanTokens")
  .setCaseSensitive(false)

val stemmer = new Stemmer()
  .setInputCols("cleanTokens")
  .setOutputCol("stem")

Define the document classifier and fit training data to it

val logreg = new DocumentLogRegClassifierApproach()
  .setInputCols("stem")
  .setLabelCol("category")
  .setOutputCol("prediction")

val pipeline = new Pipeline().setStages(Array(
  document_assembler,
  tokenizer,
  normalizer,
  stopwords_cleaner,
  stemmer,
  logreg
))

val model = pipeline.fit(trainingData)

See also: DocumentLogRegClassifierModel for instantiated models
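
The example above fits the pipeline on a trainingData DataFrame that is never defined. A minimal sketch of what such a DataFrame could look like is shown below, using only the standard Spark API. The local SparkSession, the sample sentences, and the label values are illustrative assumptions; only the column names "text" and "category" come from the example.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; reuse your existing session in a real job.
val spark = SparkSession.builder()
  .appName("training-data-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical rows: one document per row, with its label in the "category" column.
val trainingData = Seq(
  ("The patient reported a mild headache and nausea.", "symptom"),
  ("Take one tablet twice daily after meals.", "instruction")
).toDF("text", "category")
```

The "text" column feeds the DocumentAssembler and the "category" column is the one passed to setLabelCol.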

class DocumentLogRegClassifierModel extends Model[DocumentLogRegClassifierModel] with RawAnnotator[DocumentLogRegClassifierModel] with CanBeLazy with CheckLicense
Classifies documents with a logistic regression algorithm. Currently there are no pretrained models available. Please see DocumentLogRegClassifierApproach to train your own model.
Please check out the Models Hub for available models in the future.

class DocumentMLClassifierApproach extends AnnotatorApproach[DocumentMLClassifierModel] with DocumentMLClassifierParams with CheckLicense

Trains a model to classify documents with a logistic regression algorithm. Training data requires a text column and a label column. The result is a trained DocumentMLClassifierModel.

Example

Define pipeline stages to prepare the data

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val normalizer = new Normalizer()
  .setInputCols("token")
  .setOutputCol("normalized")

val stopwords_cleaner = new StopWordsCleaner()
  .setInputCols("normalized")
  .setOutputCol("cleanTokens")
  .setCaseSensitive(false)

val stemmer = new Stemmer()
  .setInputCols("cleanTokens")
  .setOutputCol("stem")

Define the document classifier and fit training data to it

val logreg = new DocumentMLClassifierApproach()
  .setInputCols("stem")
  .setLabelCol("category")
  .setOutputCol("prediction")

val pipeline = new Pipeline().setStages(Array(
  document_assembler,
  tokenizer,
  normalizer,
  stopwords_cleaner,
  stemmer,
  logreg
))

val model = pipeline.fit(trainingData)

See also: DocumentMLClassifierModel for instantiated models

class DocumentMLClassifierModel extends Model[DocumentMLClassifierModel] with RawAnnotator[DocumentMLClassifierModel] with DocumentMLClassifierParams with CanBeLazy with CheckLicense
Classifies documents with a logistic regression algorithm. Currently there are no pretrained models available. Please see DocumentMLClassifierApproach to train your own model.
Please check out the Models Hub for available models in the future.
trait DocumentMLClassifierParams extends Params

class FewShotAssertionClassifierApproach extends GenericClassifierApproach with WhiteAndBlackListParams

An implementation of multinomial logistic regression.

Example

Define pipeline stages to prepare the data

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceEmbeddings = BertSentenceEmbeddings
  .pretrained()
  .setInputCols(Array("document"))
  .setOutputCol("sentence_embedding")

val featuresAssembler = new FeaturesAssembler()
  .setInputCols(Array("sentence_embedding"))
  .setOutputCol("feature_vector")

val logRegClassifier = new FewShotAssertionClassifierApproach()
  .setInputCols("feature_vector")
  .setOutputCol("prediction")
  .setLabelColumn("label")
  .setModelFile("src/test/resources/classification/log_reg_graph.pb")
  .setEpochsNumber(10)
  .setBatchSize(1)
  .setMultiClass(false)
  .setlearningRate(0.01f)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentenceEmbeddings,
  featuresAssembler,
  logRegClassifier
))

val model = pipeline.fit(trainingData)

See also: FewShotAssertionClassifierModel for instantiated models

class FewShotAssertionClassifierModel extends GenericClassifierModel with HasStorageRef with WhiteAndBlackListParams with WriteOnnxModel

FewShotAssertionClassifierModel performs assertion classification using large (LLM-based) few-shot classifiers based on the SetFit approach.

Example

Define pipeline stages to prepare the data

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDetector = new SentenceDetector()
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols(Array("sentence"))
  .setOutputCol("token")

val embeddings = WordEmbeddingsModel
  .pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")
  .setCaseSensitive(false)

val ner = MedicalNerModel
  .pretrained("ner_jsl", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")

val nerConverter = new NerConverter()
  .setInputCols(Array("sentence", "token", "ner"))
  .setWhiteList("Disease_Syndrome_Disorder", "Hypertension")
  .setOutputCol("ner_chunk")

// The output column is named "assertion" so that it matches the selectExpr calls below.
val fewShotAssertionClassifier = LargeFewShotClassifierModel
  .pretrained("clinical_assertion")
  .setInputCols(Array("sentence"))
  .setBatchSize(1)
  .setOutputCol("assertion")

val pipeline = new Pipeline().setStages(Array(
  documentAssembler, sentenceDetector, tokenizer, embeddings, ner, nerConverter, fewShotAssertionClassifier))

// toDS and toDF require import spark.implicits._
val model = pipeline.fit(Seq.empty[String].toDS.toDF("text"))
val results = model.transform(
  Seq("Includes hypertension and chronic obstructive pulmonary disease.").toDS.toDF("text"))

results
  .selectExpr("explode(assertion) as assertion")
  .selectExpr("assertion.result", "assertion.metadata.chunk", "assertion.metadata.confidence")
  .show(truncate = false)

+-------+-------------------------------------+----------+
|result |chunk                                |confidence|
+-------+-------------------------------------+----------+
|present|hypertension                         |1.0       |
|present|chronic obstructive pulmonary disease|1.0       |
|absent |arteriovenous malformations          |1.0       |
|absent |vascular malformation                |0.9999997 |
+-------+-------------------------------------+----------+

See also: LargeFewShotClassifierModel for instantiated models
https://arxiv.org/abs/2209.11055 for details about the SetFit approach

class FewShotClassifierApproach extends GenericLogRegClassifierApproach
class FewShotClassifierModel extends GenericLogRegClassifierModel

class GenericLogRegClassifierApproach extends GenericClassifierApproach

An implementation of multinomial logistic regression.

Example

Define pipeline stages to prepare the data

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceEmbeddings = BertSentenceEmbeddings
  .pretrained()
  .setInputCols(Array("document"))
  .setOutputCol("sentence_embedding")

val featuresAssembler = new FeaturesAssembler()
  .setInputCols(Array("sentence_embedding"))
  .setOutputCol("feature_vector")

val logRegClassifier = new GenericLogRegClassifierApproach()
  .setInputCols("feature_vector")
  .setOutputCol("prediction")
  .setLabelColumn("label")
  .setModelFile("src/test/resources/classification/log_reg_graph.pb")
  .setEpochsNumber(10)
  .setBatchSize(1)
  .setMultiClass(false)
  .setlearningRate(0.01f)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentenceEmbeddings,
  featuresAssembler,
  logRegClassifier
))

val model = pipeline.fit(trainingData)

See also: GenericLogRegClassifierModel for instantiated models
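
As a rough illustration of what multinomial logistic regression computes, the sketch below scores a feature vector against one weight row per class and turns the scores into probabilities with a softmax. It is plain Scala and is not the library's implementation; the function names softmax and predictProba and the toy weights are hypothetical.

```scala
// Softmax: converts raw class scores into probabilities that sum to 1.
def softmax(scores: Array[Double]): Array[Double] = {
  val maxScore = scores.max // subtract the maximum for numerical stability
  val exps = scores.map(s => math.exp(s - maxScore))
  val total = exps.sum
  exps.map(_ / total)
}

// One weight row and one bias entry per class (hypothetical toy parameters).
def predictProba(
    features: Array[Double],
    weights: Array[Array[Double]],
    bias: Array[Double]): Array[Double] = {
  val scores = weights.zip(bias).map { case (w, b) =>
    w.zip(features).map { case (wi, xi) => wi * xi }.sum + b
  }
  softmax(scores)
}

val weights = Array(Array(1.0, -1.0), Array(-1.0, 1.0), Array(0.0, 0.0))
val bias = Array(0.0, 0.0, 0.0)
val probs = predictProba(Array(2.0, 0.5), weights, bias)

// The predicted class is the index with the highest probability.
val predicted = probs.indexOf(probs.max)
```

In the pipeline above, the feature vector would come from the FeaturesAssembler output rather than from hand-written numbers.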

class GenericLogRegClassifierModel extends GenericClassifierModel with ParamsAndFeaturesWritable
Logistic regression classification
Please check out the Models Hub for available models.

class GenericSVMClassifierApproach extends GenericClassifierApproach

An implementation of Support Vector Machine (SVM) classification.

Example

Define pipeline stages to prepare the data

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceEmbeddings = BertSentenceEmbeddings
  .pretrained()
  .setInputCols(Array("document"))
  .setOutputCol("sentence_embedding")

val featuresAssembler = new FeaturesAssembler()
  .setInputCols(Array("sentence_embedding"))
  .setOutputCol("feature_vector")

val svmClassifier = new GenericSVMClassifierApproach()
  .setInputCols("feature_vector")
  .setOutputCol("prediction")
  .setLabelColumn("label")
  .setModelFile("src/test/resources/classification/svm_graph.pb")
  .setEpochsNumber(10)
  .setBatchSize(1)
  .setMultiClass(false)
  .setlearningRate(0.01f)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentenceEmbeddings,
  featuresAssembler,
  svmClassifier
))

val model = pipeline.fit(trainingData)

See also: GenericSVMClassifierModel for instantiated models
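
For readers unfamiliar with SVM classification, the following plain Scala sketch shows the two building blocks of a linear SVM: the decision function and the hinge loss that training minimizes. It is an illustration under simplifying assumptions (binary labels in {-1, +1}, a linear kernel) and does not reflect how the library implements the annotator; the names decision and hingeLoss and the toy numbers are hypothetical.

```scala
// Linear decision function: the sign of w . x + b gives the predicted class.
def decision(w: Array[Double], b: Double, x: Array[Double]): Double =
  w.zip(x).map { case (wi, xi) => wi * xi }.sum + b

// Hinge loss for a label y in {-1, +1}: zero once the margin y * score reaches 1.
def hingeLoss(y: Int, score: Double): Double =
  math.max(0.0, 1.0 - y * score)

// Hypothetical toy parameters and input.
val w = Array(0.5, -0.25)
val b = 0.25
val x = Array(2.0, 1.0)

val score = decision(w, b, x)            // 0.5 * 2.0 - 0.25 * 1.0 + 0.25
val predictedLabel = if (score >= 0) 1 else -1
val lossIfPositive = hingeLoss(1, score)  // no penalty: margin is satisfied
val lossIfNegative = hingeLoss(-1, score) // penalized: wrong side of the margin
```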

class GenericSVMClassifierModel extends GenericClassifierModel with ParamsAndFeaturesWritable
Support vector machine (SVM) classification
Please check out the Models Hub for available models.

class LargeFewShotClassifierModel extends AnnotatorModel[LargeFewShotClassifierModel] with HasStorageRef with WriteOnnxModel with HasCaseSensitiveProperties with ParamsAndFeaturesWritable with HasBatchedAnnotate[LargeFewShotClassifierModel] with CheckLicense

The LargeFewShotClassifierModel annotator can run large (LLM-based) few-shot classifiers based on the SetFit approach.

Example

Define pipeline stages to prepare the data

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val largeFewShotClassifier = LargeFewShotClassifierModel.pretrained()
  .setInputCols(Array("document"))
  .setBatchSize(1)
  .setOutputCol("label")

val pipeline = new Pipeline().setStages(Array(
  document_assembler,
  largeFewShotClassifier))

// toDS and toDF require import spark.implicits._
val model = pipeline.fit(Seq.empty[String].toDS.toDF("text"))
val results = model.transform(
  Seq("I felt a bit drowsy and had blurred vision after taking Aspirin.").toDS.toDF("text"))

results
  .selectExpr("explode(label) as label")
  .select("label.result", "label.metadata.confidence").show()

+------+----------+
|result|confidence|
+------+----------+
|   ADE| 0.9672883|
+------+----------+

See also: LargeFewShotClassifierModel for instantiated models
https://arxiv.org/abs/2209.11055 for details about the SetFit approach

class MedicalBertForSequenceClassification extends AnnotatorModel[MedicalBertForSequenceClassification] with HasBatchedAnnotate[MedicalBertForSequenceClassification] with WriteTensorflowModel with WriteOnnxModel with HasCaseSensitiveProperties with HasEngine with CheckLicense
MedicalBertForSequenceClassification can load BERT models with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for multi-class document classification tasks.
Pretrained models can be loaded with the pretrained method of the companion object:
```
val sequenceClassifier = MedicalBertForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
```
The default model is "bert_sequence_classifier_ade", if no name is provided.
For available pretrained models please see the Models Hub.
Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. The Spark NLP Workshop example shows how to import them: https://github.com/JohnSnowLabs/spark-nlp/discussions/5669
Example
```
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val sequenceClassifier = MedicalBertForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  sequenceClassifier
))

val data = Seq("John Lenon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("label.result").show(false)
+--------------------+
|result              |
+--------------------+
|[True, False]       |
+--------------------+
```
See also
MedicalBertForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
class MedicalBertForTokenClassifier extends AnnotatorModel[MedicalBertForTokenClassifier] with HasBatchedAnnotate[MedicalBertForTokenClassifier] with WriteTensorflowModel with WriteOnnxModel with HasCaseSensitiveProperties with HasEngine with CheckLicense
class MedicalDistilBertForSequenceClassification extends AnnotatorModel[MedicalDistilBertForSequenceClassification] with HasBatchedAnnotate[MedicalDistilBertForSequenceClassification] with WriteTensorflowModel with WriteOnnxModel with HasCaseSensitiveProperties with HasEngine with CheckLicense
MedicalDistilBertForSequenceClassification can load DistilBERT models with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for multi-class document classification tasks.
Pretrained models can be loaded with the pretrained method of the companion object:
```
val sequenceClassifier = MedicalDistilBertForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
```
The default model is "distilbert_base_sequence_classifier_imdb", if no name is provided.
For available pretrained models please see the Models Hub.
Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. The Spark NLP Workshop example (https://github.com/JohnSnowLabs/spark-nlp/discussions/5669) and the MedicalDistilBertForSequenceClassificationTestSpec show how to import them.
Example
```
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

val sequenceClassifier = MedicalDistilBertForSequenceClassification.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  tokenizer,
  sequenceClassifier
))

val data = Seq("John Lenon was born in London and lived in Paris. My name is Sarah and I live in London").toDF("text")
val result = pipeline.fit(data).transform(data)

result.select("label.result").show(false)
+--------------------+
|result              |
+--------------------+
|[neg, neg]          |
|[pos, pos, pos, pos]|
+--------------------+
```
See also
MedicalDistilBertForSequenceClassification for sequence-level classification
Annotators Main Page for a list of transformer based classifiers
trait ReadBertForTokenTensorflowModel extends ReadTensorflowModel with ReadOnnxModel
trait ReadDistilBertForSequenceTensorflowModel extends ReadTensorflowModel with ReadOnnxModel
trait ReadFewShotAssertionClassifierModel extends InternalReadOnnxModel with ReadsGenericClassifierGraph[FewShotAssertionClassifierModel]
trait ReadLargeFewShotClassifierModel extends InternalReadOnnxModel
trait ReadMedicalBertForSequenceClassification extends ReadTensorflowModel with ReadOnnxModel
trait ReadablePretrainedBertForSequenceModel extends ParamsAndFeaturesReadable[MedicalBertForSequenceClassification] with HasPretrained[MedicalBertForSequenceClassification]
trait ReadablePretrainedBertForTokenModel extends ParamsAndFeaturesReadable[MedicalBertForTokenClassifier] with HasPretrained[MedicalBertForTokenClassifier]
trait ReadablePretrainedDistilBertForSequenceModel extends ParamsAndFeaturesReadable[MedicalDistilBertForSequenceClassification] with HasPretrained[MedicalDistilBertForSequenceClassification]
trait ReadablePretrainedDocumentLogRegClassifierModel extends ParamsAndFeaturesReadable[DocumentLogRegClassifierModel] with HasPretrained[DocumentLogRegClassifierModel]
trait ReadablePretrainedDocumentMLClassifierModel extends ParamsAndFeaturesReadable[DocumentMLClassifierModel] with HasPretrained[DocumentMLClassifierModel]
trait ReadablePretrainedFewShotAssertionClassifierModel extends ParamsAndFeaturesReadable[FewShotAssertionClassifierModel] with HasPretrained[FewShotAssertionClassifierModel]
trait ReadablePretrainedLargeFewShotClassifierModel extends ParamsAndFeaturesReadable[LargeFewShotClassifierModel] with HasPretrained[LargeFewShotClassifierModel]

Value Members

object DocumentLogRegClassifierModel extends ReadablePretrainedDocumentLogRegClassifierModel with Serializable
object DocumentMLClassifierModel extends ReadablePretrainedDocumentMLClassifierModel with Serializable
object FewShotAssertionClassifierApproach extends FewShotAssertionClassifierApproach
object FewShotAssertionClassifierModel extends ReadablePretrainedFewShotAssertionClassifierModel with ReadFewShotAssertionClassifierModel with Serializable
object FewShotClassifierApproach extends FewShotClassifierApproach
object FewShotClassifierModel extends ReadsGenericClassifierGraph[FewShotClassifierModel] with ReadablePretrainedGenericClassifier[FewShotClassifierModel] with Serializable
object GenericLogRegClassifierApproach extends GenericLogRegClassifierApproach
object GenericLogRegClassifierModel extends ReadsGenericClassifierGraph[GenericLogRegClassifierModel] with ReadablePretrainedGenericClassifier[GenericLogRegClassifierModel] with Serializable
object GenericSVMClassifierApproach extends GenericSVMClassifierApproach
object GenericSVMClassifierModel extends ReadsGenericClassifierGraph[GenericSVMClassifierModel] with ReadablePretrainedGenericClassifier[GenericSVMClassifierModel] with Serializable
object LargeFewShotClassifierModel extends ReadablePretrainedLargeFewShotClassifierModel with ReadLargeFewShotClassifierModel with Serializable
object MedicalBertForSequenceClassification extends ReadablePretrainedBertForSequenceModel with ReadMedicalBertForSequenceClassification with Serializable
This is the companion object of MedicalBertForSequenceClassification. Please refer to that class for the documentation.
object MedicalBertForTokenClassifier extends ReadablePretrainedBertForTokenModel with ReadBertForTokenTensorflowModel with Serializable
This is the companion object of MedicalBertForTokenClassifier. Please refer to that class for the documentation.
object MedicalDistilBertForSequenceClassification extends ReadablePretrainedDistilBertForSequenceModel with ReadDistilBertForSequenceTensorflowModel with Serializable
This is the companion object of MedicalDistilBertForSequenceClassification. Please refer to that class for the documentation.
