package cv
Type Members
- trait ReadSwinForImageDLModel extends ReadTensorflowModel
- trait ReadViTForImageDLModel extends ReadTensorflowModel
- trait ReadablePretrainedSwinForImageModel extends ParamsAndFeaturesReadable[SwinForImageClassification] with HasPretrained[SwinForImageClassification]
- trait ReadablePretrainedViTForImageModel extends ParamsAndFeaturesReadable[ViTForImageClassification] with HasPretrained[ViTForImageClassification]
- class SwinForImageClassification extends ViTForImageClassification
SwinForImageClassification is an image classifier based on Swin.
The Swin Transformer was proposed in Swin Transformer: Hierarchical Vision Transformer using Shifted Windows by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
It is a hierarchical Transformer whose representation is computed with shifted windows. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connections.
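To make the shifted windowing scheme concrete, here is a minimal sketch in plain Scala. It is not Spark NLP code: the object `ShiftedWindowSketch` and the method `windowId` are hypothetical names, and the grid size (8 x 8), window size (4) and shift (half a window) are illustrative assumptions. It shows that two neighbouring positions that fall into different windows under the regular partition end up in the same window once the grid is cyclically shifted, which is how cross-window connections arise.

```scala
object ShiftedWindowSketch {

  /** Index of the m x m window that holds position (row, col) on an h x w grid,
    * after cyclically shifting the grid by `shift` positions along both axes.
    * With shift = 0 this is the regular, non-overlapping partition.
    */
  def windowId(row: Int, col: Int, h: Int, w: Int, m: Int, shift: Int = 0): Int = {
    require(h % m == 0 && w % m == 0, "grid must be divisible by the window size")
    val r = Math.floorMod(row - shift, h) // wrap around the grid border
    val c = Math.floorMod(col - shift, w)
    (r / m) * (w / m) + (c / m)
  }

  def main(args: Array[String]): Unit = {
    val (h, w, m) = (8, 8, 4) // assumed sizes, for illustration only

    // Two diagonal neighbours on either side of a window border
    val regular = (windowId(3, 3, h, w, m), windowId(4, 4, h, w, m))
    val shifted = (windowId(3, 3, h, w, m, shift = m / 2), windowId(4, 4, h, w, m, shift = m / 2))

    println(s"regular partition: windows $regular")
    println(s"shifted partition: windows $shifted")
  }
}
```

The wrap-around at the grid border is a simplification; this sketch only assigns positions to windows and does not compute attention.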
Pretrained models can be loaded with pretrained of the companion object:

    val imageClassifier = SwinForImageClassification.pretrained()
      .setInputCols("image_assembler")
      .setOutputCol("class")

The default model is "image_classifier_swin_base_patch4_window7_224" if no name is provided. For available pretrained models, please see the Models Hub.
Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. For more extended examples, see SwinForImageClassificationTest.
References:
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Paper Abstract:
This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text. To address these differences, we propose a hierarchical Transformer whose representation is computed with Shifted windows. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection. This hierarchical architecture has the flexibility to model at various scales and has linear computational complexity with respect to image size. These qualities of Swin Transformer make it compatible with a broad range of vision tasks, including image classification (87.3 top-1 accuracy on ImageNet-1K) and dense prediction tasks such as object detection (58.7 box AP and 51.1 mask AP on COCO test-dev) and semantic segmentation (53.5 mIoU on ADE20K val). Its performance surpasses the previous state-of-the-art by a large margin of +2.7 box AP and +2.6 mask AP on COCO, and +3.2 mIoU on ADE20K, demonstrating the potential of Transformer-based models as vision backbones. The hierarchical design and the shifted window approach also prove beneficial for all-MLP architectures.
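The linear-complexity claim in the abstract can be checked with a little arithmetic. The sketch below uses the cost estimates given in the Swin Transformer paper (they are not stated on this page): roughly 4hwC² + 2(hw)²C for global self-attention and 4hwC² + 2M²hwC for window-based self-attention, on an h x w feature map with C channels and window size M. The object and method names are hypothetical and not part of Spark NLP, and the sizes in `main` are illustrative assumptions.

```scala
object AttentionCostSketch {

  /** Approximate cost of global self-attention: quadratic in the number of positions h * w. */
  def globalCost(h: Long, w: Long, c: Long): Long =
    4 * h * w * c * c + 2 * (h * w) * (h * w) * c

  /** Approximate cost of window-based self-attention with m x m windows: linear in h * w. */
  def windowCost(h: Long, w: Long, c: Long, m: Long): Long =
    4 * h * w * c * c + 2 * m * m * h * w * c

  def main(args: Array[String]): Unit = {
    val (c, m) = (96L, 7L) // assumed channel count and window size

    // Doubling each side of the feature map multiplies the number of positions by 4
    val globalGrowth = globalCost(112, 112, c).toDouble / globalCost(56, 56, c)
    val windowGrowth = windowCost(112, 112, c, m).toDouble / windowCost(56, 56, c, m)

    println(f"global attention cost grows by a factor of $globalGrowth%.2f")
    println(f"window attention cost grows by a factor of $windowGrowth%.2f")
  }
}
```

With a fixed window size, the windowed cost grows in step with the number of positions, whereas the global cost grows faster because of its quadratic term.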
Example
    import com.johnsnowlabs.nlp.annotator._
    import com.johnsnowlabs.nlp.ImageAssembler
    import org.apache.spark.ml.Pipeline
    import org.apache.spark.sql.DataFrame

    // Load the sample images with Spark's image data source, dropping unreadable files
    val imageDF: DataFrame = spark.read
      .format("image")
      .option("dropInvalid", value = true)
      .load("src/test/resources/image/")

    val imageAssembler = new ImageAssembler()
      .setInputCol("image")
      .setOutputCol("image_assembler")

    val imageClassifier = SwinForImageClassification
      .pretrained()
      .setInputCols("image_assembler")
      .setOutputCol("class")

    val pipeline = new Pipeline().setStages(Array(imageAssembler, imageClassifier))
    val pipelineDF = pipeline.fit(imageDF).transform(imageDF)

    // Show the file name of each image next to its predicted labels
    pipelineDF
      .selectExpr("reverse(split(image.origin, '/'))[0] as image_name", "class.result")
      .show(truncate = false)

    +-----------------+----------------------------------------------------------+
    |image_name       |result                                                    |
    +-----------------+----------------------------------------------------------+
    |palace.JPEG      |[palace]                                                  |
    |egyptian_cat.jpeg|[tabby, tabby cat]                                        |
    |hippopotamus.JPEG|[hippopotamus, hippo, river horse, Hippopotamus amphibius]|
    |hen.JPEG         |[hen]                                                     |
    |ostrich.JPEG     |[ostrich, Struthio camelus]                               |
    |junco.JPEG       |[junco, snowbird]                                         |
    |bluetick.jpg     |[bluetick]                                                |
    |chihuahua.jpg    |[Chihuahua]                                               |
    |tractor.JPEG     |[tractor]                                                 |
    |ox.JPEG          |[ox]                                                      |
    +-----------------+----------------------------------------------------------+
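The `image_name` column in the example comes from the expression `reverse(split(image.origin, '/'))[0]`, which keeps only the last segment of the image path. As a rough illustration, the same logic can be written in plain Scala. The object name, method name and sample path below are hypothetical and not part of Spark NLP.

```scala
object ImageNameSketch {

  /** Mirrors reverse(split(origin, '/'))[0]: split on '/', reverse, take the first element. */
  def imageName(origin: String): String =
    origin.split("/", -1).reverse.head // limit -1 keeps trailing empty segments

  def main(args: Array[String]): Unit = {
    // Hypothetical origin value, for illustration only
    println(imageName("file:///tmp/images/hen.JPEG"))
  }
}
```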
- class ViTForImageClassification extends AnnotatorModel[ViTForImageClassification] with HasBatchedAnnotateImage[ViTForImageClassification] with HasImageFeatureProperties with WriteTensorflowModel with HasEngine
Vision Transformer (ViT) for image classification.
ViT is a transformer-based alternative to the convolutional neural networks usually used for image recognition tasks.
Pretrained models can be loaded with pretrained of the companion object:

    val imageClassifier = ViTForImageClassification.pretrained()
      .setInputCols("image_assembler")
      .setOutputCol("class")

The default model is "image_classifier_vit_base_patch16_224" if no name is provided. For available pretrained models, please see the Models Hub.
Models from the HuggingFace 🤗 Transformers library are also compatible with Spark NLP 🚀. To see which models are compatible and how to import them, see https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. For more extended examples, see ViTImageClassificationTestSpec.
References:
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Paper Abstract:
While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
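The phrase "sequences of image patches" in the abstract can be illustrated with a small sketch in plain Scala. It assumes a 224 x 224 input and 16 x 16 patches, the sizes suggested by the default model name, and uses a single channel to keep the example short. The object `PatchSketch` and the method `toPatches` are hypothetical names, not part of Spark NLP, and the sketch covers only the patch-splitting step, not the model itself.

```scala
object PatchSketch {

  /** Splits a single-channel image into non-overlapping p x p patches.
    * Patches are ordered row by row, and each patch is flattened row by row.
    */
  def toPatches(image: Array[Array[Float]], p: Int): Vector[Array[Float]] = {
    val h = image.length
    val w = image.head.length
    require(h % p == 0 && w % p == 0, "image sides must be divisible by the patch size")
    (for {
      top  <- 0 until h by p
      left <- 0 until w by p
    } yield (for {
      r <- top until top + p
      c <- left until left + p
    } yield image(r)(c)).toArray).toVector
  }

  def main(args: Array[String]): Unit = {
    // Synthetic 224 x 224 image whose pixel value encodes its position
    val image = Array.tabulate(224, 224)((r, c) => (r * 224 + c).toFloat)
    val patches = toPatches(image, 16)
    println(s"sequence length: ${patches.length}, values per patch: ${patches.head.length}")
  }
}
```

Each flattened patch plays the role that a word token plays in a text Transformer.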
Example
    import com.johnsnowlabs.nlp.annotator._
    import com.johnsnowlabs.nlp.ImageAssembler
    import org.apache.spark.ml.Pipeline
    import org.apache.spark.sql.DataFrame

    // Load the sample images with Spark's image data source, dropping unreadable files
    val imageDF: DataFrame = spark.read
      .format("image")
      .option("dropInvalid", value = true)
      .load("src/test/resources/image/")

    val imageAssembler = new ImageAssembler()
      .setInputCol("image")
      .setOutputCol("image_assembler")

    val imageClassifier = ViTForImageClassification
      .pretrained()
      .setInputCols("image_assembler")
      .setOutputCol("class")

    val pipeline = new Pipeline().setStages(Array(imageAssembler, imageClassifier))
    val pipelineDF = pipeline.fit(imageDF).transform(imageDF)

    // Show the file name of each image next to its predicted labels
    pipelineDF
      .selectExpr("reverse(split(image.origin, '/'))[0] as image_name", "class.result")
      .show(truncate = false)

    +-----------------+----------------------------------------------------------+
    |image_name       |result                                                    |
    +-----------------+----------------------------------------------------------+
    |palace.JPEG      |[palace]                                                  |
    |egyptian_cat.jpeg|[Egyptian cat]                                            |
    |hippopotamus.JPEG|[hippopotamus, hippo, river horse, Hippopotamus amphibius]|
    |hen.JPEG         |[hen]                                                     |
    |ostrich.JPEG     |[ostrich, Struthio camelus]                               |
    |junco.JPEG       |[junco, snowbird]                                         |
    |bluetick.jpg     |[bluetick]                                                |
    |chihuahua.jpg    |[Chihuahua]                                               |
    |tractor.JPEG     |[tractor]                                                 |
    |ox.JPEG          |[ox]                                                      |
    +-----------------+----------------------------------------------------------+
Value Members
- object SwinForImageClassification extends ReadablePretrainedSwinForImageModel with ReadSwinForImageDLModel with Serializable
This is the companion object of SwinForImageClassification. Please refer to that class for the documentation.
- object ViTForImageClassification extends ReadablePretrainedViTForImageModel with ReadViTForImageDLModel with Serializable
This is the companion object of ViTForImageClassification. Please refer to that class for the documentation.