Abstract
Image classification, a core subdomain of computer vision, categorizes images into predefined classes. The proliferation of handheld devices and image sensors has made huge amounts of unlabeled image data available. Supervised learning is therefore unsuitable for categorizing these images, since it requires labels; unsupervised clustering is also of limited use, as its accuracy is unreliable when the data are not labeled in advance. Self-supervised learning techniques can overcome this problem. In this work, we present a novel Swin Transformer based Contrastive Self-Supervised Learning method (Swin-TCSSL). A paired sample is formed by applying transformations to the given input image, and this pair is passed to the Swin-T transformer, which produces feature vectors. Maximizing the mutual information between these feature vectors yields robust clusters, and the cluster labels are propagated back to the Swin Transformer block until appropriate clusters are obtained. Contrastive learning then follows, finally producing the classified output. The experimental results show that the proposed system is invariant to occlusion, viewpoint variation, and illumination effects. The proposed Swin-TCSSL achieves state-of-the-art results on five benchmark datasets: CIFAR-10, Snapshot Serengeti, Stanford Dogs, Animals with Attributes, and ImageNet. As evident from the rigorous experiments, Swin-TCSSL sets a new state of the art with an average accuracy of 97.63%, higher than existing systems.
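To make the pipeline described above concrete, the following is a minimal PyTorch sketch of one such training step, not the authors' implementation: two augmented views of each image are encoded by a Swin-T backbone, the mutual-information clustering stage is approximated with an IIC-style objective over soft cluster assignments, and the views are pulled together with a SimCLR-style contrastive (NT-Xent) loss. The projection size, cluster count, temperature, and augmentation recipe are illustrative assumptions.

```python
# Illustrative sketch of a Swin-based contrastive self-supervised step.
# Assumptions: torchvision >= 0.13 (for swin_t), 128-d projection head,
# K = 10 clusters, temperature 0.5 -- none of these come from the paper.
import torch
import torch.nn.functional as F
import torchvision.transforms as T
from torchvision.models import swin_t

# Two random transformations of the same image form the positive pair.
augment = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ColorJitter(0.4, 0.4, 0.4, 0.1),
    T.ToTensor(),
])

backbone = swin_t(weights=None)        # Swin-T feature extractor
backbone.head = torch.nn.Identity()    # expose the 768-d feature vector
projector = torch.nn.Sequential(       # projection head (assumed size)
    torch.nn.Linear(768, 256), torch.nn.ReLU(), torch.nn.Linear(256, 128))
cluster_head = torch.nn.Linear(768, 10)  # soft cluster assignments (K = 10)

def nt_xent(z1, z2, tau=0.5):
    """SimCLR-style contrastive loss over a batch of paired embeddings."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)     # (2N, d)
    sim = z @ z.t() / tau                           # cosine similarities
    n = z1.size(0)
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float('-inf'))
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)            # positive = other view

def mi_cluster_loss(p1, p2, eps=1e-8):
    """IIC-style stand-in for the MI step: maximize mutual information
    between the two views' soft cluster assignments (each (N, K))."""
    joint = (p1.unsqueeze(2) * p2.unsqueeze(1)).mean(dim=0)  # (K, K)
    joint = ((joint + joint.t()) / 2).clamp_min(eps)
    pi = joint.sum(dim=1, keepdim=True)              # marginals
    pj = joint.sum(dim=0, keepdim=True)
    return -(joint * (joint.log() - pi.log() - pj.log())).sum()

def train_step(images, optimizer):
    v1 = torch.stack([augment(img) for img in images])  # view 1
    v2 = torch.stack([augment(img) for img in images])  # view 2
    h1, h2 = backbone(v1), backbone(v2)
    loss = (nt_xent(projector(h1), projector(h2))
            + mi_cluster_loss(F.softmax(cluster_head(h1), dim=1),
                              F.softmax(cluster_head(h2), dim=1)))
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```

Maximizing the mutual information between the two views' cluster assignments encourages assignments that are stable under the applied transformations, which is consistent with the invariance to occlusion, viewpoint, and illumination claimed in the abstract.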
Data availability
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest or competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Agilandeeswari, L., Meena, S.D. SWIN transformer based contrastive self-supervised learning for animal detection and classification. Multimed Tools Appl 82, 10445–10470 (2023). https://doi.org/10.1007/s11042-022-13629-x