Swin transformer-based supervised hashing

Peng, Liangkang; Qian, Jiangbo; Wang, Chong; Liu, Baisong; Dong, Yihong

doi:10.1007/s10489-022-04410-6

Published: 06 January 2023

Applied Intelligence, volume 53, pages 17548–17560 (2023)

Liangkang Peng¹, Jiangbo Qian¹ (ORCID: orcid.org/0000-0003-4245-3246), Chong Wang¹, Baisong Liu¹ & Yihong Dong¹

Abstract

With the rapid development of the modern internet, image data are growing explosively, and retrieving specific images from such large collections has become an urgent problem. A common solution is hash-based approximate nearest neighbor retrieval, which represents the original image data with compact binary hash codes. Image similarity can then be computed quickly with bitwise operations, and the hash codes require only a small amount of memory to store. In recent years, the combination of deep learning and hash learning has led to breakthroughs in hash-based image retrieval. In particular, convolutional neural networks (CNNs) are widely used in deep hashing methods. However, CNNs do not capture global image information well when extracting image features, which degrades the quality of the hash codes. Therefore, we first introduce the Swin Transformer network into hash learning and propose Swin Transformer-based supervised hashing (SWTH). Using the Swin Transformer as the feature extraction backbone, we capture the global context of an image as much as possible by establishing relations among different blocks of the image. Furthermore, the Swin Transformer adopts a hierarchical structure with layer-by-layer downsampling, which yields rich multiscale feature information while extracting global information. After the feature extraction network, we add a hash layer for hash learning. The image feature representation and the hash function can be learned by optimizing a combination of hash loss, classification loss and quantization loss. Extensive experimental results show that the SWTH method outperforms many state-of-the-art methods and achieves excellent retrieval performance.
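To illustrate why binary hash codes make retrieval cheap, the following is a minimal sketch of Hamming-distance ranking with bitwise operations. It is not the SWTH implementation: the sign-threshold rule in `binarize`, the function names, and the toy feature vectors are all assumptions made for illustration.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> int:
    """Pack the signs of a real-valued vector into one integer hash code.

    Assumed rule: bit i is 1 when features[i] > 0, otherwise 0.
    """
    code = 0
    for i, value in enumerate(features):
        if value > 0:
            code |= 1 << i
    return code


def hamming_distance(a: int, b: int) -> int:
    """XOR marks the differing bits; counting the set bits gives the distance."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: Sequence[int]) -> List[int]:
    """Return database indices sorted by Hamming distance to the query."""
    return sorted(
        range(len(database)),
        key=lambda i: hamming_distance(query, database[i]),
    )


# Toy 8-bit example with made-up feature vectors (not taken from the paper).
database = [
    binarize([0.9, -0.2, 0.4, -0.7, 0.1, 0.3, -0.5, 0.8]),
    binarize([-0.9, 0.2, -0.4, 0.7, -0.1, -0.3, 0.5, -0.8]),
    binarize([0.8, -0.1, 0.5, -0.6, -0.2, 0.3, -0.4, 0.7]),
]
query = binarize([0.7, -0.3, 0.6, -0.5, 0.2, 0.1, -0.6, 0.9])
ranking = rank_by_hamming(query, database)
print(ranking)
```

Each code here occupies a single integer, and comparing two codes costs one XOR plus a bit count, which is the storage and speed advantage the abstract refers to. On Python 3.10 or later, `int.bit_count()` could replace the `bin(...).count("1")` idiom.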

Notes

https://www.cs.toronto.edu/~kriz/cifar.html
https://image-net.org/index.php
The SWTH source code can be downloaded from https://github.com/plk-t/SWTH

Acknowledgements

This work was supported in part by China NSF Grant No. 62271274, Zhejiang NSF Grant No. LZ20F020001 and No. LY20F020009, and the programs sponsored by K. C. Wong Magna Fund in Ningbo University. The authors wish to thank the handling editor and anonymous reviewers for their time and constructive suggestions to improve the paper. (Corresponding author: Jiangbo Qian.)

Author information

Authors and Affiliations

Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, China
Liangkang Peng, Jiangbo Qian, Chong Wang, Baisong Liu & Yihong Dong

Corresponding author

Correspondence to Jiangbo Qian.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Peng, L., Qian, J., Wang, C. et al. Swin transformer-based supervised hashing. Appl Intell 53, 17548–17560 (2023). https://doi.org/10.1007/s10489-022-04410-6

Accepted: 13 December 2022
Published: 06 January 2023
Issue Date: July 2023
DOI: https://doi.org/10.1007/s10489-022-04410-6
