Abstract
With the rapid development of the modern Internet, the volume of image data is growing explosively, and retrieving specific images from such large collections has become an urgent problem. A common solution is hash-based approximate nearest neighbor (ANN) retrieval, which represents the original images with compact binary hash codes. Similar images can be retrieved quickly because similarity is computed with bitwise operations (the Hamming distance between two codes is the number of set bits in their XOR), and storing the hash codes requires only a small amount of memory. In recent years, the combination of deep learning and hash learning has led to breakthroughs in hash-based image retrieval, and convolutional neural networks (CNNs) in particular are widely used in deep hashing methods. However, CNNs do not capture global image information well when extracting features, which degrades the quality of the hash codes. Therefore, we first introduce the Swin Transformer network into hash learning and propose Swin Transformer-based supervised hashing (SWTH). With the Swin Transformer as the feature extraction backbone, the global context of an image can be captured as fully as possible by modeling the relations among different patches of the image. Furthermore, the Swin Transformer adopts a hierarchical structure that downsamples layer by layer, so it obtains rich multiscale feature information while extracting global information. A hash layer is added after the feature extraction network, and the image feature representation and the hash function are learned jointly by optimizing a combination of a hash loss, a classification loss and a quantization loss. Extensive experiments show that SWTH outperforms many state-of-the-art methods and achieves excellent retrieval performance.
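To make the bit-operation argument concrete, the following Python sketch packs binary hash codes into integers and ranks a database by Hamming distance (XOR followed by a count of set bits). It is illustrative only: the helper names `to_code`, `hamming` and `retrieve` are hypothetical and are not taken from the SWTH code release, and the 8-bit codes are toy values.

```python
def to_code(bits):
    """Pack a sequence of hash bits (+1/-1 or 1/0) into a Python int."""
    code = 0
    for b in bits:
        # Positive entries become a 1 bit; zero or negative entries become a 0 bit.
        code = (code << 1) | (1 if b > 0 else 0)
    return code


def hamming(a, b):
    """Hamming distance: XOR the two codes, then count the differing bits."""
    # bin(...).count("1") works on all Python 3 versions; int.bit_count()
    # is a faster alternative on Python 3.10+.
    return bin(a ^ b).count("1")


def retrieve(query, database, top_k=3):
    """Return indices of the top_k database codes closest to the query code."""
    scored = sorted((hamming(query, code), idx) for idx, code in enumerate(database))
    return [idx for _, idx in scored[:top_k]]


# Toy 8-bit database of three images (hypothetical codes).
database = [
    to_code([1, 1, 1, 1, -1, -1, -1, -1]),
    to_code([1, 1, 1, -1, -1, -1, -1, -1]),
    to_code([-1, -1, -1, -1, 1, 1, 1, 1]),
]
query = to_code([1, 1, 1, 1, -1, -1, -1, 1])
print(retrieve(query, database, top_k=2))
```

The memory saving follows from the same representation: a 64-bit code occupies 8 bytes, so codes for one million images fit in about 8 MB, whereas hypothetical 2048-dimensional float32 feature vectors for the same images would need about 8.2 GB.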
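The hierarchical, layer-by-layer downsampling can be illustrated with a shape calculation. The sketch below assumes the Swin-T configuration described in the original Swin Transformer paper (224×224 input, 4×4 patches, 96-dimensional embedding, four stages, 7×7 attention windows); this abstract does not state which Swin variant SWTH uses, so these values are assumptions. It also compares the approximate cost of global self-attention with window-based self-attention using the complexity expressions from the Swin Transformer paper, 4hwC² + 2(hw)²C versus 4hwC² + 2M²hwC for window size M.

```python
def swin_stage_shapes(img_size=224, patch_size=4, embed_dim=96, num_stages=4):
    """Return the (height, width, channels) token grid at each stage.

    Default values follow the Swin-T variant; they are assumptions, not SWTH settings.
    """
    h = w = img_size // patch_size  # patch partition: one token per 4x4 pixel patch
    c = embed_dim
    shapes = [(h, w, c)]
    for _ in range(num_stages - 1):
        # Patch merging: concatenate each 2x2 group of neighboring tokens (4C
        # channels) and project linearly to 2C, halving the spatial resolution.
        h, w, c = h // 2, w // 2, c * 2
        shapes.append((h, w, c))
    return shapes


def msa_cost(h, w, c):
    """Approximate cost of global multi-head self-attention over h*w tokens."""
    n = h * w
    return 4 * n * c * c + 2 * n * n * c


def wmsa_cost(h, w, c, window=7):
    """Approximate cost of self-attention restricted to window x window blocks."""
    n = h * w
    return 4 * n * c * c + 2 * window * window * n * c


for h, w, c in swin_stage_shapes():
    ratio = msa_cost(h, w, c) / wmsa_cost(h, w, c)
    print(f"{h}x{w} tokens, {c} channels, global/window cost ratio = {ratio:.1f}")
```

Under these assumptions the first stage's window-based attention is roughly an order of magnitude cheaper than global attention, while at the last stage the 7×7 feature map fits in a single window, so the two costs coincide and each token already attends to the whole image.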
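The abstract names the three loss terms but not their exact form. The sketch below shows one plausible way to combine them for a single training pair, under loudly stated assumptions: a pairwise negative log-likelihood hash loss on the inner product of two continuous hash outputs, a softmax cross-entropy classification loss, and a quantization loss that pulls each continuous output toward ±1. The functional forms and the trade-off weights `alpha` and `beta` are hypothetical and are not claimed to be the SWTH objective.

```python
import math


def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def pairwise_hash_loss(h_i, h_j, similar):
    """Assumed hash loss: negative log-likelihood of the pair label given codes."""
    theta = 0.5 * sum(a * b for a, b in zip(h_i, h_j))
    return softplus(theta) - (1.0 if similar else 0.0) * theta


def classification_loss(logits, label):
    """Softmax cross-entropy for one sample (log-sum-exp computed stably)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def quantization_loss(h):
    """Assumed quantization loss: squared distance from outputs to their signs."""
    return sum((x - (1.0 if x >= 0 else -1.0)) ** 2 for x in h)


def total_loss(h_i, h_j, similar, logits_i, label_i, alpha=1.0, beta=0.1):
    """Weighted sum of the three terms; alpha and beta are hypothetical weights."""
    return (pairwise_hash_loss(h_i, h_j, similar)
            + alpha * classification_loss(logits_i, label_i)
            + beta * quantization_loss(h_i))


h = [0.9, -0.8, 0.7, -0.95]  # hypothetical continuous hash-layer output
print(total_loss(h, h, True, [2.0, 0.5, -1.0], 0))
```

Because `theta` grows with the agreement between two codes, the pairwise term is small for similar pairs with aligned codes and large for dissimilar pairs with aligned codes; the quantization term vanishes only when every output is exactly ±1, which reduces the error introduced when the codes are later binarized with the sign function.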

Notes
The SWTH source code can be downloaded from https://github.com/plk-t/SWTH.
Acknowledgements
This work was supported in part by China NSF Grant No. 62271274, Zhejiang NSF Grant No. LZ20F020001 and No. LY20F020009, and programs sponsored by the K. C. Wong Magna Fund at Ningbo University. The authors thank the handling editor and the anonymous reviewers for their time and constructive suggestions for improving the paper. (Corresponding author: Jiangbo Qian.)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Peng, L., Qian, J., Wang, C. et al. Swin transformer-based supervised hashing. Appl Intell 53, 17548–17560 (2023). https://doi.org/10.1007/s10489-022-04410-6
DOI: https://doi.org/10.1007/s10489-022-04410-6