
Swin transformer-based supervised hashing

Published in: Applied Intelligence

Abstract

With the rapid development of the modern internet, image data are growing explosively, and retrieving specific images from such massive collections has become an urgent problem. A common solution is hash-based approximate nearest neighbor retrieval, which represents the original images with compact binary hash codes: similarity can then be computed quickly with bitwise operations, and the codes require only a small amount of memory to store. In recent years, the combination of deep learning and hash learning has led to breakthroughs in hash-based image retrieval. In particular, convolutional neural networks (CNNs) are widely used in deep hashing methods. However, CNNs cannot capture global image information well when extracting features, which degrades the quality of the resulting hash codes. We therefore introduce the Swin Transformer into hash learning and propose Swin Transformer-based supervised hashing (SWTH). With the Swin Transformer as the feature extraction backbone, the network captures the global context of an image by modeling the relations among its patches. Moreover, the Swin Transformer's hierarchical, layer-by-layer downsampling structure yields rich multiscale features alongside this global information. A hash layer is added after the backbone for hash learning, and the image representation and hash function are learned jointly by optimizing a combination of hash loss, classification loss and quantization loss. Extensive experiments show that SWTH outperforms many state-of-the-art methods and achieves excellent retrieval performance.
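The bitwise retrieval step described in the abstract can be illustrated with a short sketch (plain Python, not from the paper): each binary hash code is packed into an integer, the Hamming distance between two codes is the popcount of their XOR, and retrieval simply ranks database codes by that distance. The function names and toy 8-bit codes below are illustrative assumptions.

```python
def hamming_distance(a: int, b: int) -> int:
    """Hamming distance between two hash codes packed as integers:
    XOR marks the differing bits, then we count them (popcount)."""
    return bin(a ^ b).count("1")


def retrieve(query: int, database: list[int], k: int = 3) -> list[int]:
    """Return the indices of the k database codes closest to the query."""
    order = sorted(range(len(database)),
                   key=lambda i: hamming_distance(query, database[i]))
    return order[:k]


# Toy example with 8-bit codes (deep hashing methods typically use 16-128 bits).
db = [0b10110100, 0b10110101, 0b01001011, 0b11110000]
print(retrieve(0b10110100, db, k=2))  # prints [0, 1]
```

Because XOR and popcount are single machine instructions on modern hardware, this comparison is far cheaper than computing distances between real-valued feature vectors, which is the efficiency argument behind hash-based retrieval.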


Notes

  1. https://www.cs.toronto.edu/~kriz/cifar.html

  2. https://image-net.org/index.php

  3. The SWTH source code can be downloaded from https://github.com/plk-t/SWTH


Acknowledgements

This work was supported in part by China NSF Grant No. 62271274, Zhejiang NSF Grant No. LZ20F020001 and No. LY20F020009, and the programs sponsored by K. C. Wong Magna Fund in Ningbo University. The authors wish to thank the handling editor and anonymous reviewers for their time and constructive suggestions to improve the paper. (Corresponding author: Jiangbo Qian.)

Author information


Correspondence to Jiangbo Qian.


About this article


Cite this article

Peng, L., Qian, J., Wang, C. et al. Swin transformer-based supervised hashing. Appl Intell 53, 17548–17560 (2023). https://doi.org/10.1007/s10489-022-04410-6
