
Deep adversarial multi-label cross-modal hashing algorithm

  • Regular Paper
  • Published in: International Journal of Multimedia Information Retrieval

Abstract

In recent years, hashing algorithms have been widely employed to improve large-scale cross-modal retrieval efficiency by mapping floating-point features into compact binary codes. However, existing cross-modal hashing algorithms usually compute the similarity relationship from single class labels, ignoring multi-label information. To solve this problem, we propose the deep adversarial multi-label cross-modal hashing algorithm (DAMCH), which takes both multi-label annotations and deep features into consideration when establishing the cross-modal neighbor matrix. Firstly, we propose inter- and intra-modal neighbor relationship preserving functions to make the Hamming-space neighbor relationships consistent with the original neighbor relationships. Secondly, we design linear classification functions to predict the semantic labels of binary features, and establish a hash semantic preserving loss function to guarantee that the binary features carry the same semantic information as the original labels. Furthermore, we establish an intra-modal adversarial loss function to minimize the information loss incurred when mapping floating-point features into compact binary codes, and propose an inter-modal adversarial loss function to ensure that features from different modalities share the same distribution. Finally, we conduct cross-modal retrieval comparison experiments and ablation studies on two public datasets, MIRFlickr and NUS-WIDE. The experimental results show that DAMCH outperforms current state-of-the-art methods.
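To make the construction above concrete, the sketch below gives a minimal, hypothetical PyTorch rendering of three ideas from the abstract: a cross-modal neighbor matrix graded by multi-label overlap (modeled here as cosine similarity between multi-hot label vectors), a neighbor relationship preserving loss that aligns the inner products of relaxed binary codes with that matrix, and a modality discriminator for the inter-modal adversarial term. All function names, the cosine weighting, and the network shapes are illustrative assumptions, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def multilabel_neighbor_matrix(labels: torch.Tensor) -> torch.Tensor:
        # labels: (n, c) multi-hot label matrix. Cosine similarity between
        # label vectors grades neighbors by how many classes they share,
        # rather than the 0/1 similarity of single-label supervision
        # (an assumption; the paper may weight labels differently).
        l = F.normalize(labels.float(), dim=1)
        return l @ l.t()

    def neighbor_preserving_loss(h_img, h_txt, s):
        # h_img, h_txt: (n, k) relaxed codes in [-1, 1] (e.g., tanh outputs).
        # Their scaled inner product approximates 1 - Hamming distance / k
        # and is pushed toward the label-based similarity s.
        k = h_img.size(1)
        sim = (h_img @ h_txt.t() / k + 1) / 2
        return F.mse_loss(sim, s)

    class ModalityDiscriminator(torch.nn.Module):
        # Tries to tell image codes from text codes; training the hashing
        # networks to fool it pushes both modalities toward one distribution,
        # mirroring the inter-modal adversarial loss described above.
        def __init__(self, k: int):
            super().__init__()
            self.net = torch.nn.Sequential(
                torch.nn.Linear(k, k), torch.nn.ReLU(), torch.nn.Linear(k, 1))

        def forward(self, h: torch.Tensor) -> torch.Tensor:
            return self.net(h)  # one real/fake logit per code

    # Toy usage with hypothetical sizes: 4 samples, 3 classes, 16-bit codes.
    labels = torch.tensor([[1, 0, 1], [1, 0, 0], [0, 1, 0], [1, 1, 1]])
    s = multilabel_neighbor_matrix(labels)
    h_img = torch.tanh(torch.randn(4, 16))
    h_txt = torch.tanh(torch.randn(4, 16))
    loss_nb = neighbor_preserving_loss(h_img, h_txt, s)
    d = ModalityDiscriminator(16)
    logits = torch.cat([d(h_img), d(h_txt)])
    targets = torch.cat([torch.ones(4, 1), torch.zeros(4, 1)])
    loss_adv = F.binary_cross_entropy_with_logits(logits, targets)

In the full adversarial setup the discriminator and the hashing networks would be updated alternately, GAN-style; the snippet shows only the discriminator's objective.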



Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This research was funded by the National Natural Science Foundation of China (Grant Number 61841602), the Natural Science Foundation of Shandong Province of China (Grant Numbers ZR2021MF017, ZR2020MF147, and ZR2018PF005), the Youth Innovation Science and Technology Team Foundation of Shandong Higher School (Grant Number 2021KJ031), and the Fundamental Research Funds for the Central Universities, JLU (Grant Number 93K172021K12).

Author information


Corresponding author

Correspondence to Zhen Wang.

Ethics declarations

Conflict of interest

The authors have no competing interests relevant to the content of this article, nor any other interests, as defined by Springer, that might be perceived to influence the results and/or discussion reported in this paper.

Ethical approval

The results, data, and figures in this manuscript have not been published elsewhere, nor are they under consideration by another publisher. The authors have read the Springer journal policies on author responsibilities and submit this manuscript in accordance with those policies. All of the material is owned by the authors, and no permissions are required.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yang, X., Wang, Z., Liu, W. et al. Deep adversarial multi-label cross-modal hashing algorithm. Int J Multimed Info Retr 12, 16 (2023). https://doi.org/10.1007/s13735-023-00288-3
