Abstract
In recent years, researchers have increasingly employed hashing algorithms to improve the efficiency of large-scale cross-modal retrieval by mapping floating-point features into compact binary codes. However, cross-modal hashing algorithms usually compute the similarity relationship from single class labels and ignore multi-label information. To address this problem, we propose a deep adversarial multi-label cross-modal hashing algorithm (DAMCH), which takes both multi-label annotations and deep features into consideration when establishing the cross-modal neighbor matrix. First, we propose inter- and intra-modal neighbor-relationship preserving functions that keep the Hamming neighbor relationships consistent with the original neighbor relationships. Second, we design linear classification functions to learn the semantic labels of the binary features and establish a hash semantic preserving loss function to guarantee that the binary features carry the same semantic information as the original labels. Furthermore, we establish an intra-modal adversarial loss function to minimize the information loss incurred when mapping floating-point features into compact binary codes, and an inter-modal adversarial loss function to ensure that features from different modalities follow the same distribution. Finally, we conduct cross-modal retrieval comparisons and ablation studies on two public datasets, MIRFlickr and NUS-WIDE. The experimental results show that DAMCH outperforms current state-of-the-art methods.
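To make the multi-label neighbor matrix idea concrete, the sketch below shows one common way such a matrix and a neighbor-preserving hashing objective can be formed from multi-label annotations. This is a minimal illustration under our own assumptions, not the authors' implementation: the function names, the cosine-style label overlap, and the squared-error form of the loss are chosen only for exposition.

```python
# Illustrative sketch (not the paper's code): a multi-label cross-modal
# neighbor matrix and a simple neighbor-preserving loss on relaxed codes.
import numpy as np

def multilabel_similarity(labels_a: np.ndarray, labels_b: np.ndarray) -> np.ndarray:
    """Soft neighbor matrix from multi-label annotations.

    labels_a: (n, c) binary label matrix for one modality (e.g. images)
    labels_b: (m, c) binary label matrix for the other modality (e.g. texts)
    Returns an (n, m) matrix in [0, 1]; an entry is > 0 iff the pair shares a label.
    """
    inter = labels_a @ labels_b.T                                  # shared label counts
    norm = np.linalg.norm(labels_a, axis=1, keepdims=True) @ \
           np.linalg.norm(labels_b, axis=1, keepdims=True).T
    return inter / np.maximum(norm, 1e-12)                         # cosine-style overlap

def neighbor_preserving_loss(hash_a: np.ndarray, hash_b: np.ndarray,
                             sim: np.ndarray) -> float:
    """Penalize disagreement between Hamming-space and label-space neighbors.

    hash_a: (n, k) relaxed real-valued codes, hash_b: (m, k) codes,
    sim: (n, m) neighbor matrix from multilabel_similarity.
    """
    k = hash_a.shape[1]
    inner = hash_a @ hash_b.T / k                                  # in [-1, 1] for +/-1 codes
    target = 2.0 * sim - 1.0                                       # map [0, 1] -> [-1, 1]
    return float(np.mean((inner - target) ** 2))
```

In a training loop, `sim` would be computed once from the dataset's label matrices, while the codes come from the image and text networks; the same loss form can be applied both across modalities (image-text) and within a modality.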






Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This research was funded by the National Natural Science Foundation of China, Grant Number 61841602; the Natural Science Foundation of Shandong Province of China, Grant Numbers ZR2021MF017, ZR2020MF147 and ZR2018PF005; the Youth Innovation Science and Technology Team Foundation of Shandong Higher School, Grant Number 2021KJ031; and the Fundamental Research Funds for the Central Universities, JLU, Grant Number 93K172021K12.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article. The authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.
Ethical approval
The results/data/figures in this manuscript have not been published elsewhere, nor are they under consideration by another publisher. We have read the Springer journal policies on author responsibilities and submit this manuscript in accordance with those policies. All of the material is owned by the authors, and/or no permissions are required.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, X., Wang, Z., Liu, W. et al. Deep adversarial multi-label cross-modal hashing algorithm. Int J Multimed Info Retr 12, 16 (2023). https://doi.org/10.1007/s13735-023-00288-3