Abstract
In recent years, researchers have increasingly employed hashing algorithms to improve the efficiency of large-scale cross-modal retrieval by mapping floating-point features into compact binary codes. However, cross-modal hashing algorithms usually compute the similarity relationship from single class labels and ignore multi-label information. To address this problem, we propose a deep adversarial multi-label cross-modal hashing algorithm (DAMCH), which takes both multi-label annotations and deep features into consideration when establishing the cross-modal neighbor matrix. First, we propose inter- and intra-modal neighbor-relationship preserving functions that keep the Hamming neighbor relationships consistent with the original neighbor relationships. Second, we design linear classification functions to learn the semantic labels of the binary features and establish a hash semantic preserving loss function to guarantee that the binary features carry the same semantic information as the original labels. Furthermore, we establish an intra-modal adversarial loss function to minimize the information loss incurred when mapping floating-point features into compact binary codes, and an inter-modal adversarial loss function to ensure that features from different modalities follow the same distribution. Finally, we conduct cross-modal retrieval comparisons and ablation studies on two public datasets, MIRFlickr and NUS-WIDE. The experimental results show that DAMCH outperforms current state-of-the-art methods.
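To make the multi-label neighbor matrix idea concrete, the sketch below shows one common way such a matrix and a neighbor-preserving hashing objective can be formed from multi-label annotations. This is a minimal illustration under our own assumptions, not the authors' implementation: the function names, the cosine-style label overlap, and the squared-error form of the loss are chosen only for exposition.

```python
# Illustrative sketch (not the paper's code): a multi-label cross-modal
# neighbor matrix and a simple neighbor-preserving loss on relaxed codes.
import numpy as np

def multilabel_similarity(labels_a: np.ndarray, labels_b: np.ndarray) -> np.ndarray:
    """Soft neighbor matrix from multi-label annotations.

    labels_a: (n, c) binary label matrix for one modality (e.g. images)
    labels_b: (m, c) binary label matrix for the other modality (e.g. texts)
    Returns an (n, m) matrix in [0, 1]; an entry is > 0 iff the pair shares a label.
    """
    inter = labels_a @ labels_b.T                                  # shared label counts
    norm = np.linalg.norm(labels_a, axis=1, keepdims=True) @ \
           np.linalg.norm(labels_b, axis=1, keepdims=True).T
    return inter / np.maximum(norm, 1e-12)                         # cosine-style overlap

def neighbor_preserving_loss(hash_a: np.ndarray, hash_b: np.ndarray,
                             sim: np.ndarray) -> float:
    """Penalize disagreement between Hamming-space and label-space neighbors.

    hash_a: (n, k) relaxed real-valued codes, hash_b: (m, k) codes,
    sim: (n, m) neighbor matrix from multilabel_similarity.
    """
    k = hash_a.shape[1]
    inner = hash_a @ hash_b.T / k                                  # in [-1, 1] for +/-1 codes
    target = 2.0 * sim - 1.0                                       # map [0, 1] -> [-1, 1]
    return float(np.mean((inner - target) ** 2))
```

In a training loop, `sim` would be computed once from the dataset's label matrices, while the codes come from the image and text networks; the same loss form can be applied both across modalities (image-text) and within a modality.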






Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This research was funded by the National Natural Science Foundation of China, Grant Number 61841602; the Natural Science Foundation of Shandong Province of China, Grant Numbers ZR2021MF017, ZR2020MF147 and ZR2018PF005; the Youth Innovation Science and Technology Team Foundation of Shandong Higher School, Grant Number 2021KJ031; and the Fundamental Research Funds for the Central Universities, JLU, Grant Number 93K172021K12.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article. The authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.
Ethical approval
The results/data/figures in this manuscript have not been published elsewhere, nor are they under consideration by another publisher. We have read the Springer journal policies on author responsibilities and submit this manuscript in accordance with those policies. All of the material is owned by the authors, and/or no permissions are required.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, X., Wang, Z., Liu, W. et al. Deep adversarial multi-label cross-modal hashing algorithm. Int J Multimed Info Retr 12, 16 (2023). https://doi.org/10.1007/s13735-023-00288-3