Abstract
Deep hashing methods, relying on the outstanding feature extraction ability of deep neural networks (DNNs), effectively enhance the performance of conventional machine learning retrieval models, particularly in cross-modal retrieval tasks involving visual media. State-of-the-art deep hashing research focuses on designing models that employ DNNs to discover semantic information from different data modalities and perform the corresponding information retrieval tasks. However, robustness, which is considered essential for reliable DNN model design, has received limited attention in deep hashing models. In this article, we present an end-to-end adversarial training framework for cross-modal retrieval. Our framework leverages a projected gradient descent (PGD)-based method to generate adversarial samples, which are then combined with normal samples to achieve robust training. Our approach addresses the vulnerability of existing cross-modal retrieval models and fills this gap in retrieval task design. We conduct extensive experiments on three benchmark datasets, comparing our model with state-of-the-art cross-modal retrieval models, to verify that it effectively boosts the performance of deep hashing models on cross-modal retrieval. This work highlights the effectiveness of adversarial training for efficient deep hashing model design.
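The abstract describes generating adversarial samples with PGD and training on them together with normal samples. The NumPy sketch below illustrates that general recipe (L-infinity PGD in the style of Madry et al.) on a deliberately tiny, hypothetical model, not the authors' architecture: a single linear layer with tanh produces a continuous image code, which is compared against a fixed text code through a pairwise negative log-likelihood loss. The model, the loss, the choice to perturb only the image modality, and every name and hyperparameter (`pair_loss`, `pgd_attack`, `adversarial_train_step`, `eps`, `alpha`, `steps`, `lr`) are assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def pair_loss(W, x, t, s):
    """Pairwise negative log-likelihood for one (image, text) pair.

    Hypothetical toy model: image code h = tanh(W @ x), t is a fixed text
    code in [-1, 1]^k, and s is 1 for a similar pair, 0 otherwise.
    """
    h = np.tanh(W @ x)
    theta = 0.5 * h @ t
    return np.logaddexp(0.0, theta) - s * theta  # log(1 + e^theta) - s * theta


def pair_grads(W, x, t, s):
    """Analytic gradients of pair_loss with respect to the input x and weights W."""
    h = np.tanh(W @ x)
    theta = 0.5 * h @ t
    # chain rule: dL/dtheta * dtheta/dh * dh/d(Wx), computed elementwise
    delta = (sigmoid(theta) - s) * 0.5 * t * (1.0 - h ** 2)
    return W.T @ delta, np.outer(delta, x)


def pgd_attack(W, x, t, s, eps=0.05, alpha=0.0125, steps=10,
               random_start=True, rng=None):
    """L-infinity PGD: signed gradient ascent on the loss, projected onto the eps-ball."""
    rng = np.random.default_rng() if rng is None else rng
    x_adv = x + rng.uniform(-eps, eps, size=x.shape) if random_start else x.copy()
    for _ in range(steps):
        g_x, _ = pair_grads(W, x_adv, t, s)
        x_adv = x_adv + alpha * np.sign(g_x)       # ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)   # projection step
    return x_adv


def adversarial_train_step(W, X, T, S, lr=0.1, rng=None, **pgd_kwargs):
    """One gradient step on the mean of clean and adversarial losses over a batch."""
    grad_W = np.zeros_like(W)
    total = 0.0
    for x, t, s in zip(X, T, S):
        x_adv = pgd_attack(W, x, t, s, rng=rng, **pgd_kwargs)
        for sample in (x, x_adv):  # normal sample, then its adversarial counterpart
            total += pair_loss(W, sample, t, s)
            grad_W += pair_grads(W, sample, t, s)[1]
    n = 2 * len(X)
    return W - lr * grad_W / n, total / n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k, n = 16, 8, 32  # feature dim, code length, batch size (arbitrary)
    W = rng.normal(scale=1 / np.sqrt(d), size=(k, d))
    X = rng.normal(size=(n, d))              # stand-in image features
    T = np.sign(rng.normal(size=(n, k)))     # stand-in binary text codes
    S = rng.integers(0, 2, size=n)           # stand-in similarity labels
    for epoch in range(5):
        W, loss = adversarial_train_step(W, X, T, S, rng=rng)
        print(epoch, loss)
```

Summing the clean and adversarial losses with equal weight is only one option; weighting the two terms differently, or perturbing the text modality as well, are equally plausible variants of the same idea.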
Acknowledgements
This work is supported by the Ministry of Science and Technology of China under Grant No. 2020AAA0108401, and the Natural Science Foundation of China under Grant Nos. 72225011 and 71621002.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Xiaolong Zheng and Xingwei Zhang contributed equally to this work.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, X., Zheng, X., Mao, W. et al. Boosting deep cross-modal retrieval hashing with adversarially robust training. Appl Intell 53, 23698–23710 (2023). https://doi.org/10.1007/s10489-023-04715-0
DOI: https://doi.org/10.1007/s10489-023-04715-0