Boosting deep cross-modal retrieval hashing with adversarially robust training

Zhang, Xingwei; Zheng, Xiaolong; Mao, Wenji; Zeng, Daniel Dajun

doi:10.1007/s10489-023-04715-0

Published: 13 July 2023

Applied Intelligence, Volume 53, pages 23698–23710 (2023)

Xingwei Zhang^1,2,
Xiaolong Zheng ORCID: orcid.org/0000-0003-0405-5458^1,2,
Wenji Mao^1,2 &
…
Daniel Dajun Zeng^1,2

Abstract

Deep hashing methods improve on conventional machine learning retrieval models, particularly in cross-modal retrieval tasks involving visual media, by relying on the strong feature extraction ability of deep neural networks (DNNs). State-of-the-art deep hashing research focuses on designing models that employ DNNs to extract semantic information from different data modalities and carry out the corresponding retrieval tasks. However, robustness, a property considered essential for reliable DNN model design, has received limited attention in deep hashing models. In this article, we present an end-to-end adversarial training framework for cross-modal retrieval. Our framework uses a projected gradient descent (PGD)-based method to generate adversarial samples, which are then combined with normal samples for robust training. Our approach addresses the vulnerability of existing cross-modal retrieval models and fills a gap in retrieval task design. We conduct extensive experiments and compare our model with state-of-the-art cross-modal retrieval models on three benchmark datasets, verifying that our model effectively boosts the performance of deep hashing models on cross-modal retrieval. This work highlights the effectiveness of adversarial training in efficient deep hashing model design.
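
The abstract's two ingredients, PGD-based adversarial sample generation and mixing adversarial with normal samples, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the toy linear hashing model `sign(X W^T)` with a `tanh` relaxation, the squared-error code loss, the L-infinity budget `eps`, the step size `alpha`, the inputs scaled to [0, 1], and the helper names `relaxed_codes`, `code_loss`, `pgd_attack` and `mixed_batch`. The random start often used with PGD is omitted to keep the sketch deterministic.

```python
import numpy as np

def relaxed_codes(X, W):
    """Continuous surrogate of the (hypothetical) hash function sign(X W^T)."""
    return np.tanh(X @ W.T)

def code_loss(X, W, B):
    """Per-sample mean squared gap between relaxed codes and target codes B."""
    return np.mean((relaxed_codes(X, W) - B) ** 2, axis=1)

def pgd_attack(X, W, B, eps=0.03, alpha=0.01, steps=10):
    """Projected gradient ascent on code_loss inside an L-infinity ball of radius eps."""
    X_adv = X.copy()
    k = W.shape[0]  # number of hash bits
    for _ in range(steps):
        H = relaxed_codes(X_adv, W)
        # Analytic gradient of code_loss with respect to the input.
        grad = (2.0 / k) * ((H - B) * (1.0 - H ** 2)) @ W
        # Signed ascent step: move the input so its code drifts away from B.
        X_adv = X_adv + alpha * np.sign(grad)
        # Projection: stay within eps of the clean input and in the valid range.
        X_adv = np.clip(X_adv, X - eps, X + eps)
        X_adv = np.clip(X_adv, 0.0, 1.0)
    return X_adv

def mixed_batch(X, X_adv, B):
    """Stack clean and adversarial samples, both paired with the clean codes."""
    return np.concatenate([X, X_adv]), np.concatenate([B, B])

# Toy usage: 4 samples, 8 features, 16-bit codes (all sizes are arbitrary).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(4, 8))
W = rng.normal(size=(16, 8))
B = np.sign(X @ W.T)                      # codes of the clean samples
X_adv = pgd_attack(X, W, B)
X_mix, B_mix = mixed_batch(X, X_adv, B)   # batch for one robust training step
```

In a training loop, a batch such as `X_mix` would replace the clean batch when the hashing network's weights are updated; how the article weights the clean and adversarial terms is not stated in the abstract, so the equal mixing shown here is only one plausible choice.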

Acknowledgements

This work is supported by the Ministry of Science and Technology of China under Grant No. 2020AAA0108401, and the Natural Science Foundation of China under Grant Nos. 72225011 and 71621002.

Author information

Authors and Affiliations

School of Artificial Intelligence, University of Chinese Academy of Sciences, No. 19, Yuquan Road, Beijing, 100049, Haidian, China
Xingwei Zhang, Xiaolong Zheng, Wenji Mao & Daniel Dajun Zeng
Institute of Automation, Chinese Academy of Sciences, No. 95 Zhongguancun East Road, Beijing, 100053, Beijing, China
Xingwei Zhang, Xiaolong Zheng, Wenji Mao & Daniel Dajun Zeng

Corresponding author

Correspondence to Xiaolong Zheng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Xiaolong Zheng and Xingwei Zhang contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, X., Zheng, X., Mao, W. et al. Boosting deep cross-modal retrieval hashing with adversarially robust training. Appl Intell 53, 23698–23710 (2023). https://doi.org/10.1007/s10489-023-04715-0

Accepted: 15 May 2023
Published: 13 July 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s10489-023-04715-0
