Boosting deep cross-modal retrieval hashing with adversarially robust training

Applied Intelligence

Abstract

Deep hashing methods effectively enhance the performance of conventional machine learning retrieval models, particularly in cross-modal retrieval tasks involving visual media, by relying on the outstanding feature extraction ability of deep neural networks (DNNs). State-of-the-art deep hashing research focuses on designing models that employ DNNs to discover semantic information from different data modalities and to execute the corresponding information retrieval tasks. However, robustness, which is considered essential for reliable DNN model design, has received limited attention in deep hashing models. In this article, we present an end-to-end adversarial training framework for cross-modal retrieval. Our framework leverages a projected gradient descent (PGD)-based method to generate adversarial samples, which are then combined with normal samples to achieve robust training. Our approach addresses the vulnerability of existing cross-modal retrieval models and fills this gap in retrieval task design. We conduct extensive experiments and compare our model with state-of-the-art cross-modal retrieval models on three benchmark datasets to verify that our model can effectively boost the performance of deep hashing models on cross-modal retrieval. This work highlights the effectiveness of adversarial training in efficient deep hashing model design.
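
The PGD step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the architecture or loss used in the article: the toy hash function tanh(Wx), the squared-error loss against target codes b, and the values of eps, alpha and steps are all assumptions chosen only to keep the example self-contained. The sketch perturbs each input within an L-infinity ball so as to increase the hashing loss, then concatenates clean and adversarial samples into one training batch.

```python
import numpy as np

def hash_loss_and_grad(W, x, b):
    """Toy hashing loss 0.5 * ||tanh(W x) - b||^2 per sample, and its gradient w.r.t. x.

    W: (bits, d) projection (hypothetical stand-in for a deep network),
    x: (n, d) inputs, b: (n, bits) target codes in {-1, +1}.
    """
    h = np.tanh(x @ W.T)                      # relaxed hash codes, shape (n, bits)
    diff = h - b
    loss = 0.5 * np.sum(diff ** 2, axis=1)    # shape (n,)
    grad_x = (diff * (1.0 - h ** 2)) @ W      # chain rule through tanh, shape (n, d)
    return loss, grad_x

def pgd_attack(W, x, b, eps=0.1, alpha=0.02, steps=10, rng=None):
    """L-infinity PGD: repeated signed-gradient ascent followed by projection."""
    rng = np.random.default_rng(0) if rng is None else rng
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)   # random start inside the eps-ball
    for _ in range(steps):
        _, grad = hash_loss_and_grad(W, x_adv, b)
        x_adv = x_adv + alpha * np.sign(grad)          # ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)       # project back onto the eps-ball
    return x_adv

def mixed_batch(W, x, b, **pgd_kwargs):
    """Combine clean and adversarial samples (with shared labels) for robust training."""
    x_adv = pgd_attack(W, x, b, **pgd_kwargs)
    return np.concatenate([x, x_adv]), np.concatenate([b, b])
```

In an actual training loop, the batch returned by `mixed_batch` would be fed to the usual parameter update, so the model sees both normal and perturbed inputs in every step.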

Acknowledgements

This work is supported by the Ministry of Science and Technology of China under Grant No. 2020AAA0108401 and by the Natural Science Foundation of China under Grant Nos. 72225011 and 71621002.

Author information

Corresponding author

Correspondence to Xiaolong Zheng.

Additional information

Xiaolong Zheng and Xingwei Zhang contributed equally to this work.

About this article

Cite this article

Zhang, X., Zheng, X., Mao, W. et al. Boosting deep cross-modal retrieval hashing with adversarially robust training. Appl Intell 53, 23698–23710 (2023). https://doi.org/10.1007/s10489-023-04715-0

  • DOI: https://doi.org/10.1007/s10489-023-04715-0
