Deep noise mitigation and semantic reconstruction hashing for unsupervised cross-modal retrieval

  • Original Article
  • Neural Computing and Applications

Abstract

Cross-modal hashing has attracted much attention because of its low storage cost and high retrieval efficiency. Compared with their supervised counterparts, unsupervised cross-modal hashing methods suffer severe performance degradation because they lack label guidance. Pseudo-label-based unsupervised methods have proven effective at improving the discriminative ability of hash codes. However, generating pseudo labels with clustering algorithms introduces various kinds of noise. To mitigate the effects of this noise, we propose a novel deep noise mitigation and semantic reconstruction hashing (DNMSRH) method for unsupervised cross-modal retrieval. Specifically, an unsupervised data balancing strategy searches each cluster for equivalent training data that satisfies a distribution with minimum within-class variance and maximum between-class variance, which effectively mitigates the data noise caused by misclassified outliers. Meanwhile, a joint symmetric multi-metric similarity reconstruction framework is constructed, which not only combines the semantic information of heterogeneous modalities but also preserves and extends the pairwise instance correlations of the original features. Furthermore, offline hard and online soft pseudo labels are introduced to mitigate the effects of noisy labels; the soft pseudo labels are generated by collaboratively training heterogeneous image and text networks. Extensive experiments on three benchmark datasets for unsupervised cross-modal retrieval demonstrate that DNMSRH significantly outperforms state-of-the-art competitors.
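
The data balancing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given in the abstract: it assumes that cluster assignments and centroids already exist (for example from k-means), that "equivalent" means the same number of samples per cluster, and that closeness to the centroid is an acceptable stand-in for the variance criterion. The function names `squared_distance` and `balanced_select` and the toy data are hypothetical.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def balanced_select(features, labels, centroids, per_cluster=None):
    """Keep the same number of samples from every cluster.

    Within each cluster the samples closest to the centroid are kept,
    so far-away members (likely misclustered outliers) are dropped.
    This nearest-to-centroid rule is an assumption made for this sketch.
    """
    by_cluster = defaultdict(list)
    for idx, (feat, lab) in enumerate(zip(features, labels)):
        by_cluster[lab].append((squared_distance(feat, centroids[lab]), idx))

    # Default quota: the size of the smallest cluster.
    if per_cluster is None:
        per_cluster = min(len(members) for members in by_cluster.values())

    kept = []
    for lab in sorted(by_cluster):
        nearest_first = sorted(by_cluster[lab])
        kept.extend(idx for _, idx in nearest_first[:per_cluster])
    return sorted(kept)


# Toy data: sample 2 was assigned to cluster 0 but lies far from its centroid.
features = [(0.0, 0.0), (0.1, 0.0), (3.0, 0.0), (5.0, 5.0), (5.2, 5.0)]
labels = [0, 0, 0, 1, 1]
centroids = [(0.0, 0.0), (5.0, 5.0)]

kept = balanced_select(features, labels, centroids)
print(kept)  # [0, 1, 3, 4]
```

In this toy run both clusters end up with two samples and the distant sample 2 is excluded from training, which is the kind of outlier-induced data noise the abstract describes.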

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

This work is supported by the Fundamental Research Funds for the Central Universities (WUT: 212274015).

Author information

Corresponding author

Correspondence to Yuan Wan.

Ethics declarations

Conflict of interest

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

About this article

Cite this article

Zhang, C., Wan, Y. & Qiang, H. Deep noise mitigation and semantic reconstruction hashing for unsupervised cross-modal retrieval. Neural Comput & Applic 36, 5383–5397 (2024). https://doi.org/10.1007/s00521-023-09331-0

  • DOI: https://doi.org/10.1007/s00521-023-09331-0
