Deep noise mitigation and semantic reconstruction hashing for unsupervised cross-modal retrieval

Zhang, Cheng; Wan, Yuan; Qiang, Haopeng

doi:10.1007/s00521-023-09331-0

Original Article
Published: 03 January 2024

Volume 36, pages 5383–5397, (2024)

Neural Computing and Applications

Cheng Zhang¹,
Yuan Wan¹ &
Haopeng Qiang¹

Abstract

Cross-modal hashing has attracted much attention because of its low storage cost and high retrieval efficiency. Compared with their supervised counterparts, unsupervised cross-modal hashing methods suffer severe performance degradation because they lack label guidance. Pseudo-label-based unsupervised methods have proven to be an effective way to improve the discriminative ability of hash codes. However, various kinds of noise arise when pseudo labels are created by clustering algorithms. To mitigate the effects of this noise, we propose a novel deep noise mitigation and semantic reconstruction hashing (DNMSRH) method for unsupervised cross-modal retrieval. Specifically, an unsupervised data balancing strategy searches each cluster for the equivalent training data that satisfies minimum within-class variance and maximum between-class variance, which effectively mitigates the data noise caused by the misclassification of outliers. Meanwhile, a joint symmetric multi-metric similarity reconstruction framework is constructed, which not only combines the semantic information of heterogeneous modalities but also preserves and extends the pairwise instance correlations of the original features. Furthermore, offline hard and online soft pseudo labels are introduced to mitigate the effects of noisy labels, where the soft pseudo labels are generated by the collaborative training of heterogeneous image and text networks. Extensive experiments on three benchmark datasets for unsupervised cross-modal retrieval demonstrate that DNMSRH significantly outperforms state-of-the-art competitors.
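
For readers unfamiliar with the terms used in the abstract, the following is a minimal Python sketch of two generic ideas it mentions: ranking items by the Hamming distance between binary hash codes, and the difference between a hard and a soft pseudo label. It is not the DNMSRH method. All function names are hypothetical, the cluster centroids are assumed to be given, and the soft label here is simply a softmax over negative squared distances to the centroids, used as a stand-in for the network-generated soft labels that the abstract describes.

```python
import math


def binarize(features):
    """Turn a real-valued vector into a binary code (1 if x >= 0, else 0)."""
    return [1 if x >= 0 else 0 for x in features]


def hamming(a, b):
    """Count the positions at which two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query_code, database_codes):
    """Return database indices ordered from nearest to farthest code."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )


def _squared_distances(x, centroids):
    """Squared Euclidean distance from x to every centroid."""
    return [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]


def hard_pseudo_label(x, centroids):
    """Hard label: index of the single nearest centroid."""
    dists = _squared_distances(x, centroids)
    return min(range(len(centroids)), key=dists.__getitem__)


def soft_pseudo_label(x, centroids, temperature=1.0):
    """Soft label: a probability distribution over clusters.

    Assumption for illustration only: softmax over negative squared
    distances, not the collaborative-training scheme from the paper.
    """
    logits = [-d / temperature for d in _squared_distances(x, centroids)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # A query from one modality and database codes from another,
    # assumed to share the same Hamming space.
    query = binarize([0.3, -1.2, 0.0, 2.0])
    database = [[0, 1, 0, 0], [1, 0, 1, 0], [1, 0, 1, 1]]
    print(rank_by_hamming(query, database))

    centroids = [[0.0, 0.0], [1.0, 0.0]]
    print(hard_pseudo_label([0.9, 0.1], centroids))
    print(soft_pseudo_label([0.9, 0.1], centroids))
```

A hard label commits each sample to one cluster, so a misclustered outlier carries a fully wrong label, whereas a soft label spreads the weight across clusters and so limits the damage of any single wrong assignment.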

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

This work is supported by the Fundamental Research Funds for the Central Universities (WUT: 212274015).

Author information

Authors and Affiliations

Mathematical Department, Wuhan University of Technology, Wuhan, 430070, China
Cheng Zhang, Yuan Wan & Haopeng Qiang

Corresponding author

Correspondence to Yuan Wan.

Ethics declarations

Conflict of interest

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, C., Wan, Y. & Qiang, H. Deep noise mitigation and semantic reconstruction hashing for unsupervised cross-modal retrieval. Neural Comput & Applic 36, 5383–5397 (2024). https://doi.org/10.1007/s00521-023-09331-0

Received: 14 June 2022
Accepted: 26 November 2023
Published: 03 January 2024
Issue Date: April 2024
DOI: https://doi.org/10.1007/s00521-023-09331-0
