Abstract
Cross-modal hashing has attracted much attention due to its low storage cost and high retrieval efficiency. Compared with their supervised counterparts, unsupervised cross-modal hashing methods suffer from severe performance degradation because they lack label guidance. Pseudo-label-based unsupervised methods have proven to be an effective way to improve the discriminative ability of hash codes. However, various kinds of noise are introduced when pseudo labels are created by clustering algorithms. To mitigate the effects of this noise, in this paper we propose a novel deep noise mitigation and semantic reconstruction hashing (DNMSRH) method for unsupervised cross-modal retrieval. Specifically, an unsupervised data balancing strategy searches each cluster for an equal amount of training data such that the selected data follow a distribution with minimum within-class variance and maximum between-class variance, which effectively mitigates the data noise caused by misclassified outliers. Meanwhile, a joint symmetric multi-metric similarity reconstruction framework is constructed, which not only integrates the semantic information of heterogeneous modalities but also preserves and extends the pairwise instance correlations of the original features. Furthermore, offline hard pseudo labels and online soft pseudo labels are introduced to mitigate the effects of noisy labels, where the soft pseudo labels are generated through the collaborative training of the heterogeneous image and text networks. Extensive experiments on three benchmark datasets for unsupervised cross-modal retrieval demonstrate that DNMSRH significantly outperforms state-of-the-art competitors.
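The abstract does not say how the data balancing step is implemented. Purely as an illustration, the Python sketch below assumes that pseudo labels have already been produced by a clustering algorithm (e.g. k-means) and reads "equivalent training data" as an equal number of samples per cluster. It keeps the samples nearest each cluster centroid, which targets only the within-cluster-variance criterion and tends to discard far-away, likely misclassified outliers. The function `balanced_select` and its `per_cluster` parameter are hypothetical names, and this is not the DNMSRH procedure itself.

```python
import math
from collections import defaultdict


def balanced_select(features, labels, per_cluster):
    """Hypothetical sketch: pick the same number of samples from every cluster.

    Within each pseudo-labelled cluster, keep the samples closest to the
    cluster centroid. This shrinks within-cluster variance and drops distant
    points that are likely mis-clustered outliers. It is an assumed
    realization of "data balancing", not the method described in the paper.
    """
    # Group sample indices by their pseudo (cluster) label.
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)

    # A cluster smaller than `per_cluster` cannot supply that many samples,
    # so cap the quota at the smallest cluster size to keep clusters balanced.
    quota = min(per_cluster, min(len(members) for members in groups.values()))

    selected = {}
    for label, members in groups.items():
        dim = len(features[members[0]])
        # Cluster centroid = mean feature vector of the cluster's members.
        centroid = [
            sum(features[i][d] for i in members) / len(members) for d in range(dim)
        ]
        # Rank members by Euclidean distance to the centroid; keep the nearest.
        ranked = sorted(members, key=lambda i: math.dist(features[i], centroid))
        selected[label] = sorted(ranked[:quota])
    return selected
```

With a toy cluster that contains one distant point, the distant point is the one left out, and every cluster contributes the same number of samples. A faithful implementation would also need to address the between-class-variance criterion, which this sketch does not model.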
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work is supported by the Fundamental Research Funds for the Central Universities (WUT: 212274015).
Ethics declarations
Conflict of interest
We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, C., Wan, Y. & Qiang, H. Deep noise mitigation and semantic reconstruction hashing for unsupervised cross-modal retrieval. Neural Comput & Applic 36, 5383–5397 (2024). https://doi.org/10.1007/s00521-023-09331-0
DOI: https://doi.org/10.1007/s00521-023-09331-0