Abstract
Given the proliferation of multimodal data in search engines and social networks, unsupervised cross-modal hashing has gained traction for its low storage cost and fast retrieval speed. Despite this success, unsupervised cross-modal hashing still suffers from the lack of reliable similarity supervision and struggles to reduce the information loss caused by quantization. In this paper, we propose a novel deep consistency preserving network (DCPN) for unsupervised cross-modal hashing that fully exploits the semantic information in different modalities. Specifically, we first extract consistent features to exploit the co-occurrence information and alleviate the heterogeneity between modalities. We then propose a fusion similarity matrix construction method to capture the semantic relationships between instances. Finally, we design a fusion hash code reconstruction strategy to bridge the gap between modalities and reduce the quantization error. Experimental results demonstrate the effectiveness of the proposed DCPN on unsupervised cross-modal retrieval tasks.
This work is supported in part by the National Natural Science Foundation of China (No. 62106037, No. 62076052), in part by the Major Program of the National Social Science Foundation of China (No. 19ZDA127), and in part by the Fundamental Research Funds for the Central Universities (No. DUT22YG205).
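Since the excerpt gives no formulas, the sketch below illustrates only the general pattern that fusion-similarity hashing methods follow, not DCPN's actual implementation: the cosine-based similarity construction, the fusion weight `alpha`, and the straight-through sign quantization are all assumptions drawn from common practice in unsupervised cross-modal hashing. The straight-through trick is one standard way to train against a binarization step while limiting quantization error.

```python
import torch
import torch.nn.functional as F

def fusion_similarity(img_feat: torch.Tensor, txt_feat: torch.Tensor,
                      alpha: float = 0.5) -> torch.Tensor:
    """Fuse modality-specific cosine-similarity matrices over a batch
    of n paired instances (hypothetical weighted fusion, not DCPN's)."""
    img_n = F.normalize(img_feat, dim=1)   # (n, d_img), unit-norm rows
    txt_n = F.normalize(txt_feat, dim=1)   # (n, d_txt), unit-norm rows
    s_img = img_n @ img_n.t()              # (n, n) image-image similarity
    s_txt = txt_n @ txt_n.t()              # (n, n) text-text similarity
    return alpha * s_img + (1.0 - alpha) * s_txt

def quantize(h: torch.Tensor) -> torch.Tensor:
    """Binarize continuous codes with sign(); the straight-through
    estimator keeps gradients flowing through the hard step."""
    b = torch.sign(h)
    return h + (b - h).detach()

if __name__ == "__main__":
    n, k = 8, 64
    img_feat, txt_feat = torch.randn(n, 512), torch.randn(n, 512)
    s = fusion_similarity(img_feat, txt_feat)
    codes = quantize(torch.randn(n, k, requires_grad=True))
    # Reconstruction-style objective: code similarity vs. fused similarity.
    loss = F.mse_loss(codes @ codes.t() / k, s)
    loss.backward()
```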
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Li, M., Guo, Y., Fu, H., Li, Y., Su, H. (2024). Deep Consistency Preserving Network for Unsupervised Cross-Modal Hashing. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14425. Springer, Singapore. https://doi.org/10.1007/978-981-99-8429-9_19
DOI: https://doi.org/10.1007/978-981-99-8429-9_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8428-2
Online ISBN: 978-981-99-8429-9