Abstract
Given the proliferation of multimodal data in search engines and social networks, unsupervised cross-modal hashing has gained traction for its low storage cost and fast retrieval speed. Despite this success, unsupervised cross-modal hashing still suffers from the lack of reliable similarity supervision and struggles to reduce the information loss caused by quantization. In this paper, we propose a novel deep consistency preserving network (DCPN) for unsupervised cross-modal hashing that fully exploits the semantic information in different modalities. Specifically, we first extract consistent features to exploit the co-occurrence information and alleviate the heterogeneity between modalities. We then propose a fusion similarity matrix construction method to capture the semantic relationships between instances. Finally, we design a fusion hash code reconstruction strategy to bridge the gap between modalities and reduce the quantization error. Experimental results demonstrate the effectiveness of the proposed DCPN on unsupervised cross-modal retrieval tasks.
This work is supported in part by the National Natural Science Foundation of China (No. 62106037, No. 62076052), in part by the Major Program of the National Social Science Foundation of China (No. 19ZDA127), and in part by the Fundamental Research Funds for the Central Universities (No. DUT22YG205).
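Since the excerpt gives no formulas, the sketch below illustrates only the general pattern that fusion-similarity hashing methods follow, not DCPN's actual implementation: the cosine-based similarity construction, the fusion weight `alpha`, and the straight-through sign quantization are all assumptions drawn from common practice in unsupervised cross-modal hashing. The straight-through trick is one standard way to train against a binarization step while limiting quantization error.

```python
import torch
import torch.nn.functional as F

def fusion_similarity(img_feat: torch.Tensor, txt_feat: torch.Tensor,
                      alpha: float = 0.5) -> torch.Tensor:
    """Fuse modality-specific cosine-similarity matrices over a batch
    of n paired instances (hypothetical weighted fusion, not DCPN's)."""
    img_n = F.normalize(img_feat, dim=1)   # (n, d_img), unit-norm rows
    txt_n = F.normalize(txt_feat, dim=1)   # (n, d_txt), unit-norm rows
    s_img = img_n @ img_n.t()              # (n, n) image-image similarity
    s_txt = txt_n @ txt_n.t()              # (n, n) text-text similarity
    return alpha * s_img + (1.0 - alpha) * s_txt

def quantize(h: torch.Tensor) -> torch.Tensor:
    """Binarize continuous codes with sign(); the straight-through
    estimator keeps gradients flowing through the hard step."""
    b = torch.sign(h)
    return h + (b - h).detach()

if __name__ == "__main__":
    n, k = 8, 64
    img_feat, txt_feat = torch.randn(n, 512), torch.randn(n, 512)
    s = fusion_similarity(img_feat, txt_feat)
    codes = quantize(torch.randn(n, k, requires_grad=True))
    # Reconstruction-style objective: code similarity vs. fused similarity.
    loss = F.mse_loss(codes @ codes.t() / k, s)
    loss.backward()
```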
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Li, M., Guo, Y., Fu, H., Li, Y., Su, H. (2024). Deep Consistency Preserving Network for Unsupervised Cross-Modal Hashing. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14425. Springer, Singapore. https://doi.org/10.1007/978-981-99-8429-9_19
DOI: https://doi.org/10.1007/978-981-99-8429-9_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8428-2
Online ISBN: 978-981-99-8429-9