
Deep Consistency Preserving Network for Unsupervised Cross-Modal Hashing

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14425)


Abstract

Given the proliferation of multimodal data in search engines and social networks, unsupervised cross-modal hashing has gained traction for its low storage consumption and fast retrieval speed. Despite this success, unsupervised cross-modal hashing still suffers from a lack of reliable similarity supervision and struggles to reduce the information loss caused by quantization. In this paper, we propose a novel deep consistency preserving network (DCPN) for unsupervised cross-modal hashing, which fully exploits the semantic information in different modalities. Specifically, we learn consistent features to exploit the co-occurrence information and alleviate the heterogeneity between different modalities. We then propose a fusion similarity matrix construction method to capture the semantic relationships between instances. Finally, we design a fusion hash code reconstruction strategy to bridge the gap between modalities and reduce the quantization error. Experimental results demonstrate the effectiveness of the proposed DCPN on unsupervised cross-modal retrieval tasks.
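The two ingredients named in the abstract, a fused similarity matrix built from intra-modal relations and a sign-based quantization of continuous codes, can be sketched as follows. This is an illustrative reconstruction under common assumptions in the unsupervised cross-modal hashing literature (cosine similarity within each modality, a weighted fusion with a hypothetical parameter `alpha`), not the paper's exact formulation:

```python
import numpy as np

def cosine_sim(feats):
    # Row-normalize features, then take pairwise cosine similarity.
    feats = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)
    return feats @ feats.T

def fused_similarity(img_feats, txt_feats, alpha=0.5):
    """Fuse intra-modal similarities into one supervision matrix.

    alpha weights the image-side similarity against the text side;
    both inputs are (n_samples, feature_dim) arrays for the same
    n co-occurring image-text pairs.
    """
    s_img = cosine_sim(img_feats)
    s_txt = cosine_sim(txt_feats)
    return alpha * s_img + (1.0 - alpha) * s_txt

def binarize(real_codes):
    # Sign quantization of relaxed real-valued codes to {-1, +1};
    # the gap between real_codes and its binarized version is the
    # quantization error such methods try to minimize.
    return np.where(real_codes >= 0, 1.0, -1.0)
```

A typical use would be to train the image and text hashing networks so that the inner products of their (binarized) codes approximate the fused matrix returned by `fused_similarity`.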

This work is supported in part by the National Natural Science Foundation of China (No. 62106037, No. 62076052), in part by the Major Program of the National Social Science Foundation of China (No.19ZDA127), and in part by the Fundamental Research Funds for the Central Universities (No. DUT22YG205).



Author information

Corresponding author

Correspondence to Yi Li.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Li, M., Guo, Y., Fu, H., Li, Y., Su, H. (2024). Deep Consistency Preserving Network for Unsupervised Cross-Modal Hashing. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14425. Springer, Singapore. https://doi.org/10.1007/978-981-99-8429-9_19


  • DOI: https://doi.org/10.1007/978-981-99-8429-9_19

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8428-2

  • Online ISBN: 978-981-99-8429-9

  • eBook Packages: Computer Science, Computer Science (R0)
