
Joint Modal Heterogeneous Balance Hashing for Unsupervised Cross-Modal Retrieval

  • Conference paper
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15326)
  • Included in the conference series: Pattern Recognition (ICPR 2024)


Abstract

Existing cross-modal hashing methods have made progress in enhancing retrieval performance and reducing model size, but they struggle to balance retrieval performance across different channels, which weakens model robustness. These methods often integrate multi-channel semantic information poorly and neglect the balance of image-text heterogeneity, focusing solely on retrieval accuracy, which leads to robustness problems. To address this, we propose Joint Modal Heterogeneous Balance Hashing for Unsupervised Cross-Modal Retrieval (JMBH). We utilise the large pre-trained model CLIP to process the raw data, facilitating multi-channel semantic integration. We then design multi-channel fusion modalities to explore co-occurrence information across channels and develop intra- and inter-channel constraints to mine this information. Extensive experiments on three datasets validate the effectiveness of JMBH in balancing image-text heterogeneity and improving robustness.
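To make the pipeline sketched in the abstract concrete, the following is a minimal, illustrative PyTorch sketch (not the authors' released code), assuming precomputed 512-dimensional CLIP image and text features: each channel is mapped to relaxed hash codes by a tanh head, a joint cosine-similarity matrix blends the two channels into a single target, and intra- and inter-channel reconstruction terms pull both modalities toward that target. All layer sizes, the blend weight alpha, and the loss weights are assumptions for illustration only.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HashHead(nn.Module):
    """Maps a modality-specific CLIP feature to a relaxed K-bit hash code."""
    def __init__(self, dim_in: int = 512, n_bits: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, n_bits),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Values in (-1, 1); binarised with sign() at retrieval time.
        return torch.tanh(self.net(x))

def joint_similarity(img_f, txt_f, alpha: float = 0.5):
    # Blend the cosine-similarity matrices of both channels into one joint target.
    img_f, txt_f = F.normalize(img_f, dim=1), F.normalize(txt_f, dim=1)
    return alpha * (img_f @ img_f.t()) + (1 - alpha) * (txt_f @ txt_f.t())

def hashing_loss(b_img, b_txt, s, n_bits: int = 64):
    # Inter-channel term aligns image codes with text codes via the joint target;
    # intra-channel terms keep each channel consistent with the same target.
    inter = F.mse_loss(b_img @ b_txt.t() / n_bits, s)
    intra = F.mse_loss(b_img @ b_img.t() / n_bits, s) + F.mse_loss(b_txt @ b_txt.t() / n_bits, s)
    return inter + 0.5 * intra

# Toy usage with random stand-ins for precomputed CLIP features.
img_feat, txt_feat = torch.randn(32, 512), torch.randn(32, 512)
img_head, txt_head = HashHead(), HashHead()
b_img, b_txt = img_head(img_feat), txt_head(txt_feat)
loss = hashing_loss(b_img, b_txt, joint_similarity(img_feat, txt_feat))
loss.backward()

At retrieval time the relaxed codes would be binarised with sign() and compared by Hamming distance; the actual JMBH fusion design and objective are given in the full paper.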

Acknowledgements

This work was supported by the Chongqing social science planning project (Grant No. 2023BS085).

Author information

Corresponding author

Correspondence to Mingyong Li.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, J., Li, M. (2025). Joint Modal Heterogeneous Balance Hashing for Unsupervised Cross-Modal Retrieval. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15326. Springer, Cham. https://doi.org/10.1007/978-3-031-78395-1_8

  • DOI: https://doi.org/10.1007/978-3-031-78395-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-78394-4

  • Online ISBN: 978-3-031-78395-1

  • eBook Packages: Computer Science; Computer Science (R0)
