Abstract
Existing cross-modal hashing methods have made progress in enhancing retrieval capability and reducing model size, but they struggle to balance retrieval performance across different channels, which undermines robustness. These methods often integrate multi-channel semantic information poorly and neglect the balance of image-text heterogeneity, focusing solely on enhancing retrieval accuracy, which further weakens model robustness. To address this, we propose Joint Modal Heterogeneous Balance Hashing for Unsupervised Cross-Modal Retrieval (JMBH). We utilise the large pre-trained model CLIP to process the raw data, facilitating multi-channel semantic integration. We then design multi-channel fusion modalities to explore co-occurrence information across channels and develop intra- and inter-channel constraints to mine this information. Extensive experiments on three datasets validate JMBH's effectiveness in balancing image-text heterogeneity and improving robustness.
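The abstract only outlines the pipeline (CLIP encoding of raw image-text pairs, multi-channel fusion, and hash-code learning). The snippet below is a minimal sketch of the first stage only, under stated assumptions: it uses the Hugging Face transformers CLIP API with the openai/clip-vit-base-patch32 checkpoint, and the HashHead module (its layer sizes, the 64-bit code length, and the sign-based binarisation) is a generic stand-in for illustration, not the authors' JMBH architecture or its fusion and constraint terms.

```python
# Sketch: CLIP feature extraction for an image-text pair plus a generic
# hashing head. HashHead is a hypothetical stand-in, not the JMBH design.
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor


class HashHead(nn.Module):
    """Hypothetical projection from CLIP features to n_bits-bit hash codes."""

    def __init__(self, in_dim: int = 512, n_bits: int = 64):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(in_dim, n_bits), nn.Tanh())

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        h = self.proj(feats)   # relaxed codes in (-1, 1) during training
        return torch.sign(h)   # binary codes in {-1, +1} at retrieval time


def encode_pair(image: Image.Image, caption: str):
    """Encode one image-text pair with a frozen CLIP backbone."""
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        img_feat = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt_feat = model.get_text_features(input_ids=inputs["input_ids"],
                                           attention_mask=inputs["attention_mask"])
    return img_feat, txt_feat


if __name__ == "__main__":
    # Dummy inputs just to exercise the pipeline end to end.
    dummy_image = Image.new("RGB", (224, 224), color="gray")
    img_feat, txt_feat = encode_pair(dummy_image, "a person riding a bicycle")
    head = HashHead(in_dim=img_feat.shape[-1], n_bits=64)
    img_code, txt_code = head(img_feat), head(txt_feat)
    print(img_code.shape, txt_code.shape)  # torch.Size([1, 64]) each
```

In this kind of unsupervised setting, retrieval then reduces to Hamming-distance comparison between the binary image and text codes; the actual cross-channel fusion and intra-/inter-channel constraints described in the abstract are not reproduced here.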
Acknowledgements
This work was supported by the Chongqing Social Science Planning Project (Grant No. 2023BS085).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, J., Li, M. (2025). Joint Modal Heterogeneous Balance Hashing for Unsupervised Cross-Modal Retrieval. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15326. Springer, Cham. https://doi.org/10.1007/978-3-031-78395-1_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78394-4
Online ISBN: 978-3-031-78395-1
eBook Packages: Computer Science, Computer Science (R0)