research-article

Distribution Consistency Guided Hashing for Cross-Modal Retrieval

Authors:

Dezhong Peng

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

Pages 5623 - 5632

https://doi.org/10.1145/3664647.3680633

Published: 28 October 2024

Abstract

With the rapid growth of multi-modal data, cross-modal retrieval (CMR) has become an active research topic. Thanks to its fast retrieval and efficient storage, cross-modal hashing (CMH) offers a feasible solution for large-scale multi-modal data. Previous CMH methods typically learn common hash codes directly to fuse different modalities. Although they have achieved some success, two limitations remain: 1) These approaches often prioritize reducing the heterogeneity of multi-modal data by learning consensus hash codes, which can sacrifice modality-specific information. 2) They frequently rely on pairwise similarities to guide hash learning and neglect class distribution correlations. To overcome these two issues, we propose a novel Distribution Consistency Guided Hashing (DCGH) framework. Specifically, we first learn modality-specific representations to extract private discriminative information. We then learn consensus hash codes from the private representations through consensus hashing learning, thereby merging modality-specific information with cross-modal consistency. Finally, we propose distribution consistency learning, which guides the hash codes to follow similar class distributions across modalities and thereby exploits more consistent information. Extensive experiments on four benchmark datasets demonstrate the effectiveness of DCGH on both fully paired and partially paired CMR tasks. The code is available at: https://github.com/sunyuan-cs/2024-MM-DCGH.
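To make the "fast retrieval and efficient storage" claim concrete, the following is a minimal sketch of how retrieval with binary hash codes generally works. It is not the DCGH algorithm: the functions `binarize`, `hamming`, and `rank_by_hamming`, the 4-bit code length, and the feature values are all hypothetical and chosen for illustration. It assumes some encoder has already mapped image and text features into a shared real-valued space.

```python
# Illustrative sketch only: generic hash-code retrieval, NOT the DCGH method.
# All names and values below are assumptions made for this example.
from typing import List, Sequence


def binarize(features: Sequence[float]) -> int:
    """Pack the signs of the features into an integer (bit i is 1 if features[i] >= 0)."""
    code = 0
    for i, value in enumerate(features):
        if value >= 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two packed hash codes."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code: int, database_codes: Sequence[int]) -> List[int]:
    """Return database indices sorted by ascending Hamming distance to the query."""
    return sorted(
        range(len(database_codes)),
        key=lambda idx: hamming(query_code, database_codes[idx]),
    )


# Hypothetical outputs of an assumed image encoder (query) and text encoder (database).
image_query = binarize([0.9, -0.2, 0.4, -0.7])
text_database = [
    binarize([0.8, -0.1, 0.3, -0.5]),   # same sign pattern as the query
    binarize([-0.6, 0.2, -0.9, 0.1]),   # every sign flipped
    binarize([0.7, 0.3, 0.2, -0.4]),    # one sign flipped
]
ranking = rank_by_hamming(image_query, text_database)
print(ranking)  # -> [0, 2, 1]
```

Each item is stored as a handful of bits, and comparing two items costs one XOR plus a bit count, which is why hashing scales to large multi-modal collections.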

Index Terms

Distribution Consistency Guided Hashing for Cross-Modal Retrieval
1. Theory of computation
  1. Theory and algorithms for application domains
    1. Machine learning theory
      1. Unsupervised learning and clustering


Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

October 2024

11719 pages

ISBN: 9798400706868

DOI: 10.1145/3664647

General Chairs:
Jianfei Cai (Monash University, Australia), Mohan Kankanhalli (NUS, Singapore), Balakrishnan Prabhakaran (UT Dallas, USA), Susanne Boll (University of Oldenburg, Germany)

Program Chairs:
Ramanathan Subramanian (University of Canberra & IIT Ropar, Australia), Liang Zheng (Australian National University, Australia), Vivek K. Singh (Rutgers University, USA), Pablo Cesar (Centrum Wiskunde & Informatica, Netherlands), Lexing Xie (Australian National University, Australia), Dong Xu (University of Hong Kong, Hong Kong)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Natural Science Foundation of China
Sichuan Science and Technology Program

Conference

MM '24: The 32nd ACM International Conference on Multimedia

October 28 - November 1, 2024

Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
