DOI: 10.1145/3664647.3680633
research-article

Distribution Consistency Guided Hashing for Cross-Modal Retrieval

Published: 28 October 2024

Abstract

With the rapid growth of multi-modal data, cross-modal retrieval (CMR) has become an active research topic. Thanks to its fast retrieval and efficient storage, cross-modal hashing (CMH) provides a feasible solution for large-scale multi-modal data. Previous CMH methods typically learn common hash codes directly to fuse different modalities. Although they have achieved some success, two limitations remain: 1) These approaches often prioritize reducing the heterogeneity of multi-modal data by learning consensus hash codes, which can sacrifice modality-specific information. 2) They frequently use pairwise similarities to guide hash learning and neglect class distribution correlations. To overcome these two issues, we propose a novel Distribution Consistency Guided Hashing (DCGH) framework. Specifically, we first learn a modality-specific representation to extract private discriminative information. We then learn consensus hash codes from the private representations through consensus hashing learning, thereby merging modality-specific information with cross-modal consistency. Finally, we propose distribution consistency learning, which guides the hash codes of different modalities to follow similar class distributions and thereby exploits more consistent information. Extensive experiments on four benchmark datasets demonstrate the effectiveness of DCGH on both fully paired and partially paired CMR tasks. The code is available at: https://github.com/sunyuan-cs/2024-MM-DCGH.
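
The retrieval step that makes hashing fast and compact can be illustrated with a minimal sketch. This is not the DCGH method or the code from the authors' repository: it only shows generic Hamming-distance ranking over binary codes, assuming that a text query and a set of images have already been mapped to real-valued outputs in a shared space. The function names (binarize, hamming_distances, retrieve) and the toy numbers are hypothetical and chosen purely for illustration.

```python
import numpy as np

def binarize(real_valued):
    """Turn real-valued outputs into hash codes in {-1, +1} by thresholding at zero."""
    return np.where(real_valued >= 0, 1, -1).astype(np.int8)

def hamming_distances(query_code, database_codes):
    """Hamming distance from one code to every database code.

    For codes in {-1, +1} with n_bits bits, distance = (n_bits - inner_product) / 2.
    """
    n_bits = database_codes.shape[1]
    inner = database_codes.astype(np.int32) @ query_code.astype(np.int32)
    return (n_bits - inner) // 2

def retrieve(query_code, database_codes, top_k=2):
    """Return indices and distances of the top_k closest database items."""
    dists = hamming_distances(query_code, database_codes)
    order = np.argsort(dists, kind="stable")[:top_k]
    return order, dists[order]

# Hypothetical toy data: three image items and one text query, 4-bit codes.
image_codes = binarize(np.array([[0.9, -0.2, 0.4, -0.7],
                                 [-0.5, 0.3, -0.8, 0.6],
                                 [0.1, 0.8, -0.3, -0.9]]))
text_query = binarize(np.array([-0.4, 0.2, -0.6, 0.1]))

indices, distances = retrieve(text_query, image_codes)
print(indices, distances)  # image 1 matches the query exactly, image 2 differs in 2 bits
```

Because each item is stored as a short binary code and compared with integer arithmetic, both storage and ranking cost stay low as the database grows, which is the property the abstract attributes to CMH.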

Cited By

  • (2025) Efficient Parameter-free Adaptive Hashing for Large-Scale Cross-Modal Retrieval. International Journal of Approximate Reasoning. 10.1016/j.ijar.2025.109383 (109383). Online publication date: Feb-2025

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024
11719 pages
ISBN: 9798400706868
DOI: 10.1145/3664647
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cross-modal retrieval
  2. distribution consistency
  3. hashing learning

Qualifiers

  • Research-article

Conference

MM '24
MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 214
  • Downloads (Last 6 weeks): 88
Reflects downloads up to 01 Mar 2025
