research-article

Data-Aware Proxy Hashing for Cross-modal Retrieval

Authors:

Heyan Huang

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 686 - 696

https://doi.org/10.1145/3539618.3591660

Published: 18 July 2023

Abstract

Recently, numerous proxy hash code based methods, which exploit the label information of data to supervise the training of hashing models, have been proposed. Although these methods have made impressive progress, they generate proxy hash codes based only on the class information of the dataset or the labels of the data, and do not take the data themselves into account. As a result, they are likely to generate some inappropriate proxy hash codes, which damages the retrieval performance of the hashing models. To solve this problem, we propose a novel Data-Aware Proxy Hashing method for cross-modal retrieval, called DAPH. Specifically, our proposed method first trains a data-aware proxy network that takes the data points, the label vectors of the data, and the class vectors of the dataset as inputs to generate three types of proxy hash codes: class-based data-aware proxy hash codes, label-fused image-aware proxy hash codes, and label-fused text-aware proxy hash codes. Then, we propose a novel hash loss that exploits these three types of data-aware proxy hash codes to supervise the training of modality-specific hashing networks. After training, DAPH is able to generate discriminative hash codes that adequately preserve semantic information. Extensive experiments on three benchmark datasets show that the proposed DAPH outperforms the state-of-the-art baselines in cross-modal retrieval tasks.
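
To make the retrieval setting concrete, the following is a minimal sketch of how binary hash codes are typically used for cross-modal search: real-valued network outputs are binarized by sign, and database items from the other modality are ranked by Hamming distance to the query. It does not reproduce DAPH's proxy network or hash loss, which the abstract does not specify; the function names (`binarize`, `hamming_distance`, `rank_by_hamming`) and the toy feature values are assumptions made for illustration only.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> List[int]:
    """Map real-valued network outputs to a {-1, +1} hash code via sign."""
    return [1 if x >= 0 else -1 for x in features]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions where two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def rank_by_hamming(query: Sequence[int],
                    database: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices sorted by increasing Hamming distance."""
    return sorted(range(len(database)),
                  key=lambda i: hamming_distance(query, database[i]))


# Hypothetical outputs of an image hashing network (query) and a text
# hashing network (database); the values are toy numbers, not real data.
image_query = binarize([0.9, -0.2, 0.4, -0.7])
text_codes = [
    binarize([-0.8, 0.3, -0.5, 0.6]),  # differs from the query in every bit
    binarize([0.7, -0.1, 0.2, -0.9]),  # same code as the query
    binarize([0.5, 0.4, 0.1, -0.3]),   # differs from the query in one bit
]

ranking = rank_by_hamming(image_query, text_codes)
print(ranking)  # text items ordered from nearest to farthest
```

Because the comparison reduces to counting differing bits, ranking stays cheap even for large databases, which is the usual motivation for hashing-based retrieval.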

Index Terms

Data-Aware Proxy Hashing for Cross-modal Retrieval
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Top-k retrieval in databases

Published In

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2023

3567 pages

ISBN:9781450394086

DOI:10.1145/3539618

General Chairs:
Hsin-Hsi Chen (National Taiwan University)
Wei-Jou (Edward) Duh (National Taiwan University)
Hen-Hsen Huang (Academia Sinica)

Program Chairs:
Makoto P. Kato (Spotify)
Josiane Mothe (Universite de Toulouse)
Barbara Poblete (University of Chile and Amazon Visiting Academic)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Key R&D Plan
CCF-AFSG Research Fund under Grant
the fund of Joint Laboratory of HUST and Pingan Property & Casualty Research (HPL)
National Natural Science Foundation of China

Conference

SIGIR '23

Sponsor: SIGIR

SIGIR '23: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 23 - 27, 2023

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
