DOI: 10.1145/3323873.3325045
Research Article

Adversary Guided Asymmetric Hashing for Cross-Modal Retrieval

Published: 05 June 2019

Abstract

Cross-modal hashing has attracted considerable attention for large-scale multimodal retrieval tasks, and many hashing methods have been proposed for cross-modal retrieval. However, these methods pay insufficient attention to the feature learning process and cannot fully preserve both the higher-ranking correlation of item pairs and the multi-label semantics of each item, so the quality of the binary codes may be degraded. To tackle these problems, in this paper we propose a novel deep cross-modal hashing method called Adversary Guided Asymmetric Hashing (AGAH). Specifically, it employs an adversarial-learning-guided multi-label attention module to enhance the feature learning part, which learns discriminative feature representations while preserving cross-modal invariability. Furthermore, to generate hash codes that fully preserve the multi-label semantics of all items, we propose an asymmetric hashing method that utilizes a multi-label binary code map to equip the hash codes with multi-label semantic information. In addition, to ensure that all similar item pairs rank higher in correlation than dissimilar ones, we adopt a new triplet-margin constraint together with a cosine quantization technique for similarity preservation in Hamming space. Extensive empirical studies show that AGAH outperforms several state-of-the-art methods for cross-modal retrieval.
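The triplet-margin constraint and cosine quantization mentioned above can be illustrated with a minimal sketch. This is not the paper's actual formulation or training code: the function names, toy vectors, and margin value are hypothetical, and the sketch only shows the general idea of (a) forcing an anchor's cosine similarity to a same-label item to exceed its similarity to a different-label item by a margin, and (b) pulling a real-valued code toward its binary sign vector.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def triplet_margin_loss(anchor, positive, negative, margin=0.5):
    # Penalize the anchor unless its similarity to the positive item
    # exceeds its similarity to the negative item by at least `margin`.
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))

def cosine_quantization_loss(h):
    # Push a real-valued code toward its binary sign vector by
    # maximizing cosine similarity with sign(h); loss is 1 - cos.
    b = np.sign(h)
    b[b == 0] = 1.0  # break ties toward +1
    return 1.0 - cosine(h, b)

# Toy continuous codes: an image anchor and two text items.
anchor   = np.array([ 0.9, -0.8,  0.7, -0.6])
positive = np.array([ 0.8, -0.9,  0.6, -0.7])   # shares labels with anchor
negative = np.array([-0.7,  0.8, -0.9,  0.5])   # different labels

loss = triplet_margin_loss(anchor, positive, negative) \
     + cosine_quantization_loss(anchor)
print(round(loss, 4))
```

Here the anchor is already well separated from the negative item, so the triplet term vanishes and only a small quantization penalty remains; at retrieval time the final binary code would be `sign(h)`.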



Published In

ICMR '19: Proceedings of the 2019 on International Conference on Multimedia Retrieval
June 2019
427 pages
ISBN:9781450367653
DOI:10.1145/3323873

Publisher

Association for Computing Machinery

New York, NY, United States



Badges

  • Best Student Paper

Author Tags

  1. adversary learning
  2. asymmetric hashing
  3. cross-modal hashing
  4. multimodal retrieval


Funding Sources

  • Chinese Academy of Sciences

Conference

ICMR '19

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%



Article Metrics

  • Downloads (last 12 months): 72
  • Downloads (last 6 weeks): 8
Reflects downloads up to 07 Mar 2025.

Cited By

  • (2025) Cross-Modal 3D Shape Retrieval via Heterogeneous Dynamic Graph Representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 47(4): 2370-2387. DOI: 10.1109/TPAMI.2024.3524440
  • (2025) Adversarial Contrastive Autoencoder With Shared Attention for Audio-Visual Correlation Learning. IEEE Access 13: 39753-39764. DOI: 10.1109/ACCESS.2025.3546610
  • (2025) Semantic decomposition and enhancement hashing for deep cross-modal retrieval. Pattern Recognition 160(C). DOI: 10.1016/j.patcog.2024.111225
  • (2025) Deep multi-similarity hashing via label-guided network for cross-modal retrieval. Neurocomputing 616: 128830. DOI: 10.1016/j.neucom.2024.128830
  • (2025) Enhancing semantic audio-visual representation learning with supervised multi-scale attention. Pattern Analysis and Applications 28(2). DOI: 10.1007/s10044-025-01414-z
  • (2025) Adversarial Graph Convolutional Network Hashing for Cross-Modal Retrieval. Web and Big Data: APWeb-WAIM 2024 International Workshops: 69-80. DOI: 10.1007/978-981-96-0055-7_6
  • (2024) Text-Enhanced Graph Attention Hashing for Cross-Modal Retrieval. Entropy 26(11): 911. DOI: 10.3390/e26110911
  • (2024) Multi-Grained Similarity Preserving and Updating for Unsupervised Cross-Modal Hashing. Applied Sciences 14(2): 870. DOI: 10.3390/app14020870
  • (2024) DAC: 2D-3D Retrieval with Noisy Labels via Divide-and-Conquer Alignment and Correction. Proceedings of the 32nd ACM International Conference on Multimedia: 4217-4226. DOI: 10.1145/3664647.3680859
  • (2024) Anchor-aware Deep Metric Learning for Audio-visual Retrieval. Proceedings of the 2024 International Conference on Multimedia Retrieval: 211-219. DOI: 10.1145/3652583.3658067
