DOI: 10.1145/3240508.3240684

Dense Auto-Encoder Hashing for Robust Cross-Modality Retrieval

Published: 15 October 2018

Abstract

Cross-modality retrieval, which aims to search images in response to text queries or vice versa, has been widely studied. When faced with large-scale datasets, cross-modality hashing serves as an efficient and effective solution that learns binary codes to approximate cross-modality similarity in the Hamming space. Most recent cross-modality hashing schemes focus on learning hash functions from data instances with complete modalities. However, learning robust binary codes under incomplete modalities (i.e., with one modality missing or only partially observed), which occurs widely in real-world applications, remains unexplored. In this paper, we propose a novel cross-modality hashing scheme, termed Dense Auto-encoder Hashing (DAH), which explicitly imputes the missing modality and produces robust binary codes by leveraging the relatedness among different modalities. To that end, we propose a novel Dense Auto-encoder Network (DAN) to impute the missing modalities, which densely connects each layer to every other layer in a feed-forward fashion. In each layer, a noisy auto-encoder block is designed to calculate the residue between the current prediction and the original data. Finally, a hash layer is appended to the end of DAN, serving as a special binary encoder that handles incomplete modality inputs. Quantitative experiments on three cross-modality visual search benchmarks, i.e., Wiki, NUS-WIDE, and FLICKR-25K, show that the proposed DAH outperforms state-of-the-art approaches.
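The imputation-then-hashing pipeline described in the abstract can be sketched as follows. This is a minimal NumPy illustration under assumed layer sizes and random, untrained weights: each block sees the concatenation of the input and every earlier block's output (dense connectivity), predicts a residual correction to the current estimate of the missing modality, and a final sign-based layer produces binary codes. It illustrates the data flow only, not the authors' actual trained model.

```python
# Minimal sketch of the Dense Auto-encoder Hashing idea: densely connected
# auto-encoder blocks impute a missing modality via residual refinement,
# then a sign-based hash layer emits a +/-1 binary code. All sizes and
# weights below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def block(x_concat, d_out):
    """One 'noisy auto-encoder' block: a random linear map + tanh standing
    in for an encoder-decoder that predicts a residual correction."""
    W = rng.standard_normal((x_concat.shape[-1], d_out)) * 0.1
    return np.tanh(x_concat @ W)

def dense_impute(x_observed, d_missing, n_blocks=3):
    """Impute a missing modality: each block receives the concatenation of
    the observed input and every earlier block's output (dense links)."""
    feats = [x_observed]
    estimate = np.zeros(d_missing)
    for _ in range(n_blocks):
        concat = np.concatenate(feats)
        residual = block(concat, d_missing)  # correction to current estimate
        estimate = estimate + residual       # refine the prediction
        feats.append(residual)               # densely forward to later blocks
    return estimate

def hash_layer(x, n_bits=16):
    """Binary encoder: linear projection followed by the sign function."""
    W = rng.standard_normal((x.shape[-1], n_bits)) * 0.1
    return np.sign(x @ W + 1e-12)

text_feat = rng.standard_normal(64)                  # observed modality (e.g. text)
image_est = dense_impute(text_feat, d_missing=128)   # imputed missing modality
code = hash_layer(np.concatenate([text_feat, image_est]))
print(code.shape, set(np.unique(code)))
```

In the paper's actual model the blocks are trained denoising auto-encoders and the hash layer is learned end-to-end; the sign nonlinearity here simply shows how continuous features become Hamming-space codes.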




    Published In

MM '18: Proceedings of the 26th ACM International Conference on Multimedia
    October 2018
    2167 pages
    ISBN:9781450356657
    DOI:10.1145/3240508
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. binary code learning
    2. cross-modal search
    3. hash learning
4. large-scale image retrieval

    Qualifiers

    • Research-article

    Funding Sources

    • National Key R&D Program
    • Post Doctoral Innovative Talent Support Program
• Natural Science Foundation of Fujian Province, China
    • Scientific Research Project of National Language Committee of China
    • China Post-Doctoral Science Foundation
• Natural Science Foundation of China

    Conference

    MM '18
    Sponsor:
    MM '18: ACM Multimedia Conference
    October 22 - 26, 2018
    Seoul, Republic of Korea

    Acceptance Rates

    MM '18 Paper Acceptance Rate 209 of 757 submissions, 28%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

• Downloads (last 12 months): 46
• Downloads (last 6 weeks): 0
Reflects downloads up to 16 Feb 2025


    Cited By

• (2024) FedCAFE: Federated Cross-Modal Hashing with Adaptive Feature Enhancement. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 9670-9679. DOI: 10.1145/3664647.3681319. Online publication date: 28 Oct 2024.
• (2024) Multi-Modal Hashing for Efficient Multimedia Retrieval: A Survey. IEEE Transactions on Knowledge and Data Engineering, 36(1), pp. 239-260. DOI: 10.1109/TKDE.2023.3282921. Online publication date: Jan 2024.
• (2024) Semantic Reconstruction Guided Missing Cross-modal Hashing. 2024 International Joint Conference on Neural Networks (IJCNN), pp. 1-8. DOI: 10.1109/IJCNN60899.2024.10650222. Online publication date: 30 Jun 2024.
• (2023) Incomplete Cross-Modal Retrieval with Deep Correlation Transfer. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3637442. Online publication date: 13 Dec 2023.
• (2023) Graph Convolutional Incomplete Multi-modal Hashing. Proceedings of the 31st ACM International Conference on Multimedia, pp. 7029-7037. DOI: 10.1145/3581783.3612282. Online publication date: 26 Oct 2023.
• (2023) Partial Multi-Modal Hashing via Neighbor-Aware Completion Learning. IEEE Transactions on Multimedia, 25, pp. 8499-8510. DOI: 10.1109/TMM.2023.3238308. Online publication date: 2023.
• (2023) Teacher-Student Learning: Efficient Hierarchical Message Aggregation Hashing for Cross-Modal Retrieval. IEEE Transactions on Multimedia, 25, pp. 4520-4532. DOI: 10.1109/TMM.2022.3177901. Online publication date: 2023.
• (2023) CLIP-based fusion-modal reconstructing hashing for large-scale unsupervised cross-modal retrieval. International Journal of Multimedia Information Retrieval, 12(1). DOI: 10.1007/s13735-023-00268-7. Online publication date: 22 Feb 2023.
• (2023) Pseudo-label driven deep hashing for unsupervised cross-modal retrieval. International Journal of Machine Learning and Cybernetics, 14(10), pp. 3437-3456. DOI: 10.1007/s13042-023-01842-5. Online publication date: 11 May 2023.
• (2022) Deep Adaptively-Enhanced Hashing With Discriminative Similarity Guidance for Unsupervised Cross-Modal Retrieval. IEEE Transactions on Circuits and Systems for Video Technology, 32(10), pp. 7255-7268. DOI: 10.1109/TCSVT.2022.3172716. Online publication date: Oct 2022.
