research-article

Correlation Autoencoder Hashing for Supervised Cross-Modal Search

Authors: Mingsheng Long, Han Zhu

ICMR '16: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval

Pages 197 - 204

https://doi.org/10.1145/2911996.2912000

Published: 06 June 2016

Abstract

Due to its storage and query efficiency, hashing has been widely applied to approximate nearest neighbor search over large-scale datasets. While there is increasing interest in cross-modal hashing, which facilitates cross-media retrieval by embedding data from different modalities into a common Hamming space, how to distill the cross-modal correlation structure effectively remains a challenging problem. In this paper, we propose a novel supervised cross-modal hashing method, Correlation Autoencoder Hashing (CAH), to learn discriminative and compact binary codes based on deep autoencoders. Specifically, CAH jointly maximizes the feature correlation revealed by bimodal data and the semantic correlation conveyed in similarity labels, while embedding them into hash codes by nonlinear deep autoencoders. Extensive experiments show that CAH is more effective and efficient than state-of-the-art hashing methods on standard cross-modal retrieval benchmarks.
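To make the retrieval setting in the abstract concrete, the following is a minimal sketch of search in a common Hamming space. It is not the CAH method: the embeddings are hypothetical hand-written values standing in for the outputs of learned per-modality encoders, and the helper names `binarize`, `hamming_distance`, and `hamming_rank` are assumptions introduced only for this illustration.

```python
def binarize(embedding):
    """Threshold a real-valued embedding at zero to get a binary code."""
    return [1 if value > 0 else 0 for value in embedding]


def hamming_distance(code_a, code_b):
    """Count the bit positions where two equal-length codes differ."""
    return sum(bit_a != bit_b for bit_a, bit_b in zip(code_a, code_b))


def hamming_rank(query_code, database_codes):
    """Rank database items by Hamming distance to the query (closest first)."""
    distances = [hamming_distance(query_code, code) for code in database_codes]
    order = sorted(range(len(database_codes)), key=lambda i: distances[i])
    return order, distances


# Hypothetical 4-dimensional embeddings. In a real system these would come
# from modality-specific encoders (image and text), which are not shown here.
image_embeddings = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.5, 0.8, -0.3, 0.6],
    [0.7, -0.1, -0.6, -0.4],
]
text_query_embedding = [0.8, -0.3, 0.5, -0.9]

database_codes = [binarize(e) for e in image_embeddings]
query_code = binarize(text_query_embedding)

# A text query retrieves images by comparing codes in the shared Hamming space.
order, distances = hamming_rank(query_code, database_codes)
print(order, distances)
```

Because the codes are short bit vectors, storage is compact and each comparison is a bit-difference count, which is the storage and query efficiency the abstract refers to.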

Cited By

Li MGe M(2025)Enhanced-Similarity Attention Fusion for Unsupervised Cross-Modal Hashing RetrievalData Science and Engineering10.1007/s41019-024-00274-7Online publication date: 30-Jan-2025
https://doi.org/10.1007/s41019-024-00274-7
Wang TLi FZhu LLi JZhang ZShen H(2024)Cross-Modal Retrieval: A Systematic Review of Methods and Future DirectionsProceedings of the IEEE10.1109/JPROC.2024.3525147112:11(1716-1754)Online publication date: Nov-2024
https://doi.org/10.1109/JPROC.2024.3525147
Ma YWang MLu GSun Y(2024)Deep Hashing Similarity Learning for Cross-Modal RetrievalIEEE Access10.1109/ACCESS.2024.335243412(8609-8618)Online publication date: 2024
https://doi.org/10.1109/ACCESS.2024.3352434
Show More Cited By

Index Terms

Correlation Autoencoder Hashing for Supervised Cross-Modal Search
1. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Multimedia and multimodal retrieval

Published In

ICMR '16: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval

June 2016, 452 pages

ISBN: 9781450343596

DOI: 10.1145/2911996

General Chairs: John R. Kender (Columbia University, USA), John R. Smith (IBM Research, USA)

Program Chairs: Jiebo Luo (University of Rochester, USA), Susanne Boll (University of Oldenburg, Germany), Winston Hsu (National Taiwan University, Taiwan)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Science and Technology Supporting Program
China Postdoctoral Science Foundation
Tsinghua National Laboratory (TNList) Special Fund for Big Data Science and Technology
National Natural Science Foundation of China

Conference

ICMR'16: International Conference on Multimedia Retrieval

June 6 - 9, 2016

New York, New York, USA

Acceptance Rates

ICMR '16 Paper Acceptance Rate 20 of 120 submissions, 17%;

Overall Acceptance Rate 254 of 830 submissions, 31%

Article Metrics

Total Citations: 72
Total Downloads: 596
Downloads (Last 12 months): 28
Downloads (Last 6 weeks): 7

Reflects downloads up to 01 Mar 2025
