DOI: 10.1145/2911996.2912000
research-article

Correlation Autoencoder Hashing for Supervised Cross-Modal Search

Published: 06 June 2016

Abstract

Due to its storage and query efficiency, hashing has been widely applied to approximate nearest neighbor search on large-scale datasets. While there is increasing interest in cross-modal hashing, which facilitates cross-media retrieval by embedding data from different modalities into a common Hamming space, how to distill the cross-modal correlation structure effectively remains a challenging problem. In this paper, we propose a novel supervised cross-modal hashing method, Correlation Autoencoder Hashing (CAH), to learn discriminative and compact binary codes based on deep autoencoders. Specifically, CAH jointly maximizes the feature correlation revealed by bimodal data and the semantic correlation conveyed in similarity labels, while embedding them into hash codes with nonlinear deep autoencoders. Extensive experiments show that CAH is more effective and efficient than state-of-the-art hashing methods on standard cross-modal retrieval benchmarks.
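
To make the idea of retrieval in a common Hamming space concrete, the following is a minimal sketch in Python. It is not the CAH model or its training procedure: the embeddings are hand-picked hypothetical values standing in for the outputs of learned image and text encoders, and the helper names (to_code, hamming, rank_by_hamming) are assumptions introduced only for this illustration. It shows sign binarization into compact codes and ranking database items of one modality against a query from another by Hamming distance.

```python
def to_code(values):
    """Binarize a real-valued embedding by sign and pack the bits into an int."""
    code = 0
    for v in values:
        code = (code << 1) | (1 if v > 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two packed binary codes."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code, database_codes):
    """Return database indices sorted by Hamming distance to the query."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )


# Hypothetical 4-dimensional embeddings (assumed values, not from the paper).
# In a real system these would be produced by the learned encoders.
image_embeddings = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.5, 0.8, -0.1, 0.3],
    [0.6, -0.4, -0.9, -0.2],
]
text_query_embedding = [0.7, -0.1, 0.2, -0.3]

image_codes = [to_code(e) for e in image_embeddings]
query_code = to_code(text_query_embedding)

# Text-to-image search: rank image codes by distance to the text query code.
ranking = rank_by_hamming(query_code, image_codes)
print(ranking)
```

Because both modalities map to codes of the same length, a query is compared with an XOR and a bit count, which is the source of the storage and query efficiency the abstract mentions.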

    Published In

    ICMR '16: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval
    June 2016
    452 pages
    ISBN:9781450343596
    DOI:10.1145/2911996

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. correlation analysis
    2. cross-modal retrieval
    3. deep autoencoder
    4. hashing

    Qualifiers

    • Research-article

    Conference

    ICMR'16
    ICMR'16: International Conference on Multimedia Retrieval
    June 6 - 9, 2016
    New York, New York, USA

    Acceptance Rates

    ICMR '16 Paper Acceptance Rate 20 of 120 submissions, 17%;
    Overall Acceptance Rate 254 of 830 submissions, 31%

    Article Metrics

    • Downloads (Last 12 months): 28
    • Downloads (Last 6 weeks): 7
    Reflects downloads up to 01 Mar 2025

    Cited By

    • (2025) Enhanced-Similarity Attention Fusion for Unsupervised Cross-Modal Hashing Retrieval. Data Science and Engineering. DOI: 10.1007/s41019-024-00274-7. Online publication date: 30-Jan-2025
    • (2024) Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions. Proceedings of the IEEE, 112:11 (1716-1754). DOI: 10.1109/JPROC.2024.3525147. Online publication date: Nov-2024
    • (2024) Deep Hashing Similarity Learning for Cross-Modal Retrieval. IEEE Access, 12 (8609-8618). DOI: 10.1109/ACCESS.2024.3352434. Online publication date: 2024
    • (2024) Semantic embedding based online cross-modal hashing method. Scientific Reports, 14:1. DOI: 10.1038/s41598-023-50242-w. Online publication date: 6-Jan-2024
    • (2024) Robust online hashing with label semantic enhancement for cross-modal retrieval. Pattern Recognition, 145 (109972). DOI: 10.1016/j.patcog.2023.109972. Online publication date: Jan-2024
    • (2024) Online hashing with partially known labels for cross-modal retrieval. Engineering Applications of Artificial Intelligence, 138 (109367). DOI: 10.1016/j.engappai.2024.109367. Online publication date: Dec-2024
    • (2024) Learning Hash Subspace from Large-Scale Multi-modal Pre-Training: A CLIP-Based Cross-modal Hashing Framework. Proceedings of 2023 11th China Conference on Command and Control (514-526). DOI: 10.1007/978-981-99-9021-4_48. Online publication date: 4-Feb-2024
    • (2023) Unsupervised Cross-Modal Hashing via Semantic Text Mining. IEEE Transactions on Multimedia, 25 (8946-8957). DOI: 10.1109/TMM.2023.3243608. Online publication date: 2023
    • (2023) Attribute-Guided Multiple Instance Hashing Network for Cross-Modal Zero-Shot Hashing. IEEE Transactions on Multimedia, 25 (5305-5318). DOI: 10.1109/TMM.2022.3190222. Online publication date: 2023
    • (2023) DVHN: A Deep Hashing Framework for Large-Scale Vehicle Re-Identification. IEEE Transactions on Intelligent Transportation Systems, 24:9 (9268-9280). DOI: 10.1109/TITS.2023.3277974. Online publication date: Sep-2023