research-article

Linear cross-modal hashing for efficient multimedia search

Authors:

Xin Zhao

MM '13: Proceedings of the 21st ACM international conference on Multimedia

Pages 143 - 152

https://doi.org/10.1145/2502081.2502107

Published: 21 October 2013

Abstract

Most existing cross-modal hashing methods scale poorly in the training phase. In this paper, we propose a novel cross-modal hashing approach whose training time is linear in the training data size, enabling scalable indexing for multimedia search across multiple modals. Taking both the intra-similarity within each modal and the inter-similarity across different modals into consideration, the proposed approach aims at effectively learning hash functions from large-scale training datasets. More specifically, for each modal, we first partition the training data into $k$ clusters and then represent each training data point by its distances to the $k$ cluster centroids. Interestingly, such a $k$-dimensional data representation can reduce the time complexity of the training phase from the traditional O(n²) or higher to O(n), where $n$ is the training data size, leading to practical learning on large-scale datasets. We further prove that this new representation preserves the intra-similarity in each modal. To preserve the inter-similarity among data points across different modals, we transform the derived data representations into a common binary subspace in which binary codes from all the modals are "consistent" and comparable. The transformation simultaneously outputs the hash functions for all modals, which are used to convert unseen data into binary codes. Given a query of one modal, it is first mapped into binary codes using that modal's hash functions and then matched against the database binary codes of any other modal. Experimental results on two benchmark datasets confirm the scalability and the effectiveness of the proposed approach in comparison with the state of the art.
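
The two steps of the abstract that are fully specified, the distance-to-centroid representation and the matching of binary codes, can be illustrated with a minimal sketch. Everything named here (`kmeans`, `centroid_distance_features`, `hamming_rank`, the toy data) is a hypothetical illustration and not the authors' code. The abstract does not say which clustering algorithm is used, so plain Lloyd's k-means is assumed, and the learned transformation into the common binary subspace is not reproduced.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns (k, d) centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distances from every point to every centroid: shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    return centroids

def centroid_distance_features(X, centroids):
    """Represent each row of X by its Euclidean distances to the k centroids."""
    diff = X[:, None, :] - centroids[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))  # shape (n, k)

def hamming_rank(query_code, db_codes):
    """Return database indices sorted by Hamming distance to the query code."""
    dists = (db_codes != query_code).sum(axis=1)
    return np.argsort(dists, kind="stable")

# Toy data for a single modal; sizes are arbitrary.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
centroids = kmeans(X, k=8)
Z = centroid_distance_features(X, centroids)  # k-dimensional representation

# Hand-written binary codes standing in for the output of learned hash functions.
db_codes = np.array([[0, 1, 1, 0], [1, 1, 1, 1], [0, 1, 0, 0]], dtype=np.uint8)
query = np.array([0, 1, 1, 0], dtype=np.uint8)
order = hamming_rank(query, db_codes)
```

Computing the representation costs k distance evaluations per point, so for a fixed k the work grows linearly with the number of points, which is consistent with the O(n) claim in the abstract.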


Index Terms

Linear cross-modal hashing for efficient multimedia search
1. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing

Published In

MM '13: Proceedings of the 21st ACM international conference on Multimedia

October 2013

1166 pages

ISBN: 9781450324045

DOI: 10.1145/2502081

General Chairs:
Alejandro (Alex) Jaimes (Yahoo!, Spain)
Nicu Sebe (University of Trento, Italy)
Nozha Boujemaa (INRIA, France)

Program Chairs:
Daniel Gatica-Perez (IDIAP & EPFL, Switzerland)
David A. Shamma (Yahoo!, USA)
Marcel Worring (University of Amsterdam, The Netherlands)
Roger Zimmermann (National University of Singapore, Singapore)

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM '13

Sponsor: SIGMM

MM '13: ACM Multimedia Conference

October 21 - 25, 2013

Barcelona, Spain

Acceptance Rates

MM '13 Paper Acceptance Rate 47 of 235 submissions, 20%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics & Citations

Article Metrics

Total Citations: 238
Total Downloads: 1,598
Downloads (Last 12 months): 58
Downloads (Last 6 weeks): 9

Reflects downloads up to 17 Feb 2025

Cited By

Zhang B, Zhang Y, Li J, Chen J, Akutsu T, Cheung Y, Cai H (2025). Unsupervised Dual Deep Hashing With Semantic-Index and Content-Code for Cross-Modal Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47:1 (387-399). Online publication date: Jan-2025.
https://doi.org/10.1109/TPAMI.2024.3467130

Lu B, Zhao T, Liang G, Li J, Duan X (2025). Adversarial Graph Convolutional Network Hashing for Cross-Modal Retrieval. Web and Big Data. APWeb-WAIM 2024 International Workshops (69-80). Online publication date: 31-Jan-2025.
https://doi.org/10.1007/978-981-96-0055-7_6

Shen X, Yu G, Chen Y, Yang X, Zheng Y, Cai J, Kankanhalli M, Prabhakaran B, Boll S, Subramanian R, Zheng L, Singh V, Cesar P, Xie L, Xu D (2024). Graph Convolutional Semi-Supervised Cross-Modal Hashing. Proceedings of the 32nd ACM International Conference on Multimedia (5930-5938). Online publication date: 28-Oct-2024.
https://dl.acm.org/doi/10.1145/3664647.3680732

Wang T, Li F, Zhu L, Li J, Zhang Z, Shen H (2024). Invisible Black-Box Backdoor Attack against Deep Cross-Modal Hashing Retrieval. ACM Transactions on Information Systems, 42:4 (1-27). Online publication date: 2-Mar-2024.
https://dl.acm.org/doi/10.1145/3650205

Cai H, Zhang B, Li J, Hu B, Chen J (2024). Unsupervised Dual Hashing Coding (UDC) on Semantic Tagging and Sample Content for Cross-Modal Retrieval. IEEE Transactions on Multimedia, 26 (9109-9120). Online publication date: 2024.
https://doi.org/10.1109/TMM.2024.3385986

Qin Q, Huo Y, Huang L, Dai J, Zhang H, Zhang W (2024). Deep Neighborhood-Preserving Hashing With Quadratic Spherical Mutual Information for Cross-Modal Retrieval. IEEE Transactions on Multimedia, 26 (6361-6374). Online publication date: 2024.
https://doi.org/10.1109/TMM.2023.3349075

Zhu L, Zheng C, Guan W, Li J, Yang Y, Shen H (2024). Multi-Modal Hashing for Efficient Multimedia Retrieval: A Survey. IEEE Transactions on Knowledge and Data Engineering, 36:1 (239-260). Online publication date: Jan-2024.
https://doi.org/10.1109/TKDE.2023.3282921

Zhang L, Chen M, Tu B, Li Y, Xia Y (2024). Retargeting HR Aerial Photos Under Contaminated Labels With Application in Smart Navigation. IEEE Transactions on Intelligent Transportation Systems, 25:1 (349-358). Online publication date: Jan-2024.
https://doi.org/10.1109/TITS.2023.3288877

Wang T, Li F, Zhu L, Li J, Zhang Z, Shen H (2024). Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions. Proceedings of the IEEE, 112:11 (1716-1754). Online publication date: Nov-2024.
https://doi.org/10.1109/JPROC.2024.3525147

Zhang Z, Chen Y (2024). Cross-Modal Semantic Embedding Hashing for Unsupervised Retrieval. 2024 International Joint Conference on Neural Networks (IJCNN) (1-7). Online publication date: 30-Jun-2024.
https://doi.org/10.1109/IJCNN60899.2024.10651304
