DOI: 10.1145/2502081.2502107
research-article

Linear cross-modal hashing for efficient multimedia search

Published: 21 October 2013

Abstract

Most existing cross-modal hashing methods scale poorly in the training phase. In this paper, we propose a novel cross-modal hashing approach whose training time is linear in the training data size, enabling scalable indexing for multimedia search across multiple modalities. Taking into consideration both the intra-similarity within each modality and the inter-similarity across different modalities, the proposed approach aims to learn hash functions effectively from large-scale training datasets. More specifically, for each modality, we first partition the training data into $k$ clusters and then represent each training data point by its distances to the $k$ cluster centroids. Interestingly, such a $k$-dimensional data representation can reduce the time complexity of the training phase from the traditional $O(n^2)$ or higher to $O(n)$, where $n$ is the training data size, making learning on large-scale datasets practical. We further prove that this new representation preserves the intra-similarity in each modality. To preserve the inter-similarity among data points across different modalities, we transform the derived data representations into a common binary subspace in which binary codes from all the modalities are "consistent" and comparable. The transformation simultaneously outputs the hash functions for all modalities, which are used to convert unseen data into binary codes. Given a query in one modality, it is first mapped into binary codes using that modality's hash functions and then matched against the database binary codes of any other modality. Experimental results on two benchmark datasets confirm the scalability and the effectiveness of the proposed approach in comparison with the state of the art.
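
The representation step described in the abstract can be illustrated with a minimal sketch. It assumes plain Lloyd's k-means, Euclidean distance, and synthetic features for a single modality; the names `kmeans`, `centroid_distance_features`, and `hamming_rank` are hypothetical and do not come from the paper. The median-threshold binarization is only a placeholder so that Hamming matching can be shown; it is not the learned transformation into the common binary subspace, which the abstract does not specify.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed clustering choice); returns (k, d) centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every point to every centroid: shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def centroid_distance_features(X, centroids):
    """Represent each row of X by its Euclidean distances to the k centroids."""
    diff = X[:, None, :] - centroids[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))  # shape (n, k)

def hamming_rank(query_code, db_codes):
    """Return database indices sorted by Hamming distance, plus the distances."""
    dists = (db_codes != query_code).sum(axis=1)
    return np.argsort(dists, kind="stable"), dists

# Synthetic stand-in for one modality's features (assumption: 200 points, 16 dims).
rng = np.random.default_rng(0)
X_img = rng.normal(size=(200, 16))

centroids = kmeans(X_img, k=8)
Z_img = centroid_distance_features(X_img, centroids)  # k-dimensional representation

# Placeholder binarization (NOT the paper's learned hash functions):
# threshold each dimension at its median to get one bit per centroid.
codes = (Z_img > np.median(Z_img, axis=0)).astype(np.uint8)

# Rank the database against the code of item 0 by Hamming distance.
order, dists = hamming_rank(codes[0], codes)
```

Computing the distance features touches each of the $n$ points once per centroid, so for fixed $k$ and feature dimension the cost of this step grows linearly with $n$.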

Information

Published In

MM '13: Proceedings of the 21st ACM international conference on Multimedia
October 2013
1166 pages
ISBN:9781450324045
DOI:10.1145/2502081
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cross-modal
  2. hashing
  3. index
  4. multimedia search

Qualifiers

  • Research-article

Conference

MM '13: ACM Multimedia Conference
October 21 - 25, 2013
Barcelona, Spain

Acceptance Rates

MM '13 Paper Acceptance Rate 47 of 235 submissions, 20%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 58
  • Downloads (Last 6 weeks): 9
Reflects downloads up to 17 Feb 2025

Citations

Cited By

  • (2025) Unsupervised Dual Deep Hashing With Semantic-Index and Content-Code for Cross-Modal Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47:1 (387-399). DOI: 10.1109/TPAMI.2024.3467130. Online publication date: Jan-2025
  • (2025) Adversarial Graph Convolutional Network Hashing for Cross-Modal Retrieval. Web and Big Data. APWeb-WAIM 2024 International Workshops (69-80). DOI: 10.1007/978-981-96-0055-7_6. Online publication date: 31-Jan-2025
  • (2024) Graph Convolutional Semi-Supervised Cross-Modal Hashing. Proceedings of the 32nd ACM International Conference on Multimedia (5930-5938). DOI: 10.1145/3664647.3680732. Online publication date: 28-Oct-2024
  • (2024) Invisible Black-Box Backdoor Attack against Deep Cross-Modal Hashing Retrieval. ACM Transactions on Information Systems, 42:4 (1-27). DOI: 10.1145/3650205. Online publication date: 2-Mar-2024
  • (2024) Unsupervised Dual Hashing Coding (UDC) on Semantic Tagging and Sample Content for Cross-Modal Retrieval. IEEE Transactions on Multimedia, 26 (9109-9120). DOI: 10.1109/TMM.2024.3385986. Online publication date: 2024
  • (2024) Deep Neighborhood-Preserving Hashing With Quadratic Spherical Mutual Information for Cross-Modal Retrieval. IEEE Transactions on Multimedia, 26 (6361-6374). DOI: 10.1109/TMM.2023.3349075. Online publication date: 2024
  • (2024) Multi-Modal Hashing for Efficient Multimedia Retrieval: A Survey. IEEE Transactions on Knowledge and Data Engineering, 36:1 (239-260). DOI: 10.1109/TKDE.2023.3282921. Online publication date: Jan-2024
  • (2024) Retargeting HR Aerial Photos Under Contaminated Labels With Application in Smart Navigation. IEEE Transactions on Intelligent Transportation Systems, 25:1 (349-358). DOI: 10.1109/TITS.2023.3288877. Online publication date: Jan-2024
  • (2024) Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions. Proceedings of the IEEE, 112:11 (1716-1754). DOI: 10.1109/JPROC.2024.3525147. Online publication date: Nov-2024
  • (2024) Cross-Modal Semantic Embedding Hashing for Unsupervised Retrieval. 2024 International Joint Conference on Neural Networks (IJCNN) (1-7). DOI: 10.1109/IJCNN60899.2024.10651304. Online publication date: 30-Jun-2024
