Research article · DOI: 10.1145/2187836.2187961

A unified approach to learning task-specific bit vector representations for fast nearest neighbor search

Published: 16 April 2012

Abstract

Fast nearest neighbor search is necessary for a variety of large-scale web applications such as information retrieval, nearest neighbor classification, and nearest neighbor regression. Recently, a number of machine learning algorithms have been proposed for representing the data to be searched as (short) bit vectors and then using hashing to do rapid search. These algorithms have been limited in their applicability in that each is suited to only one type of task -- e.g., Spectral Hashing learns bit vector representations for retrieval, but not, say, classification. In this paper we present a unified approach to learning bit vector representations for many applications that use nearest neighbor search. The main contribution is a single learning algorithm that can be customized to learn a bit vector representation suited to the task at hand. This broadens the usefulness of bit vector representations to tasks beyond conventional retrieval. We propose a learning-to-rank formulation to learn the bit vector representation of the data, using the LambdaRank algorithm to learn a function that computes a task-specific bit vector from an input data vector. Our approach outperforms state-of-the-art nearest neighbor methods on a number of real-world text and image classification and retrieval datasets. It is scalable, learning a 32-bit representation from 1.46 million training cases in two days.
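To make the role of the bit vectors concrete, the sketch below shows how fixed-length binary codes enable fast nearest neighbor search via Hamming distance. The encoding here is a generic random-hyperplane projection used purely as a stand-in; the paper instead learns a task-specific hash function with LambdaRank, and all function names and parameters below (random seed, 32 bits, toy data sizes) are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: bit-vector codes + Hamming-distance nearest neighbor search.
# The projection is a plain random-hyperplane hash, NOT the learned,
# task-specific hash function described in the paper.

def make_projection(dim, n_bits=32, seed=0):
    """Draw a fixed random projection matrix (illustrative, not learned)."""
    return np.random.default_rng(seed).standard_normal((dim, n_bits))

def to_bit_codes(X, W):
    """Encode each row of X as an n_bits-long binary vector via the sign of X @ W."""
    return (X @ W > 0).astype(np.uint8)

def hamming_knn(query_code, db_codes, k=5):
    """Return indices of the k database codes closest to the query in Hamming distance."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists)[:k]

# Toy usage on random data: index 10,000 vectors, then search with one query.
db = np.random.randn(10_000, 128)
W = make_projection(dim=128, n_bits=32)
db_codes = to_bit_codes(db, W)
query_code = to_bit_codes(np.random.randn(1, 128), W)[0]
print(hamming_knn(query_code, db_codes, k=5))
```

Because the codes are short and comparisons reduce to bit operations, search over millions of items stays cheap; what the paper changes is how the projection is obtained, replacing the random matrix above with a function trained so that small Hamming distances reflect the target task (retrieval, classification, or regression).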

References

[1] A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In FOCS, pages 459--468. IEEE Computer Society, 2006.
[2] R. Bell, Y. Koren, and C. Volinsky. Modeling relationships at multiple scales to improve accuracy of large recommender systems. In KDD '07, pages 95--104, New York, NY, USA, 2007. ACM.
[3] J. A. Blackard and D. J. Dean. Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables. Computers and Electronics in Agriculture, 24:131--151, 1999.
[4] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In ICML '05, pages 89--96, New York, NY, USA, 2005. ACM.
[5] C. J. Burges, R. Ragno, and Q. V. Le. Learning to rank with nonsmooth cost functions. In B. Schölkopf, J. Platt, and T. Hoffman, editors, NIPS 19, pages 193--200. MIT Press, Cambridge, MA, 2007.
[6] G. Chechik, V. Sharma, U. Shalit, and S. Bengio. Large scale online learning of image similarity through ranking. J. Mach. Learn. Res., 11:1109--1135, March 2010.
[7] P. Donmez, K. M. Svore, and C. J. Burges. On the local optimality of LambdaRank. In SIGIR, pages 460--467, New York, NY, USA, 2009. ACM.
[8] A. Globerson and S. T. Roweis. Metric learning by collapsing classes. In NIPS, 2005.
[9] J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov. Neighbourhood components analysis. In NIPS 17, pages 513--520. MIT Press, 2004.
[10] J. He, R. Radhakrishnan, S.-F. Chang, and C. Bauer. Compact hashing with joint optimization of search accuracy and time. In CVPR, June 2011.
[11] G. Hinton and R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504--507, 2006.
[12] P. Jain, B. Kulis, and K. Grauman. Fast image search for learned metrics. In CVPR, pages 1--8, 2008.
[13] B. Kulis and K. Grauman. Kernelized locality-sensitive hashing for scalable image search. In ICCV, 2009.
[14] D. D. Lewis, Y. Yang, T. G. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. JMLR, 5:361--397, 2004.
[15] D. G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision, 60:91--110, November 2004.
[16] B. McFee and G. Lanckriet. Metric learning to rank. In ICML, 2010.
[17] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. IJCV, 42:145--175, 2001.
[18] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[19] R. Salakhutdinov and G. Hinton. Semantic hashing. In SIGIR Workshop on Information Retrieval and Applications of Graphical Models, 2007.
[20] N. Shental, T. Hertz, D. Weinshall, and M. Pavel. Adjustment learning and relevant component analysis. In ECCV '02, pages 776--792, London, UK, 2002. Springer-Verlag.
[21] A. Torralba, R. Fergus, and W. T. Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. PAMI, 30:1958--1970, November 2008.
[22] K. Q. Weinberger and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. JMLR, 10:207--244, June 2009.
[23] Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In NIPS, pages 1753--1760, 2008.

    Published In

WWW '12: Proceedings of the 21st International Conference on World Wide Web
April 2012
1078 pages
ISBN: 9781450312295
DOI: 10.1145/2187836

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. hashing
    2. learning to rank
    3. nearest neighbor search

    Conference

    WWW 2012
    Sponsor:
    • Univ. de Lyon
    WWW 2012: 21st World Wide Web Conference 2012
    April 16 - 20, 2012
    Lyon, France

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
