Abstract
Locality Sensitive Hashing (LSH) is widely used in content-based search systems. LSH methods fall into two main categories: the first indexes the original data efficiently to accelerate the search process; the second embeds high-dimensional data into Hamming space and uses bit-wise operations to find similar objects. In this paper, we propose a new LSH scheme, called Distribution-Aware LSH (DALSH), to address the lack of adaptation to real data, an intrinsic limitation of most LSH methods in the former category. In DALSH, a given dataset is first embedded into a low-dimensional space using projection vectors learned from the data, and hash functions are then derived from the distribution of the dimension-reduced data. We also present a multi-probe strategy to improve query performance. Experimental comparisons with state-of-the-art LSH methods on two high-dimensional datasets demonstrate the efficacy of DALSH.
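The pipeline summarized above (learn projections, reduce dimensionality, derive hash functions from the distribution of the reduced data, then probe extra buckets at query time) can be illustrated with a small Python sketch. It is a hypothetical toy, not the authors' DALSH. The abstract does not specify how the projections are learned or how the hash functions are derived, so the sketch assumes PCA for the projections and equal-frequency (quantile) intervals on each reduced dimension for the hash functions. Its multi-probe step simply visits the neighbouring intervals whose boundaries lie closest to the query. The class name `DistributionAwareLSHSketch` and the parameters `n_dims`, `n_bins`, and `n_probes` are illustrative assumptions.

```python
import numpy as np


class DistributionAwareLSHSketch:
    """Toy distribution-aware hash index (hypothetical; not the DALSH code).

    Assumptions: PCA stands in for the learned projection vectors, and each
    reduced dimension is cut at empirical quantiles, so every interval holds
    roughly the same number of training points.
    """

    def __init__(self, n_dims=4, n_bins=4):
        self.n_dims = n_dims  # size of the low-dimensional embedding
        self.n_bins = n_bins  # intervals per reduced dimension

    def _project(self, x):
        # Center with the training mean, then apply the learned projection.
        return (np.atleast_2d(x) - self.mean) @ self.proj

    def _codes(self, reduced):
        # Interval index per dimension = number of cut points <= the value.
        return np.stack(
            [np.searchsorted(self.cuts[j], reduced[:, j], side="right")
             for j in range(self.n_dims)],
            axis=1,
        )

    def fit(self, data):
        self.data = np.asarray(data, dtype=float)
        self.mean = self.data.mean(axis=0)
        # Assumption: PCA via SVD; rows of vt are principal directions,
        # largest variance first.
        _, _, vt = np.linalg.svd(self.data - self.mean, full_matrices=False)
        self.proj = vt[: self.n_dims].T
        reduced = self._project(self.data)
        # Assumption: interior quantiles give n_bins - 1 cut points per
        # dimension, so the hash intervals follow the data distribution.
        levels = np.linspace(0.0, 1.0, self.n_bins + 1)[1:-1]
        self.cuts = np.quantile(reduced, levels, axis=0).T
        self.table = {}
        for i, code in enumerate(self._codes(reduced)):
            self.table.setdefault(tuple(code.tolist()), []).append(i)
        return self

    def _probe_keys(self, q, n_probes):
        r = self._project(q)[0]
        home = self._codes(r[None, :])[0]
        # Gap from the query to each neighbouring boundary; smaller gaps are
        # more likely to hide true neighbours in the adjacent interval.
        moves = []
        for j, c in enumerate(home):
            if c > 0:
                moves.append((r[j] - self.cuts[j][c - 1], j, -1))
            if c < self.n_bins - 1:
                moves.append((self.cuts[j][c] - r[j], j, +1))
        moves.sort()
        keys = [tuple(home.tolist())]
        for _, j, delta in moves[:n_probes]:
            shifted = home.copy()
            shifted[j] += delta
            keys.append(tuple(shifted.tolist()))
        return keys

    def query(self, q, k=1, n_probes=0):
        """Return up to k training indices, ranked by exact Euclidean distance."""
        cand = sorted({i for key in self._probe_keys(q, n_probes)
                       for i in self.table.get(key, [])})
        if not cand:
            return []
        dists = np.linalg.norm(self.data[cand] - np.asarray(q, dtype=float), axis=1)
        return [cand[t] for t in np.argsort(dists)[:k]]
```

For example, `DistributionAwareLSHSketch(n_dims=3, n_bins=4).fit(data).query(q, k=5, n_probes=3)` returns up to five training indices found in the home bucket and three neighbouring buckets, ranked by exact distance. Quantile cuts keep bucket occupancy balanced even when the projected data are skewed, which is one plausible reading of the data adaptation the abstract describes; fixed-width bucketing, as in p-stable LSH, would instead crowd dense regions into a few buckets.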
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, L., Zhang, Y., Zhang, D., Tian, Q. (2013). Distribution-Aware Locality Sensitive Hashing. In: Li, S., et al. Advances in Multimedia Modeling. Lecture Notes in Computer Science, vol 7733. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35728-2_38
DOI: https://doi.org/10.1007/978-3-642-35728-2_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35727-5
Online ISBN: 978-3-642-35728-2
eBook Packages: Computer Science, Computer Science (R0)