
Locality-Sensitive Hashing for Finding Nearest Neighbors in Probability Distributions

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 774)

Abstract

In the past ten years, powerful new algorithms based on efficient data structures have been proposed for Approximate Nearest Neighbor (ANN) search. Existing Locality-Sensitive Hashing (LSH) algorithms for vector data can be applied directly to find nearest neighbors in data consisting of probability distributions. However, these methods ignore the special properties of probability distributions. In this paper, we exploit these properties to present a novel LSH scheme, adapted to angular distance, for ANN search over high-dimensional probability distributions. We define the specific hash functions and prove that they are locality-sensitive. We also propose a Sequential Interleaving algorithm based on the “Unbalance Effect” between the Euclidean and angular metrics for probability distributions. Finally, we experimentally compare our methods with state-of-the-art LSH algorithms for ANN search on six public image databases. The results show that the proposed algorithms achieve substantially better accuracy than the baselines.
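The abstract does not spell out the proposed hash family, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It uses the classic random-hyperplane hash for angular distance (Charikar, 2002): for a Gaussian normal vector r, Pr[sign(r·x) = sign(r·y)] = 1 − θ(x, y)/π. One property of probability vectors shows up directly here. Because every entry is nonnegative, every pairwise angle lies in [0, π/2], so each hash bit collides with probability at least 1/2. The class `AngularLSH`, the helpers `angle`, `interleave` and `query`, and the parameters `n_bits`, `n_tables` and `seed` are hypothetical. In particular, `interleave` is a generic round-robin merge of a Euclidean ranking and an angular ranking, not the paper's Sequential Interleaving algorithm.

```python
import itertools
import numpy as np


class AngularLSH:
    """Random-hyperplane LSH for angular distance (hypothetical sketch).

    Uses Charikar-style Gaussian hyperplanes; this is NOT the hash family
    defined in the paper.
    """

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = np.random.default_rng(seed)
        # One (n_bits x dim) block of Gaussian hyperplane normals per hash table.
        self.planes = rng.standard_normal((n_tables, n_bits, dim))
        self.tables = [{} for _ in range(n_tables)]
        self.data = None

    def _keys(self, x):
        # Each bit records which side of one hyperplane x falls on;
        # the bits of one table form that table's bucket key.
        bits = (self.planes @ x) >= 0  # shape (n_tables, n_bits)
        return [row.tobytes() for row in bits]

    def index(self, data):
        self.data = np.asarray(data, dtype=float)
        for i, x in enumerate(self.data):
            for table, key in zip(self.tables, self._keys(x)):
                table.setdefault(key, []).append(i)

    def candidates(self, q):
        # Union of the buckets that q falls into across all tables.
        found = set()
        for table, key in zip(self.tables, self._keys(q)):
            found.update(table.get(key, []))
        return found


def angle(x, y):
    """Angle between two vectors; at most pi/2 for nonnegative vectors."""
    cos = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def interleave(*rankings):
    """Hypothetical round-robin merge of ranked lists, skipping duplicates."""
    merged, seen = [], set()
    for tier in itertools.zip_longest(*rankings):
        for item in tier:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


def query(lsh, q, k=5):
    """Re-rank LSH candidates by Euclidean and by angular distance, then merge."""
    cand = list(lsh.candidates(q))
    by_euclid = sorted(cand, key=lambda i: np.linalg.norm(lsh.data[i] - q))
    by_angle = sorted(cand, key=lambda i: angle(lsh.data[i], q))
    return interleave(by_euclid, by_angle)[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.dirichlet(np.ones(32), size=1000)  # 1000 random 32-bin distributions
    lsh = AngularLSH(dim=32)
    lsh.index(data)
    print(query(lsh, data[0]))  # first entry is 0, the query's own index
```

More tables raise recall at the cost of memory, and more bits per table make buckets more selective. Since a single bit never separates two probability vectors with probability above 1/2, this baseline likely needs more bits per table than it would for general vectors. That gap is the kind of property-specific weakness the paper's scheme is meant to address.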



Acknowledgments

This work was supported by the 863 Program (2015AA015404), the 973 Program (2013CB329303), the China National Science Foundation (61402036, 60973083, 61273363), and the Beijing Advanced Innovation Center for Imaging Technology (BAICIT-2016007).

Author information


Corresponding author

Correspondence to Xian-Ling Mao.


Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Tang, YK., Mao, XL., Hao, YJ., Xu, C., Huang, H. (2017). Locality-Sensitive Hashing for Finding Nearest Neighbors in Probability Distributions. In: Cheng, X., Ma, W., Liu, H., Shen, H., Feng, S., Xie, X. (eds) Social Media Processing. SMP 2017. Communications in Computer and Information Science, vol 774. Springer, Singapore. https://doi.org/10.1007/978-981-10-6805-8_1


  • DOI: https://doi.org/10.1007/978-981-10-6805-8_1


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6804-1

  • Online ISBN: 978-981-10-6805-8

  • eBook Packages: Computer Science, Computer Science (R0)
