Abstract
In the past ten years, powerful new algorithms based on efficient data structures have been proposed for Approximate Nearest Neighbor (ANN) search. When the data are probability distributions, existing Locality-Sensitive Hashing (LSH) algorithms designed for vector data can be applied directly to find nearest neighbors. However, these methods do not take the special properties of probability distributions into account. In this paper, we exploit those properties to present a novel LSH scheme adapted to angular distance for ANN search over high-dimensional probability distributions. We define the specific hashing functions and prove their locality-sensitivity. We also propose a Sequential Interleaving algorithm based on the “Unbalance Effect” of the Euclidean and angular metrics for probability distributions. Finally, we experimentally compare our methods with state-of-the-art LSH algorithms for ANN search on six public image databases. The results show that the proposed algorithms provide far better accuracy than the baselines.
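To make the baseline setting concrete, the following is a minimal sketch of the standard random-hyperplane (sign-bit) LSH for angular distance, applied directly to probability vectors as if they were ordinary vectors. It is not the scheme proposed in this paper, whose hashing functions are not given in the abstract. The function names, the toy histograms, and the choice of 4000 bits are illustrative assumptions.

```python
import math
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals in R^dim (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(p, hyperplanes):
    """Sign-bit code: bit i is 1 when p lies on the non-negative side of hyperplane i."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, p)) >= 0.0 else 0
        for plane in hyperplanes
    )


def angle(p, q):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def hamming_fraction(c1, c2):
    """Fraction of bit positions in which two codes differ."""
    return sum(a != b for a, b in zip(c1, c2)) / len(c1)


if __name__ == "__main__":
    # Toy 4-bin histograms, each summing to 1 (assumed example data).
    p = [0.7, 0.1, 0.1, 0.1]
    q = [0.1, 0.1, 0.1, 0.7]
    planes = make_hyperplanes(dim=4, n_bits=4000, seed=0)
    estimate = hamming_fraction(hash_code(p, planes), hash_code(q, planes))
    # For random hyperplanes, each bit differs with probability angle / pi.
    print("fraction of differing bits:", estimate)
    print("angle / pi:                ", angle(p, q) / math.pi)
```

Two properties are visible in this sketch. The code depends only on the direction of the input, so rescaling a vector leaves its code unchanged. And because the entries of a probability distribution are non-negative, the angle between any two distributions lies in [0, π/2], so such data occupy only part of the range that a general-purpose angular hash is designed for.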
Acknowledgments
This work was supported by the 863 Program (2015AA015404), the 973 Program (2013CB329303), the China National Science Foundation (61402036, 60973083, 61273363), and the Beijing Advanced Innovation Center for Imaging Technology (BAICIT-2016007).
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Tang, YK., Mao, XL., Hao, YJ., Xu, C., Huang, H. (2017). Locality-Sensitive Hashing for Finding Nearest Neighbors in Probability Distributions. In: Cheng, X., Ma, W., Liu, H., Shen, H., Feng, S., Xie, X. (eds) Social Media Processing. SMP 2017. Communications in Computer and Information Science, vol 774. Springer, Singapore. https://doi.org/10.1007/978-981-10-6805-8_1
DOI: https://doi.org/10.1007/978-981-10-6805-8_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6804-1
Online ISBN: 978-981-10-6805-8
eBook Packages: Computer Science, Computer Science (R0)