Abstract
Furthest neighbor search in high-dimensional space is widely used in applications such as recommendation systems. Because of the “curse of dimensionality”, \(c\)-approximate furthest neighbor (\(c\)-AFN) search is commonly used instead, trading result accuracy for efficiency. However, most current techniques for external memory are only suitable for low-dimensional space.
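To make the \(c\)-AFN notion concrete, the following minimal sketch assumes the standard definition: for a query \(q\) with exact furthest neighbor \(o^*\), a point \(o\) is a \(c\)-AFN of \(q\) if \(\mathrm{dist}(o, q) \ge \mathrm{dist}(o^*, q)/c\). The function names (`dist`, `exact_furthest`, `is_c_afn`) and the use of Euclidean distance are illustrative assumptions, not taken from the paper.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_furthest(points, q):
    """Brute-force exact furthest neighbor of q (linear scan)."""
    return max(points, key=lambda p: dist(p, q))


def is_c_afn(candidate, points, q, c):
    """Check the assumed c-AFN condition: dist(candidate, q) >= dist(o*, q) / c."""
    best = dist(exact_furthest(points, q), q)
    return dist(candidate, q) * c >= best


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
q = (0.0, 0.0)
# (3, 4) is at distance 5 and the true furthest point (6, 8) is at distance 10,
# so (3, 4) qualifies for c = 2 but not for c = 1.5.
print(is_c_afn((3.0, 4.0), points, q, 2.0))
print(is_c_afn((3.0, 4.0), points, q, 1.5))
```

A larger \(c\) admits more points as acceptable answers, which is the accuracy-for-efficiency trade-off described above.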
In this paper, we propose a novel algorithm called reverse incremental LSH (RI-LSH), based on Indyk’s LSH scheme, to solve the problem with a theoretical guarantee. Unlike previous hashing-based methods, RI-LSH is designed for external memory and achieves good I/O performance. We provide a rigorous theoretical analysis proving that RI-LSH returns a \(c\)-AFN result with constant probability. Our comprehensive experimental results show that, compared with other \(c\)-AFN methods with theoretical guarantees, our algorithm achieves better I/O efficiency.
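The abstract does not spell out RI-LSH itself, so the sketch below is not the paper's algorithm. It only illustrates the general idea behind projection-based furthest neighbor search, under stated assumptions: points are projected onto a few random Gaussian directions, points whose projections lie far from the query's projection are treated as candidates, and only those candidates are verified with true distances. It runs entirely in memory and ignores I/O cost; all names (`build_projections`, `approx_furthest`, `budget`) are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_projections(points, m, seed=0):
    """Draw m random Gaussian directions and project every point onto each."""
    rng = random.Random(seed)
    dim = len(points[0])
    dirs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]
    tables = [[dot(a, p) for p in points] for a in dirs]
    return dirs, tables


def approx_furthest(points, q, dirs, tables, budget):
    """Return the index of the best candidate among the `budget` points
    whose projections are furthest from the query's projection."""
    q_proj = [dot(a, q) for a in dirs]
    # Score each point by its largest projected gap to the query.
    scores = [
        max(abs(table[i] - qp) for table, qp in zip(tables, q_proj))
        for i in range(len(points))
    ]
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    candidates = order[:budget]
    # Verify candidates with true distances and keep the furthest one.
    return max(candidates, key=lambda i: euclidean(points[i], q))


rng = random.Random(42)
data = [[rng.uniform(-1.0, 1.0) for _ in range(16)] for _ in range(200)]
query = [0.0] * 16
dirs, tables = build_projections(data, m=8)
idx = approx_furthest(data, query, dirs, tables, budget=20)
print(idx, euclidean(data[idx], query))
```

With `budget` equal to the dataset size this degenerates to an exact linear scan; smaller budgets verify fewer points at the price of possibly missing the true furthest neighbor.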
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, W., Wang, H., Zhang, Y., Qin, L., Zhang, W. (2020). I/O Efficient Algorithm for c-Approximate Furthest Neighbor Search in High-Dimensional Space. In: Nah, Y., Cui, B., Lee, SW., Yu, J.X., Moon, YS., Whang, S.E. (eds) Database Systems for Advanced Applications. DASFAA 2020. Lecture Notes in Computer Science(), vol 12114. Springer, Cham. https://doi.org/10.1007/978-3-030-59419-0_14
DOI: https://doi.org/10.1007/978-3-030-59419-0_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59418-3
Online ISBN: 978-3-030-59419-0
eBook Packages: Computer Science, Computer Science (R0)