Abstract
This paper introduces a k-NN search index, the Rank Cover Tree (RCT), whose pruning tests rely solely on the comparison of similarity values; other properties of the underlying space, such as the triangle inequality, are not employed. A formal theoretical analysis shows that with very high probability, the RCT returns a correct query result in time that depends competitively on a measure of the intrinsic dimensionality of the data set. Experiments show that the RCT is capable of meeting or exceeding the level of performance of state-of-the-art methods that make use of metric pruning or selection tests involving numerical constraints on distance values.
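As an illustration of what pruning by comparison alone can look like, the following is a minimal sketch and not the authors' RCT implementation. It assumes a hierarchy of nested random samples in which each item is linked to its closest member of the next level up, and a query that keeps only the top-ranked candidates at each level. The names `build_levels`, `rank_search`, `ratio`, and `coverage` are hypothetical, and the brute-force parent assignment is a simplification chosen for brevity.

```python
import heapq
import math
import random


def build_levels(points, dist=math.dist, ratio=2, seed=0):
    """Nested random samples: level 0 holds every index, and each higher
    level keeps roughly 1/ratio of the level below. Every item is linked to
    the closest member of the next level up (brute force, for brevity)."""
    rng = random.Random(seed)
    levels = [list(range(len(points)))]
    while len(levels[-1]) > ratio:
        prev = levels[-1]
        levels.append(sorted(rng.sample(prev, max(1, len(prev) // ratio))))
    # children[j][p] lists the level j-1 items whose parent at level j is p.
    children = [dict() for _ in levels]
    for j in range(1, len(levels)):
        for item in levels[j - 1]:
            parent = min(levels[j], key=lambda p: dist(points[item], points[p]))
            children[j].setdefault(parent, []).append(item)
    return levels, children


def rank_search(points, levels, children, query, k, dist=math.dist, coverage=4):
    """Descend from the top level. At each level keep only the
    coverage * k candidates ranked closest to the query, then expand their
    children. Only the ordering of distance values is used; no triangle
    inequality or numerical distance bound is applied."""
    candidates = levels[-1]
    for j in range(len(levels) - 1, 0, -1):
        kept = heapq.nsmallest(coverage * k, candidates,
                               key=lambda i: dist(points[i], query))
        candidates = [c for p in kept for c in children[j].get(p, [])]
    return heapq.nsmallest(k, candidates, key=lambda i: dist(points[i], query))
```

In this sketch, a larger `coverage` retains more candidates per level, trading query time for a higher chance of returning the true neighbors; setting it to the data set size degenerates to an exhaustive scan. Because only rankings are compared, `dist` may be any dissimilarity function and need not be a metric.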
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Houle, M.E., Nett, M. (2013). Rank Cover Trees for Nearest Neighbor Search. In: Brisaboa, N., Pedreira, O., Zezula, P. (eds) Similarity Search and Applications. SISAP 2013. Lecture Notes in Computer Science, vol 8199. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41062-8_3
DOI: https://doi.org/10.1007/978-3-642-41062-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41061-1
Online ISBN: 978-3-642-41062-8
eBook Packages: Computer Science (R0)