Skip to main content

An Empirical Study of a New Approach to Nearest Neighbor Searching

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2153))

Abstract

In nearest neighbor searching we are given a set of n data points in real d-dimensional space, ℜd, and the problem is to preprocess these points into a data structure, so that given a query point, the nearest data point to the query point can be reported efficiently. Because data sets can be quite large, we are interested in data structures that use optimal O(dn) storage.

In this paper we consider a novel approach to nearest neighbor searching, in which the search returns the correct nearest neighbor with a given probability assuming that the queries are drawn from some known distribution. The query distribution is represented by providing a set of training query points at preprocessing time.

The data structure, called the overlapped split tree, is an augmented BSP-tree in which each node is associated with a cover region, which is used to determine whether the search should visit this node. We use principal component analysis and support vector machines to analyze the structure of the data and training points in order to better adapt the tree structure to the data sets. We show empirically that this new approach provides improved predictability over the kd-tree in average query performance.

The support of the National Science Foundation under grant CCR-9712379 is gratefully acknowledged.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. S. Arya and D. M. Mount. Approximate nearest neighbor queries in fixed dimensions. In Proc. 4th ACM-SIAM Sympos. Discrete Algorithms, pages 271–280, 1993.

    Google Scholar 

  2. S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Wu. An optimal algorithm for approximate nearest neighbor searching. In Proc. 5th ACM-SIAM Sympos. Discrete Algorithms, pages 573–582, 1994.

    Google Scholar 

  3. S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Wu. An optimal algorithm for approximate nearest neighbor searching. Journal of the ACM, 45:891–923, 1998.

    Article  MATH  MathSciNet  Google Scholar 

  4. N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: An efficient and robust access method for points and rectangles. In Proc. ACM SIGMOD Conf. on Management of Data, pages 322–331, 1990.

    Google Scholar 

  5. S. Berchtold, D. A. Keim, and H.-P. Kriegel. The X-tree: An index structure for high-dimensional data. In Proc. 22nd VLDB Conference, pages 28–39, 1996.

    Google Scholar 

  6. M. Bern. Approximate closest-point queries in high dimensions. Inform. Process. Lett., 45:95–99, 1993.

    Article  MATH  MathSciNet  Google Scholar 

  7. C. J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121–167, 1998.

    Article  Google Scholar 

  8. T. Chan. Approximate nearest neighbor queries revisited. In Proc. 13th Annu. ACM Sympos. Comput. Geom., pages 352–358, 1997.

    Google Scholar 

  9. C. Chang and C. Lin. LIBSVM: Introduction and benchmarks. LIBSVM can be obtained from URL: http://www.csie.ntu.edu.tw/~cjlin/libsvm, 1999.

  10. P. Ciaccia and M. Patella. Using the distance distribution for approximate similarity queries in high-dimensional metric spaces. In Proc. 10th Workshop Database and Expert Systems Applications, pages 200–205, 1999.

    Google Scholar 

  11. K. L. Clarkson. An algorithm for approximate closest-point queries. In Proc. 10th Annu. ACM Sympos. Comput. Geom., pages 160–164, 1994.

    Google Scholar 

  12. S. Cost and S. Salzberg. A weighted nearest neighbor algorithm for learning with symbolic features. Machine Learning, 10:57–78, 1993.

    Google Scholar 

  13. T. M. Cover and P. E. Hart. Nearest neighbor pattern classification. IEEE Trans. Inform. Theory, 13:57–67, 1967.

    Article  Google Scholar 

  14. M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer-Verlag, 1997.

    Google Scholar 

  15. S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman. Indexing by latent semantic analysis. J. Amer. Soc. Inform. Sci., 41(6):391–407, 1990.

    Article  Google Scholar 

  16. L. Devroye and T. J. Wagner. Nearest neighbor methods in discrimination. In P. R. Krishnaiah and L. N. Kanal, editors, Handbook of Statistics, volume 2. North-Holland, 1982.

    Google Scholar 

  17. R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. John Wiley & Sons, NY, 1973.

    MATH  Google Scholar 

  18. C. Duncan, M. Goodrich, and S. Kobourov. Balanced aspect ratio trees: Combining the advantages of k-d trees and octrees. In Proc. 10th ACM-SIAM Sympos. Discrete Algorithms, pages 300–309, 1999.

    Google Scholar 

  19. Christos Faloutsos and Ibrahim Kamel. Beyond uniformity and independence: Analysis of r-trees using the concept of fractal dimension. In Proc. Annu. ACM Sympos. Principles Database Syst., pages 4–13, 1994.

    Google Scholar 

  20. U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and Data Mining. AAAI Press/MIT Press, 1996.

    Google Scholar 

  21. M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker. Query by image and video content: The QBIC system. IEEE Computer, 28:23–32, 1995.

    Google Scholar 

  22. J. H. Friedman, J. L. Bentley, and R. A. Finkel. An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Software, 3(3):209–226, 1977.

    Article  MATH  Google Scholar 

  23. K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 2nd edition, 1990.

    Google Scholar 

  24. A. Gersho and R. M. Gray. Vector Quantization and Signal Compression. Kluwer Academic, Boston, 1992.

    MATH  Google Scholar 

  25. P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proc. 30th Annu. ACM Sympos. Theory Comput., pages 604–613, 1998.

    Google Scholar 

  26. N. Katayama and S. Satoh. The SR-tree: An index structure for high-dimensional nearest neighbor queries. In Proc. ACM SIGMOD Conf. on Management of Data, pages 369–380, 1997.

    Google Scholar 

  27. J. M. Kleinberg. Two algorithms for nearest-neighbor search in high dimension. In Proc. 29th Annu. ACM Sympos. Theory Comput., pages 599–608, 1997.

    Google Scholar 

  28. E. Kushilevitz, R. Ostrovsky, and Y. Rabani. Efficient search for approximate nearest neighbor in high dimemsional spaces. In Proc. 30th Annu. ACM Sympos. Theory Comput., pages 614–623, 1998.

    Google Scholar 

  29. K. I. Lin, H. V. Jagadish, and C. Faloutsos. The TV-tree: An index structure for high-dimensional data. VLDB Journal, 3(4):517–542, 1994.

    Article  Google Scholar 

  30. D. M. Mount and S. Arya. ANN: A library for approximate nearest neighbor searching. Center for Geometric Computing 2nd Annual Fall Workshop on Computational Geometry, URL: http://www.cs.umd.edu/~mount/ANN, 1997.

  31. F. Murtagh. PC A (principal components analysis): C program. PC A program can be obtained from URL: http://astro.u-strasbg.fr/~fmurtagh/mda-sw/pca.c, 1989.

  32. D. Saupe, 1994. Private communication.

    Google Scholar 

  33. T. Sellis, N. Roussopoulos, and C. Faloutsos. The R+-tree: A dynamic index for multi-dimensional objects. In Proc. 13th VLDB Conference, pages 507–517, 1987.

    Google Scholar 

  34. V. Vapnik. Statistical Learning Theory. John Wiley & Sons, NY, 1998.

    MATH  Google Scholar 

  35. K. Zatloukal, M. H. Johnson, and R. Ladner. Nearest neighbor search for data compression. (Presented at the 6th DIMACS Implementation Challenge Workshop), URL: http://www.cs.washington.edu/homes/ladner/nns.ps, 1999.

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Maneewongvatana, S., Mount, D.M. (2001). An Empirical Study of a New Approach to Nearest Neighbor Searching. In: Buchsbaum, A.L., Snoeyink, J. (eds) Algorithm Engineering and Experimentation. ALENEX 2001. Lecture Notes in Computer Science, vol 2153. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44808-X_14

Download citation

  • DOI: https://doi.org/10.1007/3-540-44808-X_14

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42560-1

  • Online ISBN: 978-3-540-44808-2

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics