Nearest Neighbor Search on Vertically Partitioned High-Dimensional Data

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3589)

Abstract

In this paper, we present a new approach to indexing multidimensional data that is particularly suitable for the efficient incremental processing of nearest neighbor (NN) queries. The basic idea is to use index-striping, which vertically splits the data space into multiple low- and medium-dimensional data spaces. The data from each of these lower-dimensional subspaces is organized using a standard multidimensional index structure. To perform incremental NN queries on top of index-striping efficiently, we first develop an algorithm for merging the results received from the underlying indexes. We then present an accurate cost model, relying on a power law, that determines an appropriate number of indexes. Moreover, we consider the problem of dimension assignment, where each dimension is assigned to a lower-dimensional subspace such that the cost of nearest neighbor queries is minimized. Our experiments confirm the validity of our cost model and evaluate the performance of our approach.
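
The merging step described in the abstract can be illustrated with a small sketch. This is not the algorithm from the paper, whose details are not given here; it is a generic threshold-style merge under stated assumptions. Each subspace "index" is simulated by a sorted scan (a stand-in for an incremental NN cursor on a real multidimensional index), dimensions are assigned round-robin as a placeholder for the paper's dimension assignment, and all names (`split_dimensions`, `subspace_stream`, `knn_merge`) are hypothetical.

```python
import heapq
import math
import random


def split_dimensions(num_dims, num_indexes):
    """Assign dimensions to subspaces round-robin (placeholder assignment)."""
    return [list(range(i, num_dims, num_indexes)) for i in range(num_indexes)]


def partial_sq_dist(p, q, dims):
    """Squared Euclidean distance restricted to the given dimensions."""
    return sum((p[d] - q[d]) ** 2 for d in dims)


def subspace_stream(points, query, dims):
    """Stand-in for an incremental NN cursor on one lower-dimensional index.

    Yields (partial squared distance, object id) in ascending order.
    Assumption: a real system would browse an index instead of sorting.
    """
    yield from sorted(
        (partial_sq_dist(p, query, dims), i) for i, p in enumerate(points)
    )


def knn_merge(points, query, subspaces, k):
    """Merge the per-subspace streams into the k nearest neighbors."""
    streams = [subspace_stream(points, query, dims) for dims in subspaces]
    all_dims = range(len(query))
    frontier = [0.0] * len(streams)  # last partial distance seen per stream
    seen = set()
    best = []  # max-heap via negated keys: (-full squared distance, id)
    active = True
    while active:
        active = False
        for s, stream in enumerate(streams):
            item = next(stream, None)
            if item is None:
                continue  # this stream is exhausted
            active = True
            frontier[s], obj = item
            if obj in seen:
                continue
            seen.add(obj)
            # "Random access": fetch the full vector and compute its distance.
            full = partial_sq_dist(points[obj], query, all_dims)
            if len(best) < k:
                heapq.heappush(best, (-full, obj))
            elif full < -best[0][0]:
                heapq.heapreplace(best, (-full, obj))
        # An object not yet seen in any stream lies at least sum(frontier)
        # away (squared), so the current top k cannot be improved.
        if len(best) == k and -best[0][0] <= sum(frontier):
            break
    return sorted((math.sqrt(-d), obj) for d, obj in best)


# Usage: 200 random integer points in 8 dimensions, striped over 3 indexes.
random.seed(0)
points = [tuple(random.randint(0, 100) for _ in range(8)) for _ in range(200)]
query = tuple(random.randint(0, 100) for _ in range(8))
subspaces = split_dimensions(8, 3)
neighbors = knn_merge(points, query, subspaces, k=5)
```

The stopping rule relies on squared Euclidean distance being a sum over disjoint dimension groups, so the sum of the per-stream frontiers is a lower bound for every object not yet retrieved. How many indexes to use and which dimensions to group together is what the paper's cost model and dimension assignment address; the sketch leaves both as fixed inputs.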

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dellis, E., Seeger, B., Vlachou, A. (2005). Nearest Neighbor Search on Vertically Partitioned High-Dimensional Data. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2005. Lecture Notes in Computer Science, vol 3589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11546849_24

  • DOI: https://doi.org/10.1007/11546849_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28558-8

  • Online ISBN: 978-3-540-31732-6

  • eBook Packages: Computer Science, Computer Science (R0)
