Nearest Neighbor Search on Vertically Partitioned High-Dimensional Data

Dellis, Evangelos; Seeger, Bernhard; Vlachou, Akrivi

doi:10.1007/11546849_24

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3589)

Included in the following conference series: International Conference on Data Warehousing and Knowledge Discovery

Abstract

In this paper, we present a new approach to indexing multidimensional data that is particularly suitable for the efficient incremental processing of nearest neighbor queries. The basic idea is to use index-striping that vertically splits the data space into multiple low- and medium-dimensional data spaces. The data from each of these lower-dimensional subspaces is organized by using a standard multi-dimensional index structure. In order to perform incremental NN-queries on top of index-striping efficiently, we first develop an algorithm for merging the results received from the underlying indexes. Then, an accurate cost model relying on a power law is presented that determines an appropriate number of indexes. Moreover, we consider the problem of dimension assignment, where each dimension is assigned to a lower-dimensional subspace, such that the cost of nearest neighbor queries is minimized. Our experiments confirm the validity of our cost model and evaluate the performance of our approach.
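
The abstract does not spell out the merging algorithm, so the following is only a minimal sketch of the general idea, written in the style of threshold-based rank aggregation rather than as the authors' method. It assumes squared Euclidean distance, which decomposes into a sum of per-subspace partial distances. The names `subspace_stream`, `incremental_nn`, and `partition` are hypothetical, and a sorted list stands in for the incremental scan of a real multi-dimensional index.

```python
import heapq


def subspace_stream(points, query, dims):
    """Stand-in for an incremental NN scan on one subspace index (assumption):
    yields (partial_sq_dist, point_id) in ascending order of partial distance."""
    scored = sorted(
        (sum((p[d] - query[d]) ** 2 for d in dims), pid)
        for pid, p in enumerate(points)
    )
    yield from scored


def incremental_nn(points, query, partition):
    """Yield (full_sq_dist, point_id) in ascending order of full distance.

    `partition` is a list of dimension tuples, one per subspace index.
    """
    streams = [subspace_stream(points, query, dims) for dims in partition]
    frontier = [0.0] * len(streams)    # last partial distance seen per stream
    exhausted = [False] * len(streams)
    seen = set()
    candidates = []                    # min-heap of (full_sq_dist, point_id)

    while not all(exhausted):
        # Fetch one more result from each subspace stream in turn.
        for i, stream in enumerate(streams):
            item = next(stream, None)
            if item is None:
                exhausted[i] = True
                continue
            partial, pid = item
            frontier[i] = partial
            if pid not in seen:
                seen.add(pid)
                # Random access to the full vector to get the exact distance.
                full = sum((a - b) ** 2 for a, b in zip(points[pid], query))
                heapq.heappush(candidates, (full, pid))

        # Any point not yet seen in any stream is at least this far away.
        threshold = sum(frontier)
        while candidates and candidates[0][0] <= threshold:
            yield heapq.heappop(candidates)

    # All streams are exhausted, so every point has been seen: flush the rest.
    while candidates:
        yield heapq.heappop(candidates)
```

Because the function is a generator, a caller can stop after any number of neighbors, for example `next(incremental_nn(points, query, [(0, 1), (2, 3)]))` for the single nearest point. How many subspaces to use and which dimensions go into each one are the questions the paper's cost model and dimension assignment address; this sketch takes the partition as given.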

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Straße, 35032, Germany
Evangelos Dellis, Bernhard Seeger & Akrivi Vlachou

Editor information

Editors and Affiliations

Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstr. 9-11/188, A-1040, Wien, Austria
A Min Tjoa
Department of Software and Computing Systems, University of Alicante, Spain
Juan Trujillo

About this paper

Cite this paper

Dellis, E., Seeger, B., Vlachou, A. (2005). Nearest Neighbor Search on Vertically Partitioned High-Dimensional Data. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2005. Lecture Notes in Computer Science, vol 3589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11546849_24

DOI: https://doi.org/10.1007/11546849_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28558-8
Online ISBN: 978-3-540-31732-6
eBook Packages: Computer Science, Computer Science (R0)
