Abstract
In this paper, we present a new approach to indexing multidimensional data that is particularly suitable for efficient incremental processing of nearest neighbor (NN) queries. The basic idea is index-striping, which vertically splits the data space into multiple low- and medium-dimensional subspaces. The data of each subspace is organized using a standard multidimensional index structure. To process incremental NN queries efficiently on top of index-striping, we first develop an algorithm that merges the results returned by the underlying indexes. We then present an accurate cost model, based on a power law, that determines an appropriate number of indexes. Finally, we address the dimension-assignment problem: assigning each dimension to a lower-dimensional subspace so that the cost of nearest neighbor queries is minimized. Our experiments confirm the validity of the cost model and evaluate the performance of our approach.
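The abstract does not spell out how the results of the subspace indexes are merged. As a rough illustration only, the sketch below assumes Euclidean distance, whose square decomposes into a sum of partial squared distances over the stripes, and merges the per-stripe rankings with a threshold rule in the spirit of the threshold algorithm of Fagin, Lotem and Naor. It is not the authors' merging algorithm. The names `subspace_scan`, `incremental_nn` and `stripes` are hypothetical, and `subspace_scan` simply sorts the data as a stand-in for a real incremental NN scan over a multidimensional index.

```python
import heapq
import math
from typing import Iterator, List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(p: Point, q: Point, dims: Sequence[int]) -> float:
    """Squared Euclidean distance restricted to the given dimensions."""
    return sum((p[d] - q[d]) ** 2 for d in dims)


def subspace_scan(data: List[Point], q: Point,
                  dims: Sequence[int]) -> Iterator[Tuple[float, int]]:
    """Hypothetical stand-in for an incremental NN scan on one stripe's index:
    yields (partial squared distance, object id) in ascending order."""
    yield from sorted((sq_dist(p, q, dims), i) for i, p in enumerate(data))


def incremental_nn(data: List[Point], q: Point,
                   stripes: List[List[int]]) -> Iterator[Tuple[int, float]]:
    """Yield (object id, Euclidean distance) in ascending distance order by
    merging the per-stripe scans with a threshold-style rule (an assumption)."""
    all_dims = range(len(q))
    scans = [subspace_scan(data, q, s) for s in stripes]
    last = [0.0] * len(stripes)        # last partial distance seen per stripe
    exhausted = [False] * len(stripes)
    seen = set()
    candidates: List[Tuple[float, int]] = []  # min-heap of (full sq. dist, id)
    while True:
        # Advance every stripe by one entry (round-robin sorted access).
        for j, scan in enumerate(scans):
            if exhausted[j]:
                continue
            nxt = next(scan, None)
            if nxt is None:
                exhausted[j] = True
                last[j] = math.inf     # no unseen objects remain
                continue
            last[j], oid = nxt
            if oid not in seen:
                seen.add(oid)
                # Random access: fetch the full vector, compute its exact distance.
                heapq.heappush(candidates, (sq_dist(data[oid], q, all_dims), oid))
        # Any unseen object has partial distance >= last[j] in every stripe,
        # so sum(last) lower-bounds its full squared distance.
        threshold = sum(last)
        while candidates and candidates[0][0] <= threshold:
            d2, oid = heapq.heappop(candidates)
            yield oid, math.sqrt(d2)
        if all(exhausted) and not candidates:
            return
```

For example, splitting four dimensions into the stripes `[[0, 1], [2, 3]]`, `list(incremental_nn([(0, 0, 0, 0), (1, 1, 1, 1), (0.5, 0, 0, 0), (3, 3, 3, 3)], (0, 0, 0, 0), [[0, 1], [2, 3]]))` returns `[(0, 0.0), (2, 0.5), (1, 2.0), (3, 6.0)]`. Because the function is a generator, a caller can stop after the first k results, which is what makes the merge incremental; the threshold only grows, so each reported neighbor is final once emitted.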
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dellis, E., Seeger, B., Vlachou, A. (2005). Nearest Neighbor Search on Vertically Partitioned High-Dimensional Data. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2005. Lecture Notes in Computer Science, vol 3589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11546849_24
DOI: https://doi.org/10.1007/11546849_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28558-8
Online ISBN: 978-3-540-31732-6
eBook Packages: Computer Science, Computer Science (R0)