Abstract
Indexing high dimensional datasets has attracted extensive attention from many researchers over the last decade. Since R-tree-type index structures are known to suffer from the “curse of dimensionality”, Pyramid-tree-type index structures, which are based on the B-tree, have been proposed to break the curse of dimensionality. However, when the number of dimensions is high, the number of pyramids is often insufficient to discriminate data points, and the effectiveness of these structures degrades dramatically as dimensionality increases. In this paper, we focus on one particular aspect of the “curse of dimensionality”: the fraction of a hypercube's volume that lies near its surface approaches 100% as the number of dimensions approaches infinity. We propose a new indexing method based on the surface of the hypercube. We prove that the Pyramid technique is a special case of our method. Our experimental results demonstrate the clear superiority of the new method.
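The two ideas the abstract relies on can be made concrete with a small sketch. For the unit hypercube [0,1]^d, the points farther than ε from every face form an inner cube of side 1 − 2ε, so the share of volume within ε of the surface is 1 − (1 − 2ε)^d, which tends to 1 as d grows. The second function is a sketch of the pyramid-value mapping of the Pyramid technique (Berchtold, Keim and Kriegel, 1998), as commonly described: the space is split into 2d pyramids with apex at the centre, and each point is mapped to its pyramid number plus its height. This is not the new method proposed in the paper; the names `shell_fraction`, `pyramid_value` and `eps` are hypothetical.

```python
import math


def shell_fraction(d: int, eps: float) -> float:
    """Fraction of the unit hypercube [0,1]^d lying within distance eps of its surface.

    The inner cube that is farther than eps from every face has side 1 - 2*eps,
    so its volume is (1 - 2*eps)**d; the rest of the volume lies in the boundary shell.
    """
    if not 0.0 <= eps <= 0.5:
        raise ValueError("eps must lie in [0, 0.5]")
    return 1.0 - (1.0 - 2.0 * eps) ** d


def pyramid_value(v: list[float]) -> float:
    """Pyramid value of a point in [0,1]^d, sketched after the Pyramid technique.

    The data space is split into 2d pyramids whose common apex is the centre
    (0.5, ..., 0.5); the point is mapped to (pyramid number + height).
    """
    d = len(v)
    # Dimension in which the point deviates most from the centre (first one on ties).
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    # Pyramids 0..d-1 lie on the lower side of the centre, d..2d-1 on the upper side.
    i = j_max if v[j_max] < 0.5 else j_max + d
    # Height: distance from the apex along dimension j_max, always in [0, 0.5].
    height = abs(0.5 - v[j_max])
    return i + height


if __name__ == "__main__":
    # Share of volume within 0.05 of the surface for growing dimensionality.
    for d in (2, 10, 50, 100):
        print(f"d={d:3d}  shell fraction (eps=0.05): {shell_fraction(d, 0.05):.5f}")
    # A 2-d point in pyramid 0 with height 0.4.
    print(pyramid_value([0.1, 0.6]))
```

For uniformly distributed data, the largest deviation from the centre tends toward 0.5 as d grows, so most points within one pyramid crowd into a narrow band of pyramid values near the pyramid's base. This is one way to read the abstract's remark that 2d pyramids discriminate data points poorly in high dimensions.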
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
An, J., Chen, YP.P., Xu, Q., Zhou, X. (2005). A New Indexing Method for High Dimensional Dataset. In: Zhou, L., Ooi, B.C., Meng, X. (eds) Database Systems for Advanced Applications. DASFAA 2005. Lecture Notes in Computer Science, vol 3453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11408079_35
DOI: https://doi.org/10.1007/11408079_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25334-1
Online ISBN: 978-3-540-32005-0
eBook Packages: Computer Science, Computer Science (R0)