Abstract
Because of the well-known curse of dimensionality, search in a high-dimensional space is considered a “hard” problem. In this paper, a novel composite distance transformation method, called CDT, is proposed to support fast k-nearest-neighbor (k-NN) search in high-dimensional spaces. In CDT, all n data points are first grouped into clusters by a k-means clustering algorithm. Then a composite distance key is computed for each data point. Finally, the index keys of the n data points are inserted into a partition-based B+-tree. Thus, given a query point, its k-NN search in the high-dimensional space is transformed into a search in a single-dimensional space with the aid of the CDT index. Extensive performance studies are conducted to evaluate the effectiveness and efficiency of the proposed scheme. Our results show that this method outperforms state-of-the-art high-dimensional search techniques such as the X-Tree, VA-file, iDistance and NB-Tree.
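The pipeline described in the abstract (cluster, compute a one-dimensional key per point, store the keys in an ordered index, answer k-NN queries by range scans) can be illustrated with a minimal sketch. The abstract does not define the composite distance key, so the sketch below substitutes a simpler single-distance key in the style of iDistance (cluster id times a constant, plus the distance to the cluster centroid); it is not the CDT key itself. A sorted Python list with `bisect` stands in for the partition-based B+-tree. All function names and the parameters `stride`, `r0` and `eps` are hypothetical, and `stride` is assumed to be larger than any point-to-centroid distance.

```python
import bisect
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_centroid(p, centroids):
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))


def kmeans(points, n_clusters, iters=10, seed=0):
    """Plain Lloyd iterations; returns the list of centroids."""
    centroids = random.Random(seed).sample(points, n_clusters)
    for _ in range(iters):
        groups = [[] for _ in range(n_clusters)]
        for p in points:
            groups[nearest_centroid(p, centroids)].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids


def build_index(points, n_clusters, stride):
    """Map each point to a 1-D key and keep (key, point index) pairs sorted.

    Assumed key (iDistance-style, NOT the paper's composite key):
        key = cluster_id * stride + distance(point, centroid)
    """
    centroids = kmeans(points, n_clusters)
    entries = []
    for i, p in enumerate(points):
        j = nearest_centroid(p, centroids)
        entries.append((j * stride + dist(p, centroids[j]), i))
    entries.sort()  # stand-in for insertion into a B+-tree
    return centroids, entries


def knn_search(query, points, centroids, entries, k, stride, r0=0.05, eps=1e-9):
    """Exact k-NN by 1-D range scans with a doubling search radius r."""
    keys = [key for key, _ in entries]
    k = min(k, len(points))
    r = r0
    while True:
        found = set()
        for j, c in enumerate(centroids):
            dq = dist(query, c)
            # Triangle inequality: a point of cluster j within r of the query
            # has a centroid distance in [dq - r, dq + r].
            lo = j * stride + max(0.0, dq - r) - eps
            hi = j * stride + dq + r + eps
            start = bisect.bisect_left(keys, lo)
            stop = bisect.bisect_right(keys, hi)
            for _, i in entries[start:stop]:
                if dist(query, points[i]) <= r:  # filter by true distance
                    found.add(i)
        if len(found) >= k:
            return sorted(found, key=lambda i: dist(query, points[i]))[:k]
        r *= 2
```

Calling `build_index(points, n_clusters=4, stride=10.0)` on tuples drawn from the unit cube and then `knn_search(query, points, centroids, entries, k=5, stride=10.0)` returns the indices of the five nearest points, closest first. Every candidate is checked against its true distance and the radius grows until at least k points lie inside it, so the one-dimensional key only prunes the scan and does not affect which neighbors are returned.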
Additional information
Partially supported by the National Natural Science Foundation of China (Grant No. 60533090), National Science Fund for Distinguished Young Scholars (Grant No. 60525108), the National Grand Fundamental Research 973 Program of China (Grant No. 2002CB312101), Science and Technology Project of Zhejiang Province (Grant Nos. 2005C13032, 2005C11001-05) and China-America Academic Digital Library Project (see www.cadal.zju.edu.cn).
Cite this article
Zhuang, Y., Zhuang, YT. & Wu, F. Composite Distance Transformation for Indexing and k-Nearest-Neighbor Searching in High-Dimensional Spaces. J Comput Sci Technol 22, 208–217 (2007). https://doi.org/10.1007/s11390-007-9027-5
DOI: https://doi.org/10.1007/s11390-007-9027-5