Abstract
This paper proposes a novel symmetrical encoding-based index structure, called the EDD-tree (encoding-based dual distance tree), to support fast k-nearest neighbor (k-NN) search in high-dimensional spaces. In the EDD-tree, all data points are first grouped into clusters by a k-means clustering algorithm. The uniform ID number of each data point is then obtained by a dual-distance-driven encoding scheme, in which each cluster sphere is partitioned twice, once according to the start distance and once according to the centroid distance. Finally, the uniform ID number and the centroid distance of each data point are combined into a uniform index key, which is indexed by a partition-based B+-tree. Thus, given a query point, k-NN search in a high-dimensional space is transformed into a search in a one-dimensional space with the aid of the EDD-tree index. Extensive performance studies evaluate the effectiveness and efficiency of the proposed scheme, and the results demonstrate that it outperforms state-of-the-art high-dimensional search techniques such as the X-tree, VA-file, iDistance and NB-tree, especially when the query radius is not very large.
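The encoding step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything specific here is an assumption made for illustration: the start distance is taken to be the distance to a fixed reference point `start`, both partitions use `n_slices` equal-width slices, the ID packs the cluster number and the two slice numbers, and the key is `uid * scale + centroid_distance` with `scale` larger than any cluster radius. The names `encode_key`, `slice_index`, `n_slices` and `scale` are hypothetical, and a sorted list stands in for the B+-tree.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def slice_index(value, low, width, n_slices):
    """Index of the equal-width slice holding value, clamped to [0, n_slices - 1]."""
    if width <= 0:
        return 0
    return max(0, min(n_slices - 1, int((value - low) // width)))

def encode_key(point, centroids, radii, start, n_slices=4, scale=10.0):
    """Return (uid, key) for a point; the layout is an assumed one, not the paper's."""
    # Assign the point to its nearest cluster centroid.
    cid = min(range(len(centroids)), key=lambda j: dist(point, centroids[j]))
    centre, r = centroids[cid], radii[cid]
    cd = dist(point, centre)   # centroid distance
    sd = dist(point, start)    # start distance (assumed: distance to a fixed point)
    # By the triangle inequality, sd lies in [d(centre, start) - r, d(centre, start) + r].
    sd_low = dist(centre, start) - r
    s = slice_index(sd, sd_low, 2 * r / n_slices, n_slices)  # first partition
    c = slice_index(cd, 0.0, r / n_slices, n_slices)         # second partition
    uid = (cid * n_slices + s) * n_slices + c
    # scale must exceed every cluster radius so key ranges of different IDs stay disjoint.
    return uid, uid * scale + cd

# Toy data: two clusters assumed to come from a prior k-means run.
centroids = [(3.0, 4.0), (9.0, 12.0)]
radii = [2.0, 2.0]
start = (0.0, 0.0)
points = [(3.0, 5.0), (9.0, 12.0), (4.0, 4.0)]

# A sorted list of one-dimensional keys stands in for the B+-tree leaf level.
keys = sorted(encode_key(p, centroids, radii, start)[1] for p in points)
```

Under these assumptions, points that share a cluster and both slice numbers fall into one contiguous key range, so a range query only has to scan the key intervals of the slices that the query sphere intersects.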
References
Bohm C, Berchtold S, Keim D. Searching in high-dimensional spaces: index structures for improving the performance of multimedia databases. ACM Comput Surv, 2001, 33(3): 322–373
Guttman A. R-tree: a dynamic index structure for spatial searching. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. Boston: ACM Press, 1984. 47–54
Beckmann N, Kriegel H P, Schneider R, et al. The R*-tree: an efficient and robust access method for points and rectangles. In: Proceedings of ACM SIGMOD International Conference on Management of Data. Atlantic City: SIGMOD Record, 1990, 19(2). 322–331
Berchtold S, Keim D A, Kriegel H P. The X-tree: an index structure for high-dimensional data. In: Proceedings of the 22nd International Conference on Very Large Data Bases. India: Morgan Kaufmann, 1996. 28–37
Weber R, Schek H, Blott S. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: Proceedings of the 24th International Conference on Very Large Data Bases. New York: Morgan Kaufmann Publishers, 1998. 194–205
Berchtold S, Bohm C, Kriegel H P, et al. Independent quantization: an index compression technique for high-dimensional data spaces. In: Proceedings of the 16th International Conference on Data Engineering. USA: IEEE Computer Society, 2000. 577–588
Fonseca M J, Jorge J A. NB-Tree: an indexing structure for content-based retrieval in large databases. In: Proceedings of the 8th International Conference on Database Systems for Advanced Applications. Kyoto: IEEE Computer Society, 2003. 267–274
Jagadish H V, Ooi B C, Tan K L, et al. iDistance: an adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans Database Syst, 2005, 30(2): 364–397
Additional information
Supported by the key program of the National Natural Science Foundation of China (Grant No. 60533090), the National Natural Science Fund for Distinguished Young Scholars (Grant No. 60525108), and China-America Academic Digital Library Project
About this article
Cite this article
Zhuang, Y., Zhuang, Y. & Wu, F. An encoding-based dual distance tree high-dimensional index. Sci. China Ser. F-Inf. Sci. 51, 1401–1414 (2008). https://doi.org/10.1007/s11432-008-0104-3
DOI: https://doi.org/10.1007/s11432-008-0104-3