
An encoding-based dual distance tree high-dimensional index

Science in China Series F: Information Sciences

Abstract

This paper proposes a novel symmetrical encoding-based index structure, the EDD-tree (encoding-based dual distance tree), to support fast k-nearest neighbor (k-NN) search in high-dimensional spaces. In the EDD-tree, all data points are first grouped into clusters by a k-means clustering algorithm. The uniform ID number of each data point is then obtained by a dual-distance-driven encoding scheme, in which each cluster sphere is partitioned twice according to the dual distances: start-distance and centroid-distance. Finally, the uniform ID number and the centroid-distance of each data point are combined into a uniform index key, which is then indexed through a partition-based B+-tree. Thus, given a query point, its k-NN search in a high-dimensional space can be transformed into a search in a one-dimensional space with the aid of the EDD-tree index. Extensive performance studies are conducted to evaluate the effectiveness and efficiency of the proposed scheme, and the results demonstrate that this method outperforms state-of-the-art high-dimensional search techniques such as the X-tree, VA-file, iDistance, and NB-tree, especially when the query radius is not very large.
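The pipeline described above (cluster, encode an ID from two distances, combine the ID with the centroid-distance into a one-dimensional key, index the keys) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the encoding formula, so the equal-width slicing, the key layout `ID * max_dist + centroid_distance`, the class name `DualDistanceIndex`, and the parameters `n_slices` and `max_dist` are all assumptions made for illustration. The centroids are assumed to come from a prior k-means run, the start point is assumed to be a fixed reference point, and a sorted Python list searched with `bisect` stands in for the B+-tree.

```python
import bisect
import math

EPS = 1e-9  # slack so floating-point rounding cannot drop a boundary candidate


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class DualDistanceIndex:
    """Hypothetical dual-distance key index (illustrative, not the EDD-tree itself)."""

    def __init__(self, points, centroids, start, n_slices=4, max_dist=2.0):
        # Assumption: every centroid-distance is strictly below max_dist.
        self.points, self.centroids, self.start = points, centroids, start
        self.n, self.c = n_slices, max_dist
        self.width = max_dist / n_slices
        entries = sorted((self._key(p), i) for i, p in enumerate(points))
        self.keys = [k for k, _ in entries]   # sorted 1-D keys (B+-tree stand-in)
        self.ids = [i for _, i in entries]    # point index stored with each key

    def _slice(self, d):
        # Equal-width slice number for a distance, clamped to the last slice.
        return min(int(d / self.width), self.n - 1)

    def _id(self, cid, s, c):
        # Assumed ID layout: cluster, then start-distance slice, then centroid slice.
        return (cid * self.n + s) * self.n + c

    def _key(self, p):
        cid = min(range(len(self.centroids)),
                  key=lambda j: dist(p, self.centroids[j]))
        cd, sd = dist(p, self.centroids[cid]), dist(p, self.start)
        return self._id(cid, self._slice(sd), self._slice(cd)) * self.c + cd

    def range_search(self, q, r):
        """Return sorted indices of all points within distance r of q."""
        sq = dist(q, self.start)
        s_lo, s_hi = self._slice(max(sq - r, 0.0)), self._slice(sq + r)
        found = set()
        for cid, cen in enumerate(self.centroids):
            dq = dist(q, cen)
            # Triangle inequality: a match has centroid-distance in [dq - r, dq + r].
            cd_lo, cd_hi = max(dq - r, 0.0), dq + r
            for s in range(s_lo, s_hi + 1):
                for c in range(self._slice(cd_lo), self._slice(cd_hi) + 1):
                    base = self._id(cid, s, c) * self.c
                    lo = bisect.bisect_left(self.keys, base + cd_lo - EPS)
                    hi = bisect.bisect_right(self.keys, base + cd_hi + EPS)
                    for i in self.ids[lo:hi]:
                        # Candidates are verified with the true distance.
                        if dist(self.points[i], q) <= r:
                            found.add(i)
        return sorted(found)


# Toy data: a 10 x 10 grid in the unit square with four hand-picked centroids.
points = [(i / 10, j / 10) for i in range(10) for j in range(10)]
centroids = [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75)]
index = DualDistanceIndex(points, centroids, start=(0.0, 0.0))
print(index.range_search((0.33, 0.41), 0.25))
```

The sketch answers range queries only. A k-NN search could be built on top of it by repeating the range search with a growing radius until k verified neighbors are found, which is one plausible reading of why the abstract reports the best results when the query radius is not very large.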

References

  1. Bohm C, Berchtold S, Keim D. Searching in high-dimensional spaces: index structures for improving the performance of multimedia databases. ACM Comput Surv, 2001, 33(3): 322–373


  2. Guttman A. R-trees: a dynamic index structure for spatial searching. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. Boston: ACM Press, 1984. 47–57


  3. Beckmann N, Kriegel H P, Schneider R, et al. The R*-tree: an efficient and robust access method for points and rectangles. In: Proceedings of ACM SIGMOD International Conference on Management of Data. Atlantic City: SIGMOD Record, 1990, 19(2). 322–331


  4. Berchtold S, Keim D A, Kriegel H P. The X-tree: an index structure for high-dimensional data. In: Proceedings of the 22nd International Conference on Very Large Data Bases. India: Morgan Kaufmann, 1996. 28–37


  5. Weber R, Schek H, Blott S. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: Proceedings of the 24th International Conference on Very Large Data Bases. New York: Morgan Kaufmann Publishers, 1998. 194–205


  6. Berchtold S, Bohm C, Kriegel H P, et al. Independent quantization: an index compression technique for high-dimensional data spaces. In: Proceedings of the 16th International Conference on Data Engineering. USA: IEEE Computer Society, 2000. 577–588


  7. Fonseca M J, Jorge J A. NB-Tree: an indexing structure for content-based retrieval in large databases. In: Proceedings of the 8th International Conference on Database Systems for Advanced Applications. Kyoto: IEEE Computer Society, 2003. 267–274


  8. Jagadish H V, Ooi B C, Tan K L, et al. iDistance: an adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans Database Syst, 2005, 30(2): 364–397


Author information

Corresponding author

Correspondence to YueTing Zhuang.

Additional information

Supported by the Key Program of the National Natural Science Foundation of China (Grant No. 60533090), the National Natural Science Fund for Distinguished Young Scholars (Grant No. 60525108), and the China-America Academic Digital Library Project.

About this article

Cite this article

Zhuang, Y., Zhuang, Y. & Wu, F. An encoding-based dual distance tree high-dimensional index. Sci. China Ser. F-Inf. Sci. 51, 1401–1414 (2008). https://doi.org/10.1007/s11432-008-0104-3

  • DOI: https://doi.org/10.1007/s11432-008-0104-3
