DOI: 10.1145/1353343.1353375
research-article

Indexing high-dimensional data in dual distance spaces: a symmetrical encoding approach

Published: 25 March 2008

ABSTRACT

Because of the well-known curse of dimensionality, search in a high-dimensional space is considered a hard problem. This paper proposes a novel symmetrical encoding-based index structure, the EHD-Tree (symmetrical Encoding-based Hybrid Distance Tree), to support fast k-Nearest-Neighbor (k-NN) search in high-dimensional spaces. In an EHD-Tree, all data points are first grouped into clusters by a k-Means clustering algorithm. The uniform ID number of each data point is then obtained by a dual-distance-driven encoding scheme in which each cluster sphere is partitioned twice, according to the dual distances: start-distance and centroid-distance. Finally, the uniform ID number and the centroid-distance of each data point are combined into a uniform index key, which is indexed by a partition-based B+-tree. Given a query point, its k-NN search in the high-dimensional space can thus be transformed into a search in a one-dimensional space with the aid of the EHD-Tree index. Extensive performance studies evaluate the effectiveness and efficiency of the proposed scheme, and the results demonstrate that it outperforms state-of-the-art high-dimensional search techniques such as the X-Tree, VA-file, iDistance and NB-Tree, especially when the query radius is not very large.
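The key-construction pipeline described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the exact encoding, so the key layout, the slice counts, the `scale` constant, the choice of start point, and all function names here are assumptions made for illustration. A sorted list searched with `bisect` stands in for the partition-based B+-tree, and the centroids are taken as given rather than computed by k-Means.

```python
import bisect
import math


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def slice_of(value, low, high, n_slices):
    """Map a value in [low, high] to a slice number in 0..n_slices-1."""
    if high <= low:
        return 0
    return min(n_slices - 1, int((value - low) / (high - low) * n_slices))


def build_index(points, centroids, start, n_slices=4, scale=1000.0):
    """Return a sorted list of (key, point_index) pairs.

    Assumed (hypothetical) key layout:
        uniform_id = (cluster * n_slices + start_slice) * n_slices + centroid_slice
        key        = uniform_id * scale + centroid_distance
    scale must exceed every centroid-distance so that ID ranges never overlap.
    """
    # Assign each point to its nearest centroid (what a k-Means run would yield).
    cluster = [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
               for p in points]
    cd = [dist(p, centroids[c]) for p, c in zip(points, cluster)]  # centroid-distance
    sd = [dist(p, start) for p in points]                          # start-distance

    # Per-cluster statistics: radius and the range of start-distances.
    stats = {}
    for i, c in enumerate(cluster):
        radius, sd_lo, sd_hi = stats.get(c, (0.0, sd[i], sd[i]))
        stats[c] = (max(radius, cd[i]), min(sd_lo, sd[i]), max(sd_hi, sd[i]))

    entries = []
    for i, c in enumerate(cluster):
        radius, sd_lo, sd_hi = stats[c]
        s_slice = slice_of(sd[i], sd_lo, sd_hi, n_slices)  # first partition
        c_slice = slice_of(cd[i], 0.0, radius, n_slices)   # second partition
        uniform_id = (c * n_slices + s_slice) * n_slices + c_slice
        entries.append((uniform_id * scale + cd[i], i))
    entries.sort()
    return entries


def key_range(entries, low_key, high_key):
    """Point indices whose key lies in [low_key, high_key], in key order."""
    lo = bisect.bisect_left(entries, (low_key, -1))
    hi = bisect.bisect_right(entries, (high_key, len(entries)))
    return [i for _, i in entries[lo:hi]]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
    cents = [(0.0, 0.0), (10.0, 10.0)]
    index = build_index(pts, cents, start=(0.0, 0.0))
    # Under the assumed layout, cluster 1 owns the keys in [16 * scale, 32 * scale).
    print(key_range(index, 16 * 1000.0, 32 * 1000.0))
```

Under this assumed layout, every cluster owns a contiguous key interval, so a k-NN query would reduce to a few one-dimensional range scans over the slices its search sphere intersects. How the EHD-Tree selects those slices is not stated in the abstract and is not modeled here.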


Published in

EDBT '08: Proceedings of the 11th international conference on Extending database technology: Advances in database technology
March 2008
762 pages
ISBN: 9781595939265
DOI: 10.1145/1353343

Copyright © 2008 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 7 of 10 submissions, 70%
