DOI: 10.1145/1353343.1353375

Indexing high-dimensional data in dual distance spaces: a symmetrical encoding approach

Published: 25 March 2008

Abstract

Because of the well-known curse of dimensionality, search in a high-dimensional space is considered a "hard" problem. This paper proposes a novel symmetrical encoding-based index structure, the EHD-Tree (symmetrical Encoding-based Hybrid Distance Tree), to support fast k-Nearest-Neighbor (k-NN) search in high-dimensional spaces. In an EHD-Tree, all data points are first grouped into clusters by a k-Means clustering algorithm. The uniform ID number of each data point is then obtained by a dual-distance-driven encoding scheme in which each cluster sphere is partitioned twice, once by start-distance and once by centroid-distance. Finally, the uniform ID number and the centroid-distance of each data point are combined into a uniform index key, which is indexed by a partition-based B+-tree. Given a query point, a k-NN search in the high-dimensional space can thus be transformed into a search in a single-dimensional space with the aid of the EHD-Tree index. Extensive performance studies evaluate the effectiveness and efficiency of the proposed scheme, and the results show that it outperforms state-of-the-art high-dimensional search techniques such as the X-Tree, VA-file, iDistance and NB-Tree, especially when the query radius is not very large.
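
The abstract does not give the exact encoding, so the following Python sketch is only one plausible reading of the pipeline it describes, not the authors' implementation. It assumes the cluster centroids are already available (for example from k-Means), takes the "start" point as a caller-supplied reference point, slices each cluster once by centroid-distance and once by start-distance, and builds a one-dimensional key as `uid * SCALE + centroid_distance`. The names `N_SLICES`, `SCALE`, `EPS`, `slice_of`, `build_index` and `range_search`, the key formula itself, and the sorted Python list standing in for the B+-tree are all hypothetical choices made for illustration.

```python
import bisect
import math

N_SLICES = 4    # hypothetical: number of slices per distance, per cluster
SCALE = 1000.0  # hypothetical: must exceed every centroid-distance
EPS = 1e-9      # slack for floating-point rounding at range boundaries


def slice_of(value, lo, hi, n=N_SLICES):
    """Map a value to a slice number 0..n-1 over [lo, hi], clamped at both ends."""
    if hi <= lo:
        return 0
    return min(max(int((value - lo) / (hi - lo) * n), 0), n - 1)


def build_index(points, centroids, start):
    """Return (entries sorted by key, per-cluster distance bounds)."""
    assigned = []
    for p in points:
        # Assign each point to its nearest centroid.
        j = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        assigned.append((j, math.dist(p, centroids[j]), math.dist(p, start), p))
    bounds = []
    for j in range(len(centroids)):
        dcs = [dc for c, dc, _, _ in assigned if c == j] or [0.0]
        dss = [ds for c, _, ds, _ in assigned if c == j] or [0.0]
        bounds.append((max(dcs), min(dss), max(dss)))  # radius, start-dist range
    entries = []
    for j, dc, ds, p in assigned:
        radius, ds_lo, ds_hi = bounds[j]
        a = slice_of(dc, 0.0, radius)   # slice by centroid-distance
        b = slice_of(ds, ds_lo, ds_hi)  # slice by start-distance
        uid = (j * N_SLICES + a) * N_SLICES + b
        entries.append((uid * SCALE + dc, p))
    entries.sort(key=lambda e: e[0])    # sorted list stands in for the B+-tree
    return entries, bounds


def range_search(entries, bounds, centroids, start, q, r):
    """Return all indexed points within distance r of q."""
    keys = [k for k, _ in entries]
    dq_start = math.dist(q, start)
    hits = []
    for j, (radius, ds_lo, ds_hi) in enumerate(bounds):
        dq_c = math.dist(q, centroids[j])
        # Triangle inequality: the query sphere cannot reach this cluster.
        if dq_c - r > radius or dq_start - r > ds_hi or dq_start + r < ds_lo:
            continue
        lo_dc, hi_dc = max(dq_c - r, 0.0), min(dq_c + r, radius)
        a_lo, a_hi = slice_of(lo_dc, 0.0, radius), slice_of(hi_dc, 0.0, radius)
        b_lo = slice_of(dq_start - r, ds_lo, ds_hi)
        b_hi = slice_of(dq_start + r, ds_lo, ds_hi)
        for a in range(a_lo, a_hi + 1):
            for b in range(b_lo, b_hi + 1):
                base = ((j * N_SLICES + a) * N_SLICES + b) * SCALE
                # One-dimensional range scan over the candidate slice.
                i = bisect.bisect_left(keys, base + lo_dc - EPS)
                while i < len(keys) and keys[i] <= base + hi_dc + EPS:
                    if math.dist(q, entries[i][1]) <= r:  # refine candidates
                        hits.append(entries[i][1])
                    i += 1
    return hits


points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.5), (8.5, 9.5)]
centroids = [(1.5, 1.3), (8.5, 8.7)]  # assumed to come from k-Means
start = (0.0, 0.0)                    # assumed start point
entries, bounds = build_index(points, centroids, start)
print(range_search(entries, bounds, centroids, start, (1.6, 1.5), 0.8))
```

Both distance filters rest on the triangle inequality: a point within r of the query must have centroid-distance and start-distance within r of the query's own, so only the slices overlapping those intervals need to be scanned. A k-NN search could be layered on top by enlarging r until at least k points are returned; whether the paper proceeds exactly this way is not stated in the abstract.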

Published In

EDBT '08: Proceedings of the 11th international conference on Extending database technology: Advances in database technology
March 2008
762 pages
ISBN: 9781595939265
DOI: 10.1145/1353343
Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

EDBT '08

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%

Bibliometrics

  • Downloads (last 12 months): 19
  • Downloads (last 6 weeks): 12

Reflects downloads up to 20 Jan 2025

Cited By

  • (2017) A Loitering Discovery System Using Efficient Similarity Search Based on Similarity Hierarchy. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E100.A:2, pp. 367-375. DOI: 10.1587/transfun.E100.A.367. Online publication date: 2017.
  • (2014) On Index Structures for Star Query Processing in Data Warehouses. Business Intelligence, pp. 182-217. DOI: 10.1007/978-3-319-05461-2_6. Online publication date: 2014.
  • (2013) High-dimensional indexing technologies for large scale content-based image retrieval: a review. Journal of Zhejiang University SCIENCE C, 14:7, pp. 505-520. DOI: 10.1631/jzus.CIDE1304. Online publication date: 12-Jul-2013.
  • (2011) Finding the k-closest pairs in metric spaces. Proceedings of the 1st Workshop on New Trends in Similarity Search, pp. 8-13. DOI: 10.1145/1966865.1966870. Online publication date: 25-Mar-2011.
  • (2010) Pivot selection method for optimizing both pruning and balancing in metric space indexes. Proceedings of the 21st international conference on Database and expert systems applications: Part II, pp. 141-148. DOI: 10.5555/1887568.1887582. Online publication date: 30-Aug-2010.
  • (2010) Pivot Selection Method for Optimizing both Pruning and Balancing in Metric Space Indexes. Database and Expert Systems Applications, pp. 141-148. DOI: 10.1007/978-3-642-15251-1_10. Online publication date: 2010.
  • (2009) Maximal metric margin partitioning for similarity search indexes. Proceedings of the 18th ACM conference on Information and knowledge management, pp. 1887-1890. DOI: 10.1145/1645953.1646256. Online publication date: 2-Nov-2009.
