Indexing high-dimensional data in dual distance spaces: a symmetrical encoding approach

Authors:

Yueting Zhuang,

Yi Yu

EDBT '08: Proceedings of the 11th international conference on Extending database technology: Advances in database technology

Pages 241 - 251

https://doi.org/10.1145/1353343.1353375

Published: 25 March 2008

Abstract

Because of the well-known curse of dimensionality, search in a high-dimensional space is considered a "hard" problem. This paper proposes a novel symmetrical encoding-based index structure, the EHD-Tree (symmetrical Encoding-based Hybrid Distance Tree), to support fast k-Nearest-Neighbor (k-NN) search in high-dimensional spaces. In an EHD-Tree, all data points are first grouped into clusters by a k-Means clustering algorithm. The uniform ID number of each data point is then obtained by a dual-distance-driven encoding scheme, in which each cluster sphere is partitioned twice, according to the start-distance and the centroid-distance. Finally, the uniform ID number and the centroid-distance of each data point are combined into a uniform index key, which is indexed by a partition-based B⁺-tree. Given a query point, k-NN search in the high-dimensional space can thus be transformed into a search in a single-dimensional space with the aid of the EHD-Tree index. Extensive performance studies evaluate the effectiveness and efficiency of the proposed scheme, and the results show that it outperforms state-of-the-art high-dimensional search techniques such as the X-Tree, VA-file, iDistance, and NB-Tree, especially when the query radius is not very large.
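The pipeline in the abstract (cluster, encode by two distances, build a one-dimensional key, search an ordered index) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the encoding formula, so the equal-width slicing, the ID layout in `uid`, the key stride `scale`, the choice of `start` point, and all names (`DualDistanceIndex`, `slice_of`, `range_search`) are assumptions made for illustration. Centroids are passed in rather than computed by k-Means, a sorted list searched with `bisect` stands in for the B⁺-tree, and the query shown is a range query; a k-NN search could be layered on top by growing the radius until k points are found.

```python
import bisect
import math

EPS = 1e-9  # widens search bounds slightly to absorb floating-point rounding


def dist(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def slice_of(value, low, high, n_slices):
    """Equal-width slice number of value within [low, high], clamped to be valid."""
    if high <= low:
        return 0
    idx = int((value - low) / (high - low) * n_slices)
    return max(0, min(n_slices - 1, idx))


class DualDistanceIndex:
    """Hypothetical 1-D index keyed on cluster, two slice numbers, and centroid-distance."""

    def __init__(self, points, centroids, start, n_slices=4):
        self.points, self.centroids = points, centroids
        self.start, self.n_slices = start, n_slices
        # Assign each point to its nearest centroid (the assignment step of k-Means).
        members = [[] for _ in centroids]
        for pid, p in enumerate(points):
            j = min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            members[j].append((pid, dist(p, centroids[j]), dist(p, start)))
        # Per cluster: radius, and the min/max start-distance of its members.
        self.stats = [
            (max(m[1] for m in ms), min(m[2] for m in ms), max(m[2] for m in ms))
            if ms else None
            for ms in members
        ]
        # Assumed key stride: larger than any centroid-distance, so segments never overlap.
        self.scale = max((s[0] for s in self.stats if s is not None), default=0.0) + 1.0
        entries = sorted(
            (self.key(j, cd, sd), pid)
            for j, ms in enumerate(members)
            for pid, cd, sd in ms
        )
        self.keys = [k for k, _ in entries]
        self.ids = [pid for _, pid in entries]

    def uid(self, j, a, b):
        """Assumed ID layout: cluster number, centroid slice, start slice."""
        return (j * self.n_slices + a) * self.n_slices + b

    def key(self, j, cd, sd):
        """Combine the ID with the centroid-distance into one sortable number."""
        radius, smin, smax = self.stats[j]
        a = slice_of(cd, 0.0, radius, self.n_slices)
        b = slice_of(sd, smin, smax, self.n_slices)
        return self.uid(j, a, b) * self.scale + cd

    def range_search(self, q, r):
        """Return sorted ids of all points within distance r of q."""
        hits, s = [], self.n_slices
        sq = dist(q, self.start)
        for j, st in enumerate(self.stats):
            if st is None:
                continue
            radius, smin, smax = st
            dq = dist(q, self.centroids[j])
            # The triangle inequality bounds both distances of any qualifying point.
            cd_lo, cd_hi = max(0.0, dq - r - EPS), min(radius, dq + r + EPS)
            sd_lo, sd_hi = max(smin, sq - r - EPS), min(smax, sq + r + EPS)
            if cd_lo > cd_hi or sd_lo > sd_hi:
                continue  # the query sphere cannot reach this cluster's points
            for a in range(slice_of(cd_lo, 0.0, radius, s), slice_of(cd_hi, 0.0, radius, s) + 1):
                for b in range(slice_of(sd_lo, smin, smax, s), slice_of(sd_hi, smin, smax, s) + 1):
                    base = self.uid(j, a, b) * self.scale
                    lo = bisect.bisect_left(self.keys, base + cd_lo)
                    hi = bisect.bisect_right(self.keys, base + cd_hi)
                    # Candidates from the 1-D scan are verified with the true distance.
                    hits += [i for i in self.ids[lo:hi] if dist(self.points[i], q) <= r]
        return sorted(hits)


points = [(0.1, 0.2), (0.15, 0.25), (0.8, 0.9), (0.85, 0.8)]
index = DualDistanceIndex(points, centroids=[(0.1, 0.2), (0.8, 0.85)], start=(0.0, 0.0))
print(index.range_search((0.12, 0.22), 0.1))  # prints [0, 1]
```

In this sketch the second distance only narrows which key segments are scanned; every candidate is still checked against the true distance, so the result is exact. A small query radius touches few slices, which is consistent with the abstract's remark that the method does best when the query radius is not very large.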

Index Terms

Indexing high-dimensional data in dual distance spaces: a symmetrical encoding approach
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Cluster analysis
2. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing

Published In

EDBT '08: Proceedings of the 11th international conference on Extending database technology: Advances in database technology

March 2008

762 pages

ISBN:9781595939265

DOI:10.1145/1353343

Conference Chair:
Noureddine Mouaddib,
General Chair:
Patrick Valduriez,
Program Chairs:
Alfons Kemper (Technische Universität München, Germany),
Mokrane Bouzeghoub,
Volker Markl,
Laurent Amsaleg,
Ioana Manolescu,
Publications Chair:
Jens Teubner

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Science and Technology Project of Zhejiang Province
National Natural Science Foundation of China
Ministry of Science and Technology of the People's Republic of China

Conference

EDBT '08

EDBT '08: 11th International Conference on Extending Database Technology

March 25 - 29, 2008

Nantes, France

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%

Bibliometrics

Total Citations: 7
Total Downloads: 390
Downloads (Last 12 months): 19
Downloads (Last 6 weeks): 12

Reflects downloads up to 20 Jan 2025

Cited By

Liu J, Nishimura S, Araki T, Nakamura Y (2017). A Loitering Discovery System Using Efficient Similarity Search Based on Similarity Hierarchy. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E100.A:2, pp. 367-375. Online publication date: 2017.
https://doi.org/10.1587/transfun.E100.A.367
Wojciechowski A, Wrembel R (2014). On Index Structures for Star Query Processing in Data Warehouses. Business Intelligence, pp. 182-217. Online publication date: 2014.
https://doi.org/10.1007/978-3-319-05461-2_6
Ai L, Yu J, He Y, Guan T (2013). High-dimensional indexing technologies for large scale content-based image retrieval: a review. Journal of Zhejiang University SCIENCE C, 14:7, pp. 505-520. Online publication date: 12-Jul-2013.
https://doi.org/10.1631/jzus.CIDE1304
Kurasawa H, Takasu A, Adachi J (2011). Finding the k-closest pairs in metric spaces. Proceedings of the 1st Workshop on New Trends in Similarity Search, pp. 8-13. Online publication date: 25-Mar-2011.
https://dl.acm.org/doi/10.1145/1966865.1966870
Kurasawa H, Fukagawa D, Takasu A, Adachi J (2010). Pivot selection method for optimizing both pruning and balancing in metric space indexes. Proceedings of the 21st international conference on Database and expert systems applications: Part II, pp. 141-148. Online publication date: 30-Aug-2010.
https://dl.acm.org/doi/10.5555/1887568.1887582
Kurasawa H, Fukagawa D, Takasu A, Adachi J (2010). Pivot Selection Method for Optimizing both Pruning and Balancing in Metric Space Indexes. Database and Expert Systems Applications, pp. 141-148. Online publication date: 2010.
https://doi.org/10.1007/978-3-642-15251-1_10
Kurasawa H, Fukagawa D, Takasu A, Adachi J (2009). Maximal metric margin partitioning for similarity search indexes. Proceedings of the 18th ACM conference on Information and knowledge management (eds. Cheung D, Song I, Chu W, Hu X, Lin J), pp. 1887-1890. Online publication date: 2-Nov-2009.
https://dl.acm.org/doi/10.1145/1645953.1646256
