Effectiveness of NAQ-tree as index structure for similarity search in high-dimensional metric space

Zhang, Ming; Alhajj, Reda

doi:10.1007/s10115-008-0190-y

Effectiveness of NAQ-tree as index structure for similarity search in high-dimensional metric space

Regular Paper
Published: 23 January 2009

Volume 22, pages 1–26, (2010)
Cite this article

Knowledge and Information Systems Aims and scope Submit manuscript

Ming Zhang¹ &
Reda Alhajj^1,2

168 Accesses
19 Citations
Explore all metrics

Abstract

Similarity search (e.g., k-nearest neighbor search) in high-dimensional metric space is the key operation in many applications, such as multimedia databases, image retrieval and object recognition, among others. The high dimensionality and the huge size of the data set require an index structure to facilitate the search. State-of-the-art index structures are built by partitioning the data set based on distances to certain reference point(s). Using the index, search is confined to a small number of partitions. However, these methods either ignore the property of the data distribution (e.g., VP-tree and its variants) or produce non-disjoint partitions (e.g., M-tree and its variants, DBM-tree); these greatly affect the search efficiency. In this paper, we study the effectiveness of a new index structure, called Nested-Approximate-eQuivalence-class tree (NAQ-tree), which overcomes the above disadvantages. NAQ-tree is constructed by recursively dividing the data set into nested approximate equivalence classes. The conducted analysis and the reported comparative test results demonstrate the effectiveness of NAQ-tree in significantly improving the search efficiency.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Effective optimizations of cluster-based nearest neighbor search in high-dimensional space

Article 24 December 2014

Pruning Algorithms for Low-Dimensional Non-metric k-NN Search: A Case Study

LGTM: A Fast and Accurate kNN Search Algorithm in High-Dimensional Spaces

References

An J, Chen H, Furuse K, Ohbo N (2005) CVA file: an index structure for high-dimensional datasets. Knowl Inform Syst 7: 337–357
Article Google Scholar
Andoni A, Indyk P (2008) Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Comm ACM 51(1): 117–122
Article Google Scholar
Berchtold S, Keim D, Kriegel HP (1996) The X-tree: an index structure for high-dimensional data. In: Proceedings of VLDB
Beyer K et al (1999) When is “Nearest Neighbor” meaningful? In: Proceedings of IEEE ICDE
Bozkaya T, Ozsoyoglu M (1997) Distance-based indexing for high-dimensional metric spaces. In: Proceedings of ACM SIGMOD, pp 357–368
Brin S (1995) Near neighbor search in large metric spaces. In: Proceedings of VLDB, pp 574–584
Burkhard WA, Keller RM (1973) Some approaches to best-match file searching. Comm ACM 16(4): 230–236
Article MATH Google Scholar
Chavez E, Navarro G (2005) A compact space decomposition for effective metric indexing. Pattern Recognit Lett 26(9): 1363–1376
Article Google Scholar
Ciaccia P, Patella M, Zezula P (1997) M-Tree: an efficient access method for similarity search in metric spaces. VLDB J 426–435
Data set is available at: http://kdd.ics.uci.edu/databases/CorelFeatures/CorelFeatures.html
Cui B, Ooi BC, Su J, Tan T (2005) Indexing high-dimensional data for efficient in-memory similarity search. IEEE Trans Knowl Data Eng 17(3): 339–353
Article Google Scholar
Dohnal V et al (2003) D-index: distance searching index for metric data sets. Multimedia Tools Appl 21(19): 33
Google Scholar
Filho RFS, Traina AJM, Traina C, Faloutsos C (2001) Similarity search without tears: the OMNI family of all-purpose access methods. In: Proceedings of IEEE ICDE, pp 623–630
Fu AW-C et al (2000) Dynamic VP-tree indexing for N-Nearest search given pair-wise distances. VLDB J
Source code is available at http://www.cs.mu.oz.au/~rui/code.htm
Jagadish HV, Ooi BC, Tan KL, Yu C, Zhang R (2005) iDistance: an adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans Database Syst 30(2): 364–397
Article Google Scholar
Katamaya N, Satoh S (1997) The SR-tree: an index structure for high-dimensional nearest neighbor queries. In: Proceedings of ACM SIGMOD
Data set is available at: http://kodiak.cs.cornell.edu/kddcup/datasets.html
Kailing K, Kriegel H-P, Pfeifle M, Schonauer S (2006) Extending metric index structures for efficient range query processing. Knowl Inform Syst 10(2): 211–227
Article Google Scholar
Kadiyala S, Shiri N (2008) A compact multi-resolution index for variable length queries in time series databases. Knowl Inform Syst 15: 131–147
Article Google Scholar
Lowe DG (2004) Distinctive image features from scale invariant features. IJCV 60(2): 91–110
Article Google Scholar
Nievergelt J, Hinterberger H, Sevcik KC (1984) The grid file: an adaptable, symmetric multikey file structure. ACM Trans Database Syst 9(1)
Pawlak Z (1991) Rough sets theoretical aspects of reasoning about data. Kluwer, Dordrecht
MATH Google Scholar
Seidl T, Kriegel HP (1998) Optimal multi-step k-nearest neighbor search. In: Proceedings of ACM SIGMOD
Shaft U, Ramakrishnan R (2005) When is nearest neighbors indexable? In: Proceedings of ICDT, pp 158–172
Traina C, Traina AJM, Seeger B, Faloutsos C (2000) Slim-trees: highet performance metric trees minimizing overlap between nodes. In: Proceedings of EDBT, pp 51–65
Uhlmann JK (1991) Satisfying general proximity/similarity queries with metric trees. Inform Process Lett 40: 175–179
Article MATH Google Scholar
Venkateswaran J, Lachwani D, Kahveci T, Jermaine C (2006) Reference-based indexing of sequences databases. In: Proceedings of VLDB
Vieira MR, Traina C, Chino FJT, Traina AJM (2004) DBM-tree: a dynamic metric access method sensitive to local density data. In: Proceedings of SBBD, pp 163–177
Source code is available at: http://www.cse.cuhk.edu.hk/~kdd/program.html
White DA, Jain R (1996) Similarity indexing with the ss-tree. In: Proceedings of IEEE ICDE
Weber R, Schek HJ, Blott S (1998) A quantitative analysis and performance study for similarity-search methods in high-dimensional space. In: Proceedings of VLDB
Yianilos P (1993) Data structures and algorithms for nearest neighbor search in general metric spaces. In: Proceedings of ACM-SIAM symposium on discrete algorithms, pp 311–321
Yianilos P (1999) Excluded middle vantage point forests for nearest neighbor search. In: Implementation challenge, ALENEX’99

Download references

Author information

Authors and Affiliations

Department of Computer Science, University of Calgary, Calgary, AB, Canada
Ming Zhang & Reda Alhajj
Department of Computer Science, Global University, Beirut, Lebanon
Reda Alhajj

Authors

Ming Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Reda Alhajj
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Reda Alhajj.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zhang, M., Alhajj, R. Effectiveness of NAQ-tree as index structure for similarity search in high-dimensional metric space. Knowl Inf Syst 22, 1–26 (2010). https://doi.org/10.1007/s10115-008-0190-y

Download citation

Received: 04 August 2008
Revised: 12 November 2008
Accepted: 22 November 2008
Published: 23 January 2009
Issue Date: January 2010
DOI: https://doi.org/10.1007/s10115-008-0190-y

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Effectiveness of NAQ-tree as index structure for similarity search in high-dimensional metric space

Abstract

Access this article

Similar content being viewed by others

Effective optimizations of cluster-based nearest neighbor search in high-dimensional space

Pruning Algorithms for Low-Dimensional Non-metric k-NN Search: A Case Study

LGTM: A Fast and Accurate kNN Search Algorithm in High-Dimensional Spaces

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Effectiveness of NAQ-tree as index structure for similarity search in high-dimensional metric space

Abstract

Access this article

Similar content being viewed by others

Effective optimizations of cluster-based nearest neighbor search in high-dimensional space

Pruning Algorithms for Low-Dimensional Non-metric k-NN Search: A Case Study

LGTM: A Fast and Accurate kNN Search Algorithm in High-Dimensional Spaces

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation