Skip to main content
Log in

Effectiveness of NAQ-tree as index structure for similarity search in high-dimensional metric space

  • Regular Paper
  • Published:
Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

Similarity search (e.g., k-nearest neighbor search) in high-dimensional metric space is the key operation in many applications, such as multimedia databases, image retrieval and object recognition, among others. The high dimensionality and the huge size of the data set require an index structure to facilitate the search. State-of-the-art index structures are built by partitioning the data set based on distances to certain reference point(s). Using the index, search is confined to a small number of partitions. However, these methods either ignore the property of the data distribution (e.g., VP-tree and its variants) or produce non-disjoint partitions (e.g., M-tree and its variants, DBM-tree); these greatly affect the search efficiency. In this paper, we study the effectiveness of a new index structure, called Nested-Approximate-eQuivalence-class tree (NAQ-tree), which overcomes the above disadvantages. NAQ-tree is constructed by recursively dividing the data set into nested approximate equivalence classes. The conducted analysis and the reported comparative test results demonstrate the effectiveness of NAQ-tree in significantly improving the search efficiency.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. An J, Chen H, Furuse K, Ohbo N (2005) CVA file: an index structure for high-dimensional datasets. Knowl Inform Syst 7: 337–357

    Article  Google Scholar 

  2. Andoni A, Indyk P (2008) Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Comm ACM 51(1): 117–122

    Article  Google Scholar 

  3. Berchtold S, Keim D, Kriegel HP (1996) The X-tree: an index structure for high-dimensional data. In: Proceedings of VLDB

  4. Beyer K et al (1999) When is “Nearest Neighbor” meaningful? In: Proceedings of IEEE ICDE

  5. Bozkaya T, Ozsoyoglu M (1997) Distance-based indexing for high-dimensional metric spaces. In: Proceedings of ACM SIGMOD, pp 357–368

  6. Brin S (1995) Near neighbor search in large metric spaces. In: Proceedings of VLDB, pp 574–584

  7. Burkhard WA, Keller RM (1973) Some approaches to best-match file searching. Comm ACM 16(4): 230–236

    Article  MATH  Google Scholar 

  8. Chavez E, Navarro G (2005) A compact space decomposition for effective metric indexing. Pattern Recognit Lett 26(9): 1363–1376

    Article  Google Scholar 

  9. Ciaccia P, Patella M, Zezula P (1997) M-Tree: an efficient access method for similarity search in metric spaces. VLDB J 426–435

  10. Data set is available at: http://kdd.ics.uci.edu/databases/CorelFeatures/CorelFeatures.html

  11. Cui B, Ooi BC, Su J, Tan T (2005) Indexing high-dimensional data for efficient in-memory similarity search. IEEE Trans Knowl Data Eng 17(3): 339–353

    Article  Google Scholar 

  12. Dohnal V et al (2003) D-index: distance searching index for metric data sets. Multimedia Tools Appl 21(19): 33

    Google Scholar 

  13. Filho RFS, Traina AJM, Traina C, Faloutsos C (2001) Similarity search without tears: the OMNI family of all-purpose access methods. In: Proceedings of IEEE ICDE, pp 623–630

  14. Fu AW-C et al (2000) Dynamic VP-tree indexing for N-Nearest search given pair-wise distances. VLDB J

  15. Source code is available at http://www.cs.mu.oz.au/~rui/code.htm

  16. Jagadish HV, Ooi BC, Tan KL, Yu C, Zhang R (2005) iDistance: an adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans Database Syst 30(2): 364–397

    Article  Google Scholar 

  17. Katamaya N, Satoh S (1997) The SR-tree: an index structure for high-dimensional nearest neighbor queries. In: Proceedings of ACM SIGMOD

  18. Data set is available at: http://kodiak.cs.cornell.edu/kddcup/datasets.html

  19. Kailing K, Kriegel H-P, Pfeifle M, Schonauer S (2006) Extending metric index structures for efficient range query processing. Knowl Inform Syst 10(2): 211–227

    Article  Google Scholar 

  20. Kadiyala S, Shiri N (2008) A compact multi-resolution index for variable length queries in time series databases. Knowl Inform Syst 15: 131–147

    Article  Google Scholar 

  21. Lowe DG (2004) Distinctive image features from scale invariant features. IJCV 60(2): 91–110

    Article  Google Scholar 

  22. Nievergelt J, Hinterberger H, Sevcik KC (1984) The grid file: an adaptable, symmetric multikey file structure. ACM Trans Database Syst 9(1)

  23. Pawlak Z (1991) Rough sets theoretical aspects of reasoning about data. Kluwer, Dordrecht

    MATH  Google Scholar 

  24. Seidl T, Kriegel HP (1998) Optimal multi-step k-nearest neighbor search. In: Proceedings of ACM SIGMOD

  25. Shaft U, Ramakrishnan R (2005) When is nearest neighbors indexable? In: Proceedings of ICDT, pp 158–172

  26. Traina C, Traina AJM, Seeger B, Faloutsos C (2000) Slim-trees: highet performance metric trees minimizing overlap between nodes. In: Proceedings of EDBT, pp 51–65

  27. Uhlmann JK (1991) Satisfying general proximity/similarity queries with metric trees. Inform Process Lett 40: 175–179

    Article  MATH  Google Scholar 

  28. Venkateswaran J, Lachwani D, Kahveci T, Jermaine C (2006) Reference-based indexing of sequences databases. In: Proceedings of VLDB

  29. Vieira MR, Traina C, Chino FJT, Traina AJM (2004) DBM-tree: a dynamic metric access method sensitive to local density data. In: Proceedings of SBBD, pp 163–177

  30. Source code is available at: http://www.cse.cuhk.edu.hk/~kdd/program.html

  31. White DA, Jain R (1996) Similarity indexing with the ss-tree. In: Proceedings of IEEE ICDE

  32. Weber R, Schek HJ, Blott S (1998) A quantitative analysis and performance study for similarity-search methods in high-dimensional space. In: Proceedings of VLDB

  33. Yianilos P (1993) Data structures and algorithms for nearest neighbor search in general metric spaces. In: Proceedings of ACM-SIAM symposium on discrete algorithms, pp 311–321

  34. Yianilos P (1999) Excluded middle vantage point forests for nearest neighbor search. In: Implementation challenge, ALENEX’99

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Reda Alhajj.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zhang, M., Alhajj, R. Effectiveness of NAQ-tree as index structure for similarity search in high-dimensional metric space. Knowl Inf Syst 22, 1–26 (2010). https://doi.org/10.1007/s10115-008-0190-y

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-008-0190-y

Keywords

Navigation