Abstract
Reverse nearest-neighbor (RNN) query processing is important for many applications, such as decision-support systems, profile-based marketing and molecular biology; consequently, it has attracted considerable attention in the research community in recent years. Most existing approaches to RNN query processing either rely on nearest-neighbor pre-computation or work only for a specific data space (e.g., Euclidean space). The only existing method for RNN query processing in metric space is based on the M-tree. In this paper, we propose an approach for RNN query processing in high-dimensional metric space using a distance-based index structure, in particular the NAQ-tree, which we showed in a previous study to outperform other distance-based index structures. In high-dimensional space, the properties of this distance-based index structure yield stronger pruning rules than those of the M-tree. In addition, unlike previous work, our approach integrates the filtering and verification steps and uses the information obtained in the verification stage to further improve the filtering rate. Our approach delivers results incrementally and is therefore well suited to real-time applications. The reported experimental results demonstrate the applicability and effectiveness of the proposed NAQ-tree-based RNN approach.
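For reference, the RNN query the abstract relies on can be stated as a naive brute-force baseline. This is not the NAQ-tree method, nor the M-tree-based one; it is a minimal sketch under stated assumptions. A data point p is reported as a reverse nearest neighbor of the query q when no other data point is strictly closer to p than q is (counting ties as RNNs is an assumption; definitions in the literature differ). The names `reverse_nn` and `euclidean` are hypothetical. Every point is treated as a candidate, verification scans the rest of the data with early termination, and results are yielded through a generator so they arrive incrementally.

```python
import math
from typing import Callable, Iterator, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean (L2) distance; any function obeying the metric axioms works."""
    return math.dist(a, b)


def reverse_nn(data: Sequence[Point], q: Point,
               dist: Metric = euclidean) -> Iterator[int]:
    """Yield indices of data points whose nearest neighbor is q.

    Hypothetical naive O(n^2) baseline (not an index-based method):
    each point is a candidate, and a candidate is verified by scanning
    the other points, stopping as soon as one is strictly closer to it
    than q is. Ties with q are counted as RNNs (an assumption).
    """
    for i, p in enumerate(data):
        d_pq = dist(p, q)
        is_rnn = True
        for j, other in enumerate(data):
            if j != i and dist(p, other) < d_pq:
                is_rnn = False  # p has a closer neighbor than q: reject
                break
        if is_rnn:
            yield i  # results are reported one at a time, as found


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
    print(list(reverse_nn(pts, (1.5, 0.0))))  # prints [1]
```

The metric is passed in as a parameter, so any distance function satisfying the metric axioms can be substituted. Index-based approaches such as those discussed in the abstract aim to prune most of the quadratic number of distance computations this baseline performs.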
References
Achtert E, Böhm C, Kröger P, Kunath P, Pryakhin A, Renz M (2006) Efficient reverse k-nearest neighbor search in arbitrary metric space. In: Proceedings of ACM SIGMOD. pp 515–526
Aronovich L, Spiegler I (2010) Bulk construction of dynamic clustered metric trees. Knowl Inf Syst 22(2): 211–244
Baeza-Yates R, Cunto W, Manber U, Wu S (1994) Proximity matching using fixed-queries trees. In: Proceedings of conference on combinatorial pattern matching. pp 198–212
Benetis R, Jensen CS, Karciauskas G, Saltenis S (2006) Nearest neighbor and reverse nearest neighbor queries for moving objects. VLDB J 15(3): 229–249
Bozkaya T, Ozsoyoglu M (1997) Distance-based indexing for high-dimensional metric spaces. In: Proceedings of ACM SIGMOD. pp 357–368
Brin S (1995) Near neighbor search in large metric spaces. In: Proceedings of VLDB. pp 574–584
Burkhard W, Keller R (1973) Some approaches to best-match file searching. Commun ACM 16(4): 230–236
Ciaccia P, Patella M, Zezula P (1997) M-tree: an efficient access method for similarity search in metric spaces. In: Proceedings of VLDB. pp 426–435
Conway J, Sloane N (1988) Sphere packings, lattices and groups, 1st edn. Springer, New York
Copeland G, Khoshafian S (1985) A decomposition storage model. In: Proceedings of ACM SIGMOD. pp 268–279
Chavez E, Navarro G, Baeza-Yates R, Marroquin JL (2001) Searching in metric spaces. ACM Comput Surv 33(3): 273–321
Fu AW, Chan PM, Cheung YL, Moon YS (2000) Dynamic vp-tree indexing for n-nearest neighbor search given pair-wise distances. VLDB J 9(2): 154–173
Ferhatosmanoglu H, Stanoi I, Agrawal D, El Abbadi A (2001) Constrained nearest neighbor queries. In: Proceedings of the international symposium on spatial and temporal databases. pp 257–278
Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: Proceedings of ACM SIGMOD. pp 47–57
Kalantari I, McDonald G (1983) A data structure and an algorithm for nearest point problem. IEEE Trans Softw Eng 9(5): 631–634
Kelil A, Wang S, Jiang Q, Brzezinski R (2010) A general measure of similarity for categorical sequences. Knowl Inf Syst 24(2): 197–220
Korn F, Muthukrishnan S (2000) Influence sets based on reverse nearest neighbor queries. In: Proceedings of ACM SIGMOD. pp 201–212
Korn F, Muthukrishnan S, Srivastava D (2002) Reverse nearest neighbor aggregates over data streams. In: Proceedings of VLDB. pp 814–825
Lin K-I, Nolen M, Yang C (2003) Applying bulk insertion techniques for dynamic reverse nearest neighbor problems. In: Proceedings of IDEAS. pp 290–297
Maheshwari A, Vahrenhold J, Zeh N (2002) On reverse nearest neighbor queries. In: Proceedings of the Canadian conference on computational geometry. pp 128–132
Seidl T, Kriegel HP (1998) Optimal multi-step k-nearest neighbor search. In: Proceedings of ACM SIGMOD. pp 154–165
Shaft U, Ramakrishnan R (2005) When is nearest neighbors indexable? In: Proceedings of ICDT. pp 158–172
Singh A, Ferhatosmanoglu H, Tosun AS (2003) High dimensional reverse nearest neighbor queries. In: Proceedings of ACM CIKM. pp 91–98
Song G, Cui B, Zheng B, Yang D (2009) Accelerating sequence searching: dimensionality reduction method. Knowl Inf Syst 20(3): 301–322
Stanoi I, Riedewald M, Agrawal D, El Abbadi A (2001) Discovery of influence sets in frequently updated databases. In: Proceedings of VLDB. pp 99–108
Stanoi I, Agrawal D, El Abbadi A (2000) Reverse nearest neighbor queries for dynamic databases. In: Proceedings of ACM SIGMOD workshop on research issues in data mining and knowledge discovery. pp 44–53
Tao Y, Papadias D, Lian X, Xiao X (2007) Multi-dimensional reverse kNN search. VLDB J 16(3): 293–316
Tao Y, Yiu M, Mamoulis N (2006) Reverse nearest neighbor search in metric spaces. IEEE Trans Knowl Data Eng 18(9): 1239–1252
Uhlmann JK (1991) Satisfying general proximity/similarity queries with metric trees. Inf Process Lett 40: 175–179
Vidal E (1986) An algorithm for finding nearest neighbors in (approximately) constant average time. Pattern Recognit Lett 4: 145–157
Yianilos P (1993) Data structures and algorithms for nearest neighbor search in general metric spaces. In: Proceedings of ACM-SIAM symposium on discrete algorithms. pp 311–321
Yianilos P (1999) Excluded middle vantage point forest for nearest neighbor search. In: DIMACS implementation challenge, ALENEX’99, Baltimore, MD
Yang C, Lin K-I (2001) An index structure for efficient reverse nearest neighbor queries. In: Proceedings of IEEE international conference on data engineering. pp 485–492
Yiu M, Papadias D, Mamoulis N, Tao Y (2006) Reverse nearest neighbor in large graphs. IEEE Trans Knowl Data Eng 18(4): 540–553
Yiu M, Mamoulis N (2007) Reverse nearest neighbor search in ad-hoc subspaces. IEEE Trans Knowl Data Eng 19(3): 412–426
Zhang M, Alhajj R (2010) Effectiveness of NAQ-tree as index structure for similarity search in high-dimensional metric space. Knowl Inf Syst 22(1): 1–26
Zhang M, Alhajj R, Rokne J (2008) Optimal incremental multi-step nearest-neighbor search. In: Proceedings of ACM international conference on advances in geographic information systems
The source code is available at http://www.cse.cuhk.edu.hk/~taoyf/paper/tkde06-rnn-metric.html
The data set is available at http://kdd.ics.uci.edu/databases/CorelFeatures/CorelFeatures.html
The data set is available at http://kodiak.cs.cornell.edu/kddcup/datasets.html
The data set is available at http://kdd.ics.uci.edu/databases/covertype/covertype.html
The data set is available at http://archive.ics.uci.edu/ml/datasets/Poker+Hand
About this article
Cite this article
Zhang, M., Alhajj, R. Effectiveness of NAQ-tree in handling reverse nearest-neighbor queries in high-dimensional metric space. Knowl Inf Syst 31, 307–343 (2012). https://doi.org/10.1007/s10115-011-0405-5
DOI: https://doi.org/10.1007/s10115-011-0405-5