Processing Reverse Nearest Neighbor Queries Based on Unbalanced Multiway Region Tree Index

  • Conference paper
Web Information Systems Engineering – WISE 2023 (WISE 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14306)

Abstract

Reverse nearest neighbor (RNN) queries arise in many applications; they are derived from nearest neighbor (NN) queries but are more complex to process. NN query processing relies on sophisticated data structures and methods and is well addressed for low-dimensional data (usually fewer than 10 dimensions), whereas efficiently processing exact NN or RNN queries over high-dimensional data remains a challenging problem. This paper proposes an algorithm for evaluating RNN queries in higher-dimensional \(l_p\) spaces. The main idea is that an RNN query can be processed efficiently by using relevant information that is readily available in, and retrievable from, memory. The data space containing a finite dataset is divided into multiple small regions that form an unbalanced multiway region tree; an index containing important information is then created from the tree and the sorted lists of tuples in the dataset. The algorithm consists of two pruning approaches and a verification method based on the index and the characteristics of \(l_p\) spaces. Extensive experiments over various datasets demonstrate the performance of the algorithm and show that it outperforms the existing state-of-the-art methods CSD, VR-RNN, SFT and TPL.
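To make the query type concrete, the following is a minimal brute-force sketch of a monochromatic RNN query. It is not the algorithm proposed in the paper; it only illustrates what the answer set RNN(Q) contains. The function names `lp_distance` and `brute_force_rnn`, the toy data, and the tie-breaking rule (a tuple is reported when Q is at least as close as its nearest neighbor) are assumptions made for illustration.

```python
import math

def lp_distance(x, y, p=2.0):
    """Distance induced by the l_p norm (p = math.inf gives the max norm)."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

def brute_force_rnn(data, q, p=2.0):
    """Return every tuple t in data that has q as its nearest neighbor.

    Assumption: t is reported when q is at least as close to t as every
    other tuple of data, so ties are resolved in favour of q.
    """
    result = []
    for i, t in enumerate(data):
        d_q = lp_distance(t, q, p)
        # Nearest-neighbor distance of t among the other tuples.
        d_nn = min(
            (lp_distance(t, u, p) for j, u in enumerate(data) if j != i),
            default=math.inf,
        )
        if d_q <= d_nn:
            result.append(t)
    return result

data = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
print(brute_force_rnn(data, (0.4, 0.0)))  # [(0.0, 0.0), (1.0, 0.0)]
```

This baseline computes a nearest-neighbor distance for every tuple at query time, which is the cost that an index with pruning is meant to avoid.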

References

  1. Allheeib, N., Adhinugraha, K., Taniar, D., Islam, M.S.: Computing reverse nearest neighbourhood on road maps. World Wide Web 25, 99–130 (2022)

  2. Blackard, J.A., Dean, D.J., Anderson, C.W.: UCI repository of machine learning databases (1998). http://archive.ics.uci.edu/ml/datasets/Covertype. Accessed 10 Aug 2022

  3. Borodin, A., Ostrovsky, R., Rabani, Y.: Lower bounds for high dimensional nearest neighbor search and related problems. In: Proceedings of the Thirty-First Annual ACM Symposium on Theory of Computing (STOC 1999), pp. 312–321 (1999)

  4. Bruno, N., Chaudhuri, S., Gravano, L.: STHoles: a multidimensional workload-aware histogram. In: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data (SIGMOD 2001), pp. 211–222 (2001)

  5. Casanova, G., et al.: Dimensional testing for reverse k-nearest neighbor search. Proc. VLDB Endowment 10(7), 769–780 (2017)

  6. Chahal, H., Toner, H., Rahkovsky, I.: Small data’s big AI potential. Center for Security and Emerging Technology (2021). https://cset.georgetown.edu/publication/small-datas-big-ai-potential/. Accessed 26 July 2022

  7. Cheema, M.A., Zhang, W., Lin, X., Zhang, Y.: Efficiently processing snapshot and continuous reverse k nearest neighbors queries. VLDB J. 21(5), 703–728 (2012)

  8. Das, R., Biswas, S.K., Devi, D., Sarma, B.: An oversampling technique by integrating reverse nearest neighbor in SMOTE: Reverse-SMOTE. In: 2020 International Conference on Smart Electronics and Communication (ICOSEC), pp. 1239–1244 (2020)

  9. Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4), 614–656 (2003)

  10. Guo, Y.-R., Bai, Y.-Q., Li, C.-N., Shao, Y.-H., Ye, Y.-F., Jiang, C.-Z.: Reverse nearest neighbors Bhattacharyya bound linear discriminant analysis for multimodal classification. Eng. Appl. Artif. Intell. 97, 104033 (2021)

  11. Har-Peled, S., Indyk, P., Motwani, R.: Approximate nearest neighbor: towards removing the curse of dimensionality. Theory Comput. 8(1), 321–350 (2012)

  12. Hu, L., Liu, H., Zhang, J., Liu, A.: KR-DBSCAN: a density-based clustering algorithm based on reverse nearest neighbor and influence space. Expert Syst. Appl. 186, 115763 (2021)

  13. Jin, P., et al.: Maximizing the influence of bichromatic reverse k nearest neighbors in geo-social networks. World Wide Web 26(4), 1567–1598 (2023)

  14. Khedr, A.M., Raj, P.V.P.: DRNNA: decomposable reverse nearest neighbor algorithm for vertically distributed databases. In: 2021 18th International Multi-Conference on Systems, Signals & Devices (SSD), pp. 681–686 (2021)

  15. Korn, F., Muthukrishnan, S.: Influence sets based on reverse nearest neighbor queries. ACM SIGMOD Rec. 29(2), 201–212 (2000)

  16. Li, Y., Liu, G., Bai, M., Gao, J., Ye, L., Ming, Z.: CSD: Discriminance with conic section for improving reverse k nearest neighbors queries. arXiv:2005.08483 (2020)

  17. Panetta, K.: Gartner top 10 data and analytics trends for 2021 (2021). https://www.gartner.com/smarterwithgartner/gartner-top-10-data-and-analytics-trends-for-2021. Accessed 15 July 2022

  18. Sharifzadeh, M., Shahabi, C.: VoR-tree: R-trees with Voronoi diagrams for efficient processing of spatial nearest neighbor queries. Proc. VLDB Endowment 3(1–2), 1231–1242 (2010)

  19. Singh, A., Ferhatosmanoğlu, H., Tosun, A.Ş.: High dimensional reverse nearest neighbor queries. In: Proceedings of the Twelfth International Conference on Information and Knowledge Management (CIKM 2003), pp. 91–98 (2003)

  20. Singh, V., Singh, A.K.: SIMP: accurate and efficient near neighbor search in high dimensional spaces. In: Proceedings of the 15th International Conference on Extending Database Technology (EDBT 2012), pp. 492–503 (2012)

  21. Stanoi, I., Agrawal, D., Abbadi, A.E.: Reverse nearest neighbor queries for dynamic databases. In: ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 44–53 (2000)

  22. Tao, Y., Papadias, D., Lian, X., Xiao, X.: Multi-dimensional reverse kNN search. VLDB J. 16(3), 293–316 (2007)

  23. U.S. Census Bureau. https://www2.census.gov/geo/tiger/TGRGDB21/. Accessed 24 July 2022

  24. Wang, S., Zhang, Y., Lin, X., Cheema, M.A.: Maximize spatial influence of facility bundle considering reverse k nearest neighbors. In: Database Systems for Advanced Applications, DASFAA 2018, pp. 684–700 (2018)

  25. Wu, W., Yang, F., Chan, C.-Y., Tan, K.-L.: FINCH: Evaluating reverse k-nearest-neighbor queries on location data. Proc. VLDB Endowment 1(1), 1056–1067 (2008)

  26. Yang, S., Cheema, M.A., Lin, X., Zhang, Y., Zhang, W.: Reverse k nearest neighbors queries and spatial reverse top-k queries. VLDB J. 26(2), 151–176 (2017)

  27. Zheng, B., Zhao, X., Weng, L., Hung, N.Q.V., Liu, H., Jensen, C.S.: PM-LSH: A fast and accurate LSH framework for high-dimensional approximate NN search. Proc. VLDB Endowment 13(5), 643–655 (2020)

Author information

Correspondence to Liang Zhu or Xin Song.

Appendix: Proofs for Lemmas

Lemma 1.

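Consider the dataset \(\boldsymbol{R}\) and its UR-tree Index with the region set \(\{P_i : i = 0, 1, \cdots, s\}\). For a query point \(Q\), a region \(P_i\) and \(\boldsymbol{R}_i = \boldsymbol{R} \cap P_i\) (\(0 \le i \le s\)), if \(d(\boldsymbol{c}, Q) > r_{max} + d_{max}\), then \(\boldsymbol{t} \notin RNN(Q)\) for every \(\boldsymbol{t} \in \boldsymbol{R}_i\), where \(\boldsymbol{c}\) is the center-point of \(P_i\), \(r_{max} = d(\boldsymbol{c}, \boldsymbol{y})\), \(d_{max} = \max\{d(\boldsymbol{t}, \boldsymbol{t}_{NN}) \mid \forall \boldsymbol{t} \in \boldsymbol{R}_i\}\), and \(\boldsymbol{y} = (y_1, \cdots, y_n)\) is the max-point of \(P_i\).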

Proof of Lemma 1:

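Let \(Q\) be a query point, \(P_i\) a region and \(\boldsymbol{R}_i = \boldsymbol{R} \cap P_i\). Suppose that, as shown in Fig. A, \(d(\boldsymbol{c}, Q) > r_{max} + d_{max} = d(\boldsymbol{c}, \boldsymbol{y}) + \max\{d(\boldsymbol{t}, \boldsymbol{t}_{NN}) \mid \forall \boldsymbol{t} \in \boldsymbol{R}_i\}\), where \(\boldsymbol{c}\) is the center-point of \(P_i\) and \(\boldsymbol{y} = (y_1, \cdots, y_n)\) is the max-point of \(P_i\).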

Fig. A. Illustration of the proof of Lemma 1.

For an arbitrary \(\boldsymbol{t} \in \boldsymbol{R}_i\), the triangle inequality for the distance function gives

$$ \begin{aligned} d(\boldsymbol{c}, \boldsymbol{t}) + d(\boldsymbol{t}, Q) &\ge d(\boldsymbol{c}, Q), \\ d(\boldsymbol{t}, Q) &\ge d(\boldsymbol{c}, Q) - d(\boldsymbol{c}, \boldsymbol{t}). \end{aligned} $$

Since \(d(\boldsymbol{c}, Q) > r_{max} + d_{max}\), we have

$$ \begin{aligned} d(\boldsymbol{t}, Q) &> r_{max} + d_{max} - d(\boldsymbol{c}, \boldsymbol{t}), \\ d(\boldsymbol{t}, Q) - d_{max} &> r_{max} - d(\boldsymbol{c}, \boldsymbol{t}). \end{aligned} $$

Moreover, \(d(\boldsymbol{c}, \boldsymbol{t}) \le d(\boldsymbol{c}, \boldsymbol{y}) = r_{max}\) because \(\boldsymbol{y}\) is the max-point of \(P_i\), so \(r_{max} - d(\boldsymbol{c}, \boldsymbol{t}) \ge 0\). Therefore,

$$ \begin{aligned} d(\boldsymbol{t}, Q) - d_{max} &> 0, \\ d(\boldsymbol{t}, Q) &> d_{max} = \max\{ d(\boldsymbol{t}, \boldsymbol{t}_{NN}) \mid \forall \boldsymbol{t} \in \boldsymbol{R}_i \} \ge d(\boldsymbol{t}, \boldsymbol{t}_{NN}). \end{aligned} $$

Thus, the query point \(Q\) is not the nearest neighbor of the tuple \(\boldsymbol{t}\), that is, \(\boldsymbol{t} \notin RNN(Q)\).
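The pruning test of Lemma 1 can be sketched in a few lines. This is only an illustration under stated assumptions: the region is taken to be an axis-aligned hyper-rectangle described by a min-point `low` and a max-point `high`, the center-point is their midpoint, and the names `dist`, `nn_distance` and `region_is_pruned` are hypothetical rather than taken from the paper. In an index, `d_max` would be precomputed per region; here it is computed on the fly for a toy dataset.

```python
def dist(x, y, p=2.0):
    """l_p distance between two points, for finite p >= 1."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def nn_distance(t, data, p=2.0):
    """Distance from t to its nearest neighbor among the other tuples of data."""
    return min(dist(t, u, p) for u in data if u != t)

def region_is_pruned(q, low, high, d_max, p=2.0):
    """Lemma 1 test: True means no tuple of the region belongs to RNN(q).

    Assumption: the region is the box [low, high], so its center-point c is
    the midpoint and r_max = d(c, high) bounds d(c, t) for every t in the box.
    """
    c = [(lo + hi) / 2.0 for lo, hi in zip(low, high)]
    r_max = dist(c, high, p)
    return dist(c, q, p) > r_max + d_max

# Hypothetical region [0, 2] x [0, 2] that holds the whole toy dataset.
tuples = [(0.5, 0.5), (1.0, 1.0), (1.5, 0.5)]
low, high = (0.0, 0.0), (2.0, 2.0)
d_max = max(nn_distance(t, tuples) for t in tuples)

far_q, near_q = (10.0, 10.0), (1.2, 1.0)
print(region_is_pruned(far_q, low, high, d_max))   # True
print(region_is_pruned(near_q, low, high, d_max))  # False
```

The test is only a sufficient condition: a region that is not pruned, such as the one around `near_q`, still has to be examined by further pruning or verification.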

We restate Lemma 2, which summarizes some characteristics of \(l_p\) spaces, and then prove it.

Lemma 2.

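Let \(\boldsymbol{x} = (x_1, \cdots, x_n)\) and \(\boldsymbol{y} = (y_1, \cdots, y_n)\) be two arbitrary points in \(\Re^n\), and let \(\sigma > 0\) be a constant. Then:

(1°) \(\|\boldsymbol{x}\|_p \le n^{(1/p - 1/q)} \|\boldsymbol{x}\|_q\), for \(1 \le p < q\).

(2°) \(\|\boldsymbol{x}\|_\infty \le \|\boldsymbol{x}\|_p \le n^{1/p} \|\boldsymbol{x}\|_\infty\), for \(1 \le p < \infty\).

(3°) \(d_p(\boldsymbol{x}, \boldsymbol{y}) > \sigma\) if \(|x_i - y_i| > \sigma\) for some \(1 \le i \le n\), where the distance function \(d_p(\cdot, \cdot)\) is induced by \(\|\cdot\|_p\).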

We will present the proof of (1°) by using Hölder’s inequality.

Hölder’s Inequality:

Assume that \(r\) and \(s\) are in the open interval \((1, \infty)\) with \(1/r + 1/s = 1\). Then, for arbitrary \(\boldsymbol{a} = (a_1, a_2, \cdots, a_n)\), \(\boldsymbol{b} = (b_1, b_2, \cdots, b_n) \in \Re^n\),

$$ \|\boldsymbol{ab}\|_1 \le \|\boldsymbol{a}\|_r \, \|\boldsymbol{b}\|_s $$

that is, with \(\boldsymbol{ab}\) denoting the componentwise product \((a_1 b_1, \cdots, a_n b_n)\),

$$ \sum_{i=1}^{n} |a_i b_i| \le \left( \sum_{i=1}^{n} |a_i|^r \right)^{1/r} \left( \sum_{i=1}^{n} |b_i|^s \right)^{1/s} $$

Proof of Lemma 2:

(1°) Let \(s = q/p\) and \(r = q/(q - p)\). Then \(s > 1\), \(r > 1\) and \(1/r + 1/s = 1\) since \(1 \le p < q\). Let \(a_i = 1\) and \(b_i = |x_i|^p\), \(i = 1, \cdots, n\). By Hölder’s inequality, we have

$$ \begin{aligned} \sum_{i=1}^{n} |x_i|^p &= \sum_{i=1}^{n} (1 \cdot |x_i|^p) = \sum_{i=1}^{n} |a_i b_i| \le \left( \sum_{i=1}^{n} |a_i|^r \right)^{1/r} \left( \sum_{i=1}^{n} |b_i|^s \right)^{1/s} \\ &= n^{1/r} \left( \sum_{i=1}^{n} |b_i|^s \right)^{1/s} = n^{(q-p)/q} \left( \sum_{i=1}^{n} |x_i|^{ps} \right)^{1/s} = n^{(q-p)/q} \left( \sum_{i=1}^{n} |x_i|^q \right)^{p/q} \end{aligned} $$

Then,

$$ \begin{aligned} \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p} &\le \left( n^{(q-p)/q} \left( \sum_{i=1}^{n} |x_i|^q \right)^{p/q} \right)^{1/p} = n^{(q-p)/(qp)} \left( \sum_{i=1}^{n} |x_i|^q \right)^{1/q} \\ &= n^{(1/p - 1/q)} \left( \sum_{i=1}^{n} |x_i|^q \right)^{1/q} = n^{(1/p - 1/q)} \|\boldsymbol{x}\|_q \end{aligned} $$

That is, \(\|\boldsymbol{x}\|_p \le n^{(1/p - 1/q)} \|\boldsymbol{x}\|_q\) by the definition \(\|\boldsymbol{x}\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}\).
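As a quick sanity check, which is not part of the original proof, the bound in (1°) is attained by the all-ones vector, so the factor \(n^{(1/p - 1/q)}\) cannot be improved in general. For \(\boldsymbol{x} = (1, \cdots, 1) \in \Re^n\),

$$ \|\boldsymbol{x}\|_p = n^{1/p} = n^{(1/p - 1/q)} \, n^{1/q} = n^{(1/p - 1/q)} \|\boldsymbol{x}\|_q $$

For instance, with \(n = 4\), \(p = 1\) and \(q = 2\), both sides equal 4: \(\|\boldsymbol{x}\|_1 = 4\) and \(4^{1/2} \|\boldsymbol{x}\|_2 = 2 \cdot 2 = 4\).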

(2°) By the definition of the \(l_p\)-norm \(\|\cdot\|_p\), by (1°), and because \(\|\boldsymbol{x}\|_q \to \|\boldsymbol{x}\|_\infty\) as \(q \to \infty\), we have

$$ \|\boldsymbol{x}\|_\infty \le \|\boldsymbol{x}\|_p \quad \text{and} \quad \|\boldsymbol{x}\|_p \le n^{1/p} \|\boldsymbol{x}\|_\infty $$

That is,

$$ \|\boldsymbol{x}\|_\infty \le \|\boldsymbol{x}\|_p \le n^{1/p} \|\boldsymbol{x}\|_\infty \quad \text{for } 1 \le p < \infty $$

(3°) If \(|x_i - y_i| > \sigma\) for some \(1 \le i \le n\), then \(\|\boldsymbol{x} - \boldsymbol{y}\|_\infty \ge |x_i - y_i| > \sigma\). Thus, \(d_p(\boldsymbol{x}, \boldsymbol{y}) = \|\boldsymbol{x} - \boldsymbol{y}\|_p \ge \|\boldsymbol{x} - \boldsymbol{y}\|_\infty > \sigma\) by using (2°).
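Properties (2°) and (3°) lend themselves to a small numerical illustration. The sketch below is an assumption about how (3°) might be used in practice: a single coordinate comparison is enough to conclude that a distance exceeds a threshold, without evaluating the full \(l_p\) distance. The helper names `lp_norm`, `norm_bounds_hold` and `exceeds_by_coordinate`, and the sample points, are hypothetical.

```python
import math

def lp_norm(v, p):
    """l_p norm of v; p = math.inf gives the max norm."""
    if math.isinf(p):
        return max(abs(c) for c in v)
    return sum(abs(c) ** p for c in v) ** (1.0 / p)

def norm_bounds_hold(v, p):
    """Check (2°): ||v||_inf <= ||v||_p <= n^(1/p) * ||v||_inf."""
    n = len(v)
    v_inf = lp_norm(v, math.inf)
    return v_inf <= lp_norm(v, p) <= n ** (1.0 / p) * v_inf

def exceeds_by_coordinate(x, y, sigma):
    """Sufficient test from (3°): True implies d_p(x, y) > sigma for all p >= 1."""
    return any(abs(a - b) > sigma for a, b in zip(x, y))

x = (3.0, -1.0, 2.0)
y = (0.5, -1.0, 2.5)
diff = [a - b for a, b in zip(x, y)]  # [2.5, 0.0, -0.5]

print(all(norm_bounds_hold(diff, p) for p in (1.0, 2.0, 3.0)))  # True
print(exceeds_by_coordinate(x, y, 2.0))  # True: the first coordinates differ by 2.5
print(exceeds_by_coordinate(x, y, 2.5))  # False: the test is inconclusive here
```

A `False` result does not mean the distance is at most σ; it only means that the coordinate test cannot decide and the full distance must be computed.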

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhu, L., Zhang, S., Song, X., Ma, Q., Meng, W. (2023). Processing Reverse Nearest Neighbor Queries Based on Unbalanced Multiway Region Tree Index. In: Zhang, F., Wang, H., Barhamgi, M., Chen, L., Zhou, R. (eds) Web Information Systems Engineering – WISE 2023. WISE 2023. Lecture Notes in Computer Science, vol 14306. Springer, Singapore. https://doi.org/10.1007/978-981-99-7254-8_57

  • DOI: https://doi.org/10.1007/978-981-99-7254-8_57

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-7253-1

  • Online ISBN: 978-981-99-7254-8

  • eBook Packages: Computer Science, Computer Science (R0)
