Range-constrained probabilistic mutual furthest neighbor queries in uncertain databases

  • Regular Paper
  • Published:
Knowledge and Information Systems

Abstract

For decades, query processing over uncertain databases has received much attention from the database community because data uncertainty is pervasive in many real-world applications, such as location-based services (LBS), sensor networks, business planning, and biological databases. In this paper, we study a novel query type over uncertain databases, namely the range-constrained probabilistic mutual furthest neighbor (PMFN) query. A PMFN query retrieves a set of object pairs, \((o_i, o_j)\), within a given query range \(Q\), such that uncertain objects \(o_i\) and \(o_j\) are furthest neighbors of each other with high probability. To tackle the PMFN problem efficiently, we propose three effective pruning methods (range, convex hull, and hypersphere pruning) that filter out uncertain objects that can never appear in the PMFN answer set. We also design spatial and probabilistic pruning methods to rule out false alarms among PMFN candidate pairs. Finally, we use a variant of the R\(^*\)-tree to integrate the proposed pruning methods and efficiently process ad hoc PMFN queries. Extensive experiments over real and synthetic data sets show the efficiency and effectiveness of our pruning techniques and PMFN query processing algorithms.
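
To make the query semantics concrete, the following is a minimal brute-force sketch, not the authors' algorithm. The abstract does not give the formal definition, so several points are assumptions: each uncertain object is modeled as a list of equally likely sample points, \(Q\) is an axis-aligned box, an object qualifies only if all of its samples lie inside \(Q\), furthest neighbors are computed among the qualifying objects only, and "high probability" means an estimated probability of at least a threshold `alpha`. The names `in_range`, `pmfn_bruteforce`, `alpha`, and `trials` are hypothetical. The sketch applies none of the pruning methods and uses no R\(^*\)-tree; it only estimates the mutual-furthest-neighbor probability of every pair by sampling possible worlds.

```python
import itertools
import math
import random


def in_range(samples, q_lo, q_hi):
    """Return True if every sample point lies inside the box [q_lo, q_hi]."""
    return all(lo <= x <= hi
               for p in samples
               for x, lo, hi in zip(p, q_lo, q_hi))


def pmfn_bruteforce(objects, q_lo, q_hi, alpha, trials=2000, seed=0):
    """Estimate PMFN pairs by Monte Carlo sampling (illustrative assumption).

    objects: dict mapping an object id to a list of equally likely points.
    Returns the sorted list of id pairs whose estimated probability of being
    mutual furthest neighbors is at least alpha.
    """
    rng = random.Random(seed)
    # Assumed range constraint: keep objects lying entirely inside Q.
    ids = [k for k, s in objects.items() if in_range(s, q_lo, q_hi)]
    if len(ids) < 2:
        return []
    counts = {pair: 0 for pair in itertools.combinations(ids, 2)}
    for _ in range(trials):
        # One possible world: pick one sample per object.
        world = {k: rng.choice(objects[k]) for k in ids}
        # Furthest neighbor of each object in this world.
        fn = {a: max((b for b in ids if b != a),
                     key=lambda b: math.dist(world[a], world[b]))
              for a in ids}
        for a, b in counts:
            if fn[a] == b and fn[b] == a:
                counts[(a, b)] += 1
    return sorted(pair for pair, c in counts.items() if c / trials >= alpha)


if __name__ == "__main__":
    objs = {
        "A": [(0.0, 0.0)],
        "B": [(10.0, 0.0)],
        "C": [(4.0, 1.0)],
        "D": [(100.0, 100.0)],  # outside Q, so it is ignored
    }
    print(pmfn_bruteforce(objs, (0, 0), (10, 10), alpha=0.5, trials=50))
    # prints [('A', 'B')]
```

Each trial costs time quadratic in the number of qualifying objects, which is the kind of cost that pruning and index-based processing aim to avoid.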

Author information

Corresponding author

Correspondence to Xiang Lian.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bavi, K., Lian, X. Range-constrained probabilistic mutual furthest neighbor queries in uncertain databases. Knowl Inf Syst 65, 2375–2402 (2023). https://doi.org/10.1007/s10115-022-01807-0

  • DOI: https://doi.org/10.1007/s10115-022-01807-0
