Abstract
For decades, query processing over uncertain databases has received much attention from the database community because data uncertainty is pervasive in real-world applications such as location-based services (LBS), sensor networks, business planning, and biological databases. In this paper, we study a novel query type over uncertain databases, the range-constrained probabilistic mutual furthest neighbor (PMFN) query. A PMFN query retrieves a set of object pairs, \((o_i, o_j)\), within a given query range Q, such that uncertain objects \(o_i\) and \(o_j\) are furthest neighbors of each other with high probability. To tackle the PMFN problem efficiently, we propose three effective pruning methods (range, convex hull, and hypersphere pruning) that filter out uncertain objects that can never appear in the PMFN answer set. We also design spatial and probabilistic pruning methods to rule out false alarms among PMFN candidate pairs. Finally, we use a variant of the R\(^*\)-tree to integrate the proposed pruning methods and efficiently process ad hoc PMFN queries. Extensive experiments over real and synthetic data sets show the efficiency and effectiveness of our pruning techniques and PMFN query processing algorithms.
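To make the query semantics concrete, the following is a minimal brute-force sketch of what a PMFN answer set could look like, not the pruning-based, R\(^*\)-tree-backed algorithm the paper proposes. Everything beyond the abstract is an assumption made for illustration: each uncertain object is modeled as a list of equally likely sample points, an object counts as "within Q" only if all of its samples lie in an axis-aligned box, and "high probability" is expressed as a hypothetical threshold `alpha`. The names `in_range` and `pmfn_bruteforce` are likewise hypothetical.

```python
from itertools import product
from math import dist


def in_range(obj, q):
    """Assumed range test: every sample of obj lies in the box q = (lo, hi)."""
    lo, hi = q
    return all(
        all(l <= c <= h for c, l, h in zip(point, lo, hi))
        for point in obj
    )


def pmfn_bruteforce(objects, q, alpha):
    """Enumerate all possible worlds and return {(i, j): probability}
    for pairs that are mutual furthest neighbors with probability >= alpha.

    objects: list of uncertain objects, each a list of equally likely
             sample points (an assumed uncertainty model).
    q:       (lo, hi) corner tuples of an axis-aligned query box (assumed).
    alpha:   probability threshold in (0, 1] (assumed parameter).
    """
    ids = [i for i, obj in enumerate(objects) if in_range(obj, q)]
    if len(ids) < 2:
        return {}

    counts = {}
    worlds = 0
    # One possible world = one sample chosen per in-range object.
    for world in product(*(objects[i] for i in ids)):
        worlds += 1
        n = len(world)
        # Furthest neighbor of each object in this world
        # (ties are broken toward the lowest index).
        fn = [
            max((b for b in range(n) if b != a),
                key=lambda b: dist(world[a], world[b]))
            for a in range(n)
        ]
        for a, b in enumerate(fn):
            if a < b and fn[b] == a:  # mutual furthest neighbors
                pair = (ids[a], ids[b])
                counts[pair] = counts.get(pair, 0) + 1

    # Samples are equally likely, so each world has probability 1 / worlds.
    return {
        pair: c / worlds
        for pair, c in counts.items()
        if c / worlds >= alpha
    }
```

The enumeration is exponential in the number of in-range objects, which is exactly why filtering objects and candidate pairs before any probability computation, as the abstract describes, matters in practice.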












About this article
Cite this article
Bavi, K., Lian, X. Range-constrained probabilistic mutual furthest neighbor queries in uncertain databases. Knowl Inf Syst 65, 2375–2402 (2023). https://doi.org/10.1007/s10115-022-01807-0
DOI: https://doi.org/10.1007/s10115-022-01807-0