Abstract
Nearest-neighbor queries, which ask for the point of a given set that is nearest to a query point, are widely studied in many fields because of their broad range of applications. In many of these applications, such as sensor databases, location-based services, face recognition, and mobile data, the location of data is imprecise. We therefore study nearest-neighbor queries in a probabilistic framework in which the location of each input point and/or query point is specified as a probability density function, and the goal is to return the point that minimizes the expected distance to the query, which we refer to as the expected nearest neighbor (\(\mathop {\mathrm {ENN}}\)). We present methods for computing an exact \(\mathop {\mathrm {ENN}}\) or an \(\varepsilon \)-approximate \(\mathop {\mathrm {ENN}}\), for a given error parameter \(0<\varepsilon < 1\), under different distance functions. These methods build a data structure of near-linear size and answer \(\mathop {\mathrm {ENN}}\) queries in polylogarithmic or sublinear time, depending on the underlying distance function. As far as we know, these are the first nontrivial methods for answering exact or \(\varepsilon \)-approximate \(\mathop {\mathrm {ENN}}\) queries with provable performance guarantees. Moreover, we extend our results to answer exact or \(\varepsilon \)-approximate \(k\)-\(\mathop {\mathrm {ENN}}\) queries.
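To make the definition concrete, the following is a minimal brute-force sketch of an \(\mathop {\mathrm {ENN}}\) query. It is not one of the data structures described in the abstract: it scans every input point in linear time. It assumes that each uncertain point is given as a discrete pdf (a list of location and probability pairs) and that the input point and the query are independent. The names `expected_distance` and `k_enn` and the sample data are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
# Assumption: an uncertain point is a discrete pdf, i.e. (location, probability) pairs.
DiscretePdf = List[Tuple[Point, float]]
Distance = Callable[[Point, Point], float]


def euclidean(p: Point, q: Point) -> float:
    return math.dist(p, q)


def squared_euclidean(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def expected_distance(P: DiscretePdf, Q: DiscretePdf, dist: Distance = euclidean) -> float:
    # Expectation over the product distribution (P and Q assumed independent).
    return sum(wp * wq * dist(p, q) for p, wp in P for q, wq in Q)


def k_enn(points: Sequence[DiscretePdf], Q: DiscretePdf, k: int = 1,
          dist: Distance = euclidean) -> List[int]:
    # Indices of the k input points with smallest expected distance to Q.
    # sorted() is stable, so ties are broken by input order.
    order = sorted(range(len(points)),
                   key=lambda i: expected_distance(points[i], Q, dist))
    return order[:k]


# Hypothetical data: A is uncertain, B and C are certain; the query is certain.
A = [((0.0, 0.0), 0.5), ((3.0, 0.0), 0.5)]
B = [((2.0, 0.0), 1.0)]
C = [((10.0, 0.0), 1.0)]
Q = [((0.0, 0.0), 1.0)]

print(k_enn([A, B, C], Q))                          # [0]
print(k_enn([A, B, C], Q, dist=squared_euclidean))  # [1]
```

In this toy instance the answer depends on the distance function: A has expected Euclidean distance 1.5 against 2 for B, but expected squared Euclidean distance 4.5 against 4 for B.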
Notes
If the location of data is precise, we refer to it as certain.
The squared Euclidean distance between two points \(p, q \in {\mathbb {R}}^d\) is \(\Vert p-q \Vert ^2\) where \(\Vert \cdot \Vert \) is the \(L_2\) metric.
We assume a model of computation in which basic primitive operations on functions of constant description complexity (e.g., a Gaussian distribution) can be performed in \(O(1)\) time.
If there are multiple points of \(\mathcal {P}\) at the \(i\)th smallest expected distance from \(Q\), then we break ties arbitrarily and choose \(\varphi _i(\mathcal {P}, Q)\) to be one of these points. We do the same for \(\varphi (\mathcal {P},Q)\).
For simplicity, we focus on the case in which \(P\) has a continuous pdf; the argument holds for a discrete pdf as well.
If a square \(\square \) appears in multiple \(\mathcal {B}_i\)’s, we keep only one copy of \(\square \) in \(\mathcal {B}_{\mathrm {in}}\) and \(\delta _{\square }\) is the minimum of the values associated with the different copies of \(\square \).
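The merging rule in the last note (keep one copy of each square and associate it with the minimum of the values of its copies) can be sketched as follows. This is only an illustration under stated assumptions: each family \(\mathcal {B}_i\) is modeled as a list of (square, value) pairs, a square is encoded as a hashable tuple `(x, y, side)`, and the function name `merge_squares` is hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple


def merge_squares(
    families: Iterable[Iterable[Tuple[Hashable, float]]]
) -> Dict[Hashable, float]:
    # Keep a single entry per square; its value is the minimum over all copies.
    merged: Dict[Hashable, float] = {}
    for family in families:
        for square, delta in family:
            if square not in merged or delta < merged[square]:
                merged[square] = delta
    return merged


# Hypothetical families; the square (0, 0, 1) appears in both.
B1 = [((0, 0, 1), 0.5), ((1, 0, 1), 0.2)]
B2 = [((0, 0, 1), 0.3)]
B_in = merge_squares([B1, B2])
```

Here `B_in` holds two squares, and the duplicated square keeps the smaller value 0.3.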
Acknowledgements
The work of P. Agarwal and W. Zhang was supported by NSF under Grants CCF-09-40671, CCF-10-12254, and CCF-11-61359, by ARO Grants W911NF-07-1-0376 and W911NF-08-1-0452, and by an ERDC contract W9132V-11-C-0003. The work of A. Efrat and S. Sankararaman was supported by NSF CAREER Grant 0348000.
Additional information
Editor in Charge: Kenneth Clarkson
A preliminary version appeared as “Nearest-neighbor searching under uncertainty, Proc. 31st ACM Sympos. Principles of Database Systems, 2012, 225–236”.
Cite this article
Agarwal, P.K., Efrat, A., Sankararaman, S. et al. Nearest-Neighbor Searching Under Uncertainty I. Discrete Comput Geom 58, 705–745 (2017). https://doi.org/10.1007/s00454-017-9903-x
DOI: https://doi.org/10.1007/s00454-017-9903-x
Keywords
- Queries on uncertain data
- Nearest-neighbor queries
- Approximate nearest neighbor \((\mathop {\mathrm {ANN}})\)