Nearest-Neighbor Searching Under Uncertainty I

Discrete & Computational Geometry

Abstract

Nearest-neighbor queries, which ask for the nearest neighbor of a query point in a set of points, are important and widely studied in many fields because of their wide range of applications. In many of these applications, such as sensor databases, location-based services, face recognition, and mobile data, the location of data is imprecise. We therefore study nearest-neighbor queries in a probabilistic framework in which the location of each input point and/or query point is specified as a probability density function, and the goal is to return the point that minimizes the expected distance, which we refer to as the expected nearest neighbor (\(\mathop {\mathrm {ENN}}\)). We present methods for computing an exact \(\mathop {\mathrm {ENN}}\) or an \(\varepsilon \)-approximate \(\mathop {\mathrm {ENN}}\), for a given error parameter \(0<\varepsilon < 1\), under different distance functions. These methods build a data structure of near-linear size and answer \(\mathop {\mathrm {ENN}}\) queries in polylogarithmic or sublinear time, depending on the underlying distance function. To the best of our knowledge, these are the first nontrivial methods for answering exact or \(\varepsilon \)-approximate \(\mathop {\mathrm {ENN}}\) queries with provable performance guarantees. Moreover, we extend our results to answer exact or \(\varepsilon \)-approximate k-\(\mathop {\mathrm {ENN}}\) queries.
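To make the definition concrete, here is a minimal brute-force sketch in Python, assuming discrete distributions: each uncertain point (and the query) is a list of (location, probability) pairs, and the input point and query are independent, so the expected distance is \(\sum_{i,j} w_i v_j\, d(p_i, q_j)\). The names `expected_distance`, `k_enn`, `enn`, and `is_eps_approx_enn` are hypothetical, and the ε-approximation test assumes the usual multiplicative definition (expected distance at most \((1+\varepsilon)\) times the minimum). This linear scan is only a baseline for checking answers; it is not the near-linear-size data structure described in the abstract.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
# Hypothetical representation of a discrete uncertain point: (location, probability)
# pairs with non-negative probabilities summing to 1. A certain point is a single
# pair with probability 1.
Uncertain = List[Tuple[Point, float]]
Metric = Callable[[Point, Point], float]


def expected_distance(P: Uncertain, Q: Uncertain, dist: Metric = math.dist) -> float:
    """E[d(P, Q)] for independent discrete P and Q: sum of w_i * v_j * d(p_i, q_j)."""
    return sum(wp * wq * dist(p, q) for p, wp in P for q, wq in Q)


def k_enn(points: Sequence[Uncertain], Q: Uncertain, k: int,
          dist: Metric = math.dist) -> List[int]:
    """Indices of the k points with the smallest expected distance to Q.

    Linear scan over all points; ties are broken by index.
    """
    scored = sorted((expected_distance(P, Q, dist), i) for i, P in enumerate(points))
    return [i for _, i in scored[:k]]


def enn(points: Sequence[Uncertain], Q: Uncertain, dist: Metric = math.dist) -> int:
    """Index of an expected nearest neighbor (the k = 1 case)."""
    return k_enn(points, Q, 1, dist)[0]


def is_eps_approx_enn(points: Sequence[Uncertain], Q: Uncertain, i: int,
                      eps: float, dist: Metric = math.dist) -> bool:
    """True if points[i] is within a (1 + eps) factor of the minimum expected distance."""
    best = min(expected_distance(P, Q, dist) for P in points)
    return expected_distance(points[i], Q, dist) <= (1 + eps) * best


points = [
    [((1.0, 0.0), 0.5), ((-1.0, 0.0), 0.5)],  # mean at the origin; E[d] to q is 1.0
    [((0.5, 0.0), 1.0)],                       # certain point; E[d] to q is 0.5
    [((3.0, 0.0), 0.5), ((0.0, 0.0), 0.5)],   # E[d] to q is 1.5
]
q = [((0.0, 0.0), 1.0)]  # a certain query at the origin

print(enn(points, q))       # 1
print(k_enn(points, q, 3))  # [1, 0, 2]
```

In this sample, point 0 has its mean exactly at the query, yet its expected distance is 1.0, so the ENN is the certain point 1 (expected distance 0.5) even though a nearest-mean rule would pick point 0. Breaking ties by index is one valid instance of the arbitrary tie-breaking in Note 4, and other distance functions can be passed through `dist`, e.g. `lambda p, q: math.dist(p, q) ** 2` for the squared Euclidean distance.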

Figs. 1–12 (images not included)

Notes

  1. If the location of data is precise, we refer to it as certain.

  2. The squared Euclidean distance between two points \(p, q \in {\mathbb {R}}^d\) is \(\Vert p-q \Vert ^2\), where \(\Vert \cdot \Vert \) is the \(L_2\) norm.

  3. We assume a model of computation in which some basic primitive operations on functions of constant description complexity (e.g., Gaussian distributions) can be performed in O(1) time.

  4. If several points of \(\mathcal {P}\) are at the \(i\)th smallest expected distance from Q, we break ties arbitrarily and choose \(\varphi _i(\mathcal {P}, Q)\) to be one of these points. We do the same for \(\varphi (\mathcal {P},Q)\).

  5. For simplicity, we focus on P having a continuous pdf; the argument holds for a discrete pdf as well.

  6. If a square \(\square \) appears in multiple \(\mathcal {B}_i\)’s, we keep only one copy of \(\square \) in \(\mathcal {B}_{\mathrm {in}}\) and \(\delta _{\square }\) is the minimum of the values associated with the different copies of \(\square \).
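For the squared Euclidean distance of Note 2, the expected distance from a certain query point q to an uncertain point P has a closed form. The derivation below is a standard identity included for illustration, not quoted from the paper; \(\mu_P = \mathbb{E}[P]\) and \(\sigma_P^2 = \mathbb{E}\Vert P-\mu_P\Vert^2\) are notation introduced here, and P is assumed to have a finite second moment.

```latex
\begin{aligned}
\mathbb{E}\,\Vert P-q\Vert^2
  &= \mathbb{E}\,\Vert (P-\mu_P) + (\mu_P-q)\Vert^2 \\
  &= \mathbb{E}\,\Vert P-\mu_P\Vert^2
     + 2\,\mathbb{E}[P-\mu_P]\cdot(\mu_P-q)
     + \Vert \mu_P-q\Vert^2 \\
  &= \sigma_P^2 + \Vert q-\mu_P\Vert^2 ,
\end{aligned}
```

The cross term vanishes because \(\mathbb{E}[P-\mu_P] = 0\). Hence \(P_1\) is closer to q in expectation than \(P_2\) exactly when \(\Vert q-\mu_1\Vert^2 + \sigma_1^2 < \Vert q-\mu_2\Vert^2 + \sigma_2^2\). The \(\Vert q\Vert^2\) terms cancel on both sides, so every bisector is a hyperplane, and the resulting diagram is the power diagram of the means \(\mu_i\) with weights \(-\sigma_i^2\).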

Acknowledgements

The work of P. Agarwal and W. Zhang was supported by NSF under Grants CCF-09-40671, CCF-10-12254, and CCF-11-61359, by ARO Grants W911NF-07-1-0376 and W911NF-08-1-0452, and by an ERDC contract W9132V-11-C-0003. The work of A. Efrat and S. Sankararaman was supported by NSF CAREER Grant 0348000.

Author information

Correspondence to Swaminathan Sankararaman.

Additional information

Editor in Charge: Kenneth Clarkson

A preliminary version appeared as “Nearest-neighbor searching under uncertainty,” Proc. 31st ACM Sympos. Principles of Database Systems, 2012, pp. 225–236.

About this article

Cite this article

Agarwal, P.K., Efrat, A., Sankararaman, S. et al. Nearest-Neighbor Searching Under Uncertainty I. Discrete Comput Geom 58, 705–745 (2017). https://doi.org/10.1007/s00454-017-9903-x

  • DOI: https://doi.org/10.1007/s00454-017-9903-x
