
Nearest-neighbor searching under uncertainty

Published: 21 May 2012

Abstract

Nearest-neighbor queries, which ask for the point in a given set that is closest to a query point, are important and widely studied in many fields because of their wide range of applications. In many of these applications, such as sensor databases, location-based services, face recognition, and mobile data, the location of data is imprecise. We therefore study nearest-neighbor queries in a probabilistic framework in which the location of each input point and/or query point is specified as a probability density function, and the goal is to return the point that minimizes the expected distance, which we refer to as the expected nearest neighbor (ENN). We present methods for computing an exact ENN or an ε-approximate ENN, for a given error parameter 0 < ε < 1, under different distance functions. These methods build an index of near-linear size and answer ENN queries in polylogarithmic or sublinear time, depending on the underlying function. As far as we know, these are the first nontrivial methods for answering exact or ε-approximate ENN queries with provable performance guarantees.
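
To make the ENN definition concrete, the following is a minimal sketch under simplifying assumptions: each uncertain input point is a discrete distribution over a finite set of 2-D locations, the query point is certain, and the distance is Euclidean. It is a brute-force linear scan for illustration only, not the near-linear-size index described in the abstract, and the names `expected_distance` and `expected_nearest_neighbor` are hypothetical.

```python
import math

def expected_distance(q, locations):
    """Expected Euclidean distance from query q to one uncertain point.

    `locations` is a list of (probability, (x, y)) pairs whose
    probabilities are assumed to sum to 1 (a discrete distribution).
    """
    return sum(p * math.dist(q, loc) for p, loc in locations)

def expected_nearest_neighbor(q, uncertain_points):
    """Return (index, expected distance) of the uncertain point that
    minimizes the expected distance to q, by scanning every point."""
    best = min(
        range(len(uncertain_points)),
        key=lambda i: expected_distance(q, uncertain_points[i]),
    )
    return best, expected_distance(q, uncertain_points[best])

# Assumed toy data: point 0 is at (0, 0) or (4, 0) with equal probability,
# point 1 is at (3, 0) with certainty.
points = [
    [(0.5, (0.0, 0.0)), (0.5, (4.0, 0.0))],
    [(1.0, (3.0, 0.0))],
]
query = (2.0, 0.0)
print(expected_nearest_neighbor(query, points))  # (1, 1.0)
```

In this toy instance point 0 has expected distance 2 from the query and point 1 has expected distance 1, so point 1 is the ENN. The scan costs time proportional to the total number of candidate locations per query, which is the cost that an index with polylogarithmic or sublinear query time is meant to avoid.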



Published In

PODS '12: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database Systems
May 2012
332 pages
ISBN: 9781450312486
DOI: 10.1145/2213556
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. approximate nearest neighbor
  2. expected nearest neighbor (enn)
  3. indexing uncertain data
  4. nearest-neighbor queries

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '12

Acceptance Rates

PODS '12 Paper Acceptance Rate 26 of 101 submissions, 26%;
Overall Acceptance Rate 642 of 2,707 submissions, 24%
