Abstract
A nearest neighbor (NN) query, which returns the most similar object to a user-specified query object, plays an important role in a wide range of applications and hence has received considerable attention. In many such applications, e.g., sensor data collection and location-based services, objects are inherently uncertain. Furthermore, due to the ever increasing generation of massive datasets, the importance of distributed databases, which deal with such data objects, has been growing. One emerging challenge is to efficiently process probabilistic NN queries over distributed uncertain databases. The straightforward approach, that each local site forwards its own database to the central server, is communication-expensive, so we have to minimize communication cost for the NN object retrieval. In this paper, we focus on two important queries, namely top-k probable NN queries and probabilistic star queries, and propose efficient algorithms to process them over distributed uncertain databases. Extensive experiments on both real and synthetic data have demonstrated that our algorithms significantly reduce communication cost.







Similar content being viewed by others
Notes
As long as a distance can be computed, our algorithms function, hence we can apply any distance metrics.
References
AbdulAzeem, Y.M., ElDesouky, A.I., Ali, H.A.: A framework for ranking uncertain distributed database. DKE 92, 1–19 (2014)
Antova, L., Jansen, T., Koch, C., Olteanu, D.: Fast and simple relational processing of uncertain data. In: ICDE, pp. 983–992 (2008)
Atallah, M.J., Qi, Y.: Computing all skyline probabilities for uncertain data. In: PODS, pp. 279–287 (2009)
Bernecker, T., Emrich, T., Kriegel, H.P., Renz, M., Zankl, S., Züfle, A.: Efficient probabilistic reverse nearest neighbor query processing on uncertain data. PVLDB 4(10), 669–680 (2011)
Beskales, G., Soliman, M.A., IIyas, I.F.: Efficient search for the top-k probable nearest neighbors in uncertain databases. PVLDB 1(1), 326–339 (2008)
Börzsönyi, S., Kossmann, D., Stocker, K.: The skyline operator. In: ICDE, pp. 421–430 (2001)
Cheema, M.A., Lin, X., Wang, W., Zhang, W., Pei, J.: Probabilistic reverse nearest neighbor queries on uncertain data. IEEE TKDE 22(4), 550–564 (2010)
Chen, J., Cheng, R., Mokbel, M., Chow, C.Y.: Scalable processing of snapshot and continuous nearest-neighbor queries over one-dimensional uncertain data. VLDB J. 18(5), 1219–1240 (2009)
Cheng, R., Chen, L., Chen, J., Xie, X.: Evaluating probability threshold k-nearest-neighbor queries over uncertain data. In: EDBT, pp. 672–683 (2009)
Cheng, R., Kalashnikov, D.V., Prabhakar, S.: Querying imprecise data in moving object environments. IEEE TKDE 16(9), 1112–1127 (2004)
Cormode, G., Li, F., Yi, K.: Semantics of ranking queries for probabilistic data and expected ranks. In: ICDE, pp. 305–316 (2009)
Dalvi, N., Suciu, D.: Efficient query evaluation on probabilistic databases. VLDB J. 16(4), 523–544 (2007)
Ding, X., Jin, H.: Efficient and progressive algorithms for distributed skyline queries over uncertain data. IEEE TKDE 24(8), 1448–1462 (2012)
Fu, T.Y., Peng, W.C., Lee, W.C.: Parallelizing itinerary-based knn query processing in wireless sensor networks. IEEE TKDE 22(5), 711–729 (2010)
Ge, T., Zdonik, S., Madden, S.: Top-k queries on uncertain data: on score distribution and typical answers. In: SIGMOD, pp. 375–388 (2009)
Hua, M., Pei, J., Zhang, W., Lin, X.: Ranking queries on uncertain data: a probabilistic threshold approach. In: SIGMOD, pp. 673–686 (2008)
Kriegel, H.P., Kunath, P., Renz, M.: Probabilistic nearest-neighbor query on uncertain objects. In: DASFAA, pp. 337–348 (2007)
Li, F., Yi, K., Jestes, J.: Ranking distributed probabilistic data. In: SIGMOD, pp. 361–374 (2009)
Li, J., Saha, B., Deshpande, A.: A unified approach to ranking in probabilistic databases. PVLDB 2(1), 502–513 (2009)
Li, X., Wang, Y., Li, X., Wang, X., Yu, J.: Gdps: an efficient approach for skyline queries over distributed uncertain data. Big Data Res. 1, 23–36 (2014)
Lian, X., Chen, L.: Probabilistic group nearest neighbor queries in uncertain databases. IEEE TKDE 20(6), 809–824 (2008)
Lian, X., Chen, L.: Shooting top-k stars in uncertain databases. VLDB J. 20(6), 819–840 (2011)
Lin, X., Xu, J., Hu, H., Lee, W.: Authenticating location-based skyline queries in arbitrary subspaces. IEEE TKDE 26(6), 1479–1493 (2014)
Liu, X., Yang, D., Ye, M., Lee, W.: U-skyline: a new skyline query for uncertain databases. IEEE TKDE 25(4), 945–960 (2013)
Pei, J., Jiang, B., Lin, X., Yuan, Y.: Probabilistic skylines on uncertain data. In: VLDB, pp. 15–26 (2007)
Pripužić, K., Žarko, I.P., Aberer, K.: Distributed processing of continuous sliding-window k-nn queries for data stream filtering. World Wide Web 14(5–6), 465–494 (2011)
Sarma, A.D., Benjelloun, O., Halevy, A., Widom, J.: Working models for uncertain data. In: ICDE, pp. 7–27 (2006)
Soliman, M.A., Ilyas, I.F., Chen-Chuan Chang, K.: Top-k query processing in uncertain databases. In: ICDE, pp. 896–905 (2007)
Tang, M., Li, F., Phillips, J.M., Jestes, J.: Efficient threshold monitoring for distributed probabilistic data. In: ICDE, pp. 1120–1131 (2012)
Tang, M., Li, F., Tao, Y.: Distributed online tracking. In: SIGMOD (2015)
Wang, Y., Li, X., Li, X., Wang, Y.: A Survey of Queries Over Uncertain Data. Knowledge and Information Systems. Springer, London (2013)
Ye, M., Lee, W., Lee, D., Liu, X.: Distributed processing of probabilistic top-k queries in wireless sensor networks. IEEE TKDE 25(6), 76–91 (2013)
Yiu, M.L., Mamoulis, N.: Efficient processing of top-k dominating queries on multi-dimensional data. In: VLDB, pp. 483–494 (2007)
Yuen, S.M., Tao, Y., Xiao, X., Pei, J., Zhang, D.: Superseding nearest neighbor search on uncertain spatial databases. IEEE TKDE 22(7), 1041–1055 (2010)
Zhang, X., Chomicki, J.: Semantics and evaluation of top-k queries in probabilistic databases. Distrib Parallel Databases 26(1), 67–126 (2009)
Zhang, Y., Lin, X., Zhu, G., Zhang, W., Lin, Q.: Efficient rank based knn query processing over uncertain data. In: ICDE, pp. 28–39 (2010)
Acknowledgments
This research is partially supported by the Grant-in-Aid for Scientific Research (A) (26240013) of MEXT, Japan, and JST, Strategic International Collaborative Research Program, SICORP.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Amagata, D., Sasaki, Y., Hara, T. et al. Probabilistic nearest neighbor query processing on distributed uncertain data. Distrib Parallel Databases 34, 259–287 (2016). https://doi.org/10.1007/s10619-015-7183-0
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10619-015-7183-0