Probabilistic inverse ranking queries in uncertain databases

  • Regular Paper
The VLDB Journal

Abstract

Query processing over uncertain databases has become increasingly important because uncertain data are pervasive in many real applications. Unlike query processing over precise data, uncertain query processing must account for data uncertainty and answer queries with confidence guarantees. In this paper, we formulate and tackle an important query, namely the probabilistic inverse ranking (PIR) query, which retrieves the possible ranks of a given query object in an uncertain database with confidence above a probability threshold. We present effective pruning methods to reduce the PIR search space, which can be seamlessly integrated into an efficient query procedure. We also tackle the problem of PIR query processing in high-dimensional spaces by reducing high-dimensional uncertain data to a lower-dimensional space. Furthermore, we study three useful aggregate PIR queries, namely MAX, top-m, and AVG PIRs. In addition, we study another important query type, PIR with an uncertain query object (UQ-PIR), and design specific rules to facilitate pruning. Extensive experiments demonstrate the efficiency and effectiveness of our proposed approaches over both real and synthetic data sets under various experimental settings.
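
To make the PIR definition above concrete, the following is a minimal brute-force sketch and not the pruning-based procedure of the paper. It assumes details the abstract does not state: each uncertain object is a list of (value, probability) samples, objects are mutually independent, a higher score means a better rank, and "above a threshold" is read as strictly greater. The names `rank_distribution` and `pir_query` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed representation: an uncertain object is a discrete set of
# (value, probability) samples whose probabilities sum to 1.
UncertainObject = Sequence[Tuple[float, float]]


def rank_distribution(
    objects: Sequence[UncertainObject],
    q_score: float,
    score: Callable[[float], float] = lambda v: v,
) -> List[float]:
    """Return dist where dist[j] = P(exactly j objects outscore the query).

    Index j therefore corresponds to the query object having rank j + 1.
    """
    # Probability that each object scores strictly higher than the query.
    beat = [sum(p for v, p in obj if score(v) > q_score) for obj in objects]

    # Dynamic program over independent Bernoulli events (Poisson-binomial).
    dist = [1.0]
    for b in beat:
        nxt = [0.0] * (len(dist) + 1)
        for j, pr in enumerate(dist):
            nxt[j] += pr * (1.0 - b)   # this object does not outscore the query
            nxt[j + 1] += pr * b       # this object outscores the query
        dist = nxt
    return dist


def pir_query(
    objects: Sequence[UncertainObject],
    q_score: float,
    alpha: float,
    score: Callable[[float], float] = lambda v: v,
) -> Dict[int, float]:
    """Return {rank: probability} for every rank with probability > alpha."""
    dist = rank_distribution(objects, q_score, score)
    return {j + 1: pr for j, pr in enumerate(dist) if pr > alpha}
```

For example, `pir_query([[(1.0, 0.5), (3.0, 0.5)], [(0.0, 1.0)]], q_score=2.0, alpha=0.3)` evaluates a query object with score 2.0 against two uncertain objects. The dynamic program takes time quadratic in the number of objects, which is the kind of cost that pruning is meant to avoid.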

Author information

Correspondence to Lei Chen.

About this article

Cite this article

Lian, X., Chen, L. Probabilistic inverse ranking queries in uncertain databases. The VLDB Journal 20, 107–127 (2011). https://doi.org/10.1007/s00778-010-0195-5

  • DOI: https://doi.org/10.1007/s00778-010-0195-5
