Supporting ranking queries on uncertain and incomplete data

  • Regular Paper
The VLDB Journal

Abstract

Large databases with uncertain information are becoming more common in many applications including data integration, location tracking, and Web search. In these applications, ranking records with uncertain attributes introduces new problems that are fundamentally different from conventional ranking. Specifically, uncertainty in records’ scores induces a partial order over records, as opposed to the total order that is assumed in the conventional ranking settings. In this paper, we present a new probabilistic model, based on partial orders, to encapsulate the space of possible rankings originating from score uncertainty. Under this model, we formulate several ranking query types with different semantics. We describe and analyze a set of efficient query evaluation algorithms. We show that our techniques can be used to solve the problem of rank aggregation in partial orders under two widely adopted distance metrics. In addition, we design sampling techniques based on Markov chains to compute approximate query answers. Our experimental evaluation uses both real and synthetic data. The experimental study demonstrates the efficiency and effectiveness of our techniques under various configurations.
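To make the central idea concrete, here is a minimal sketch of how uncertain scores can induce a partial order, and how the possible rankings correspond to the linear extensions of that order. It is an illustration under stated assumptions, not the paper's algorithm: the record names, the interval representation of scores, the strict dominance rule, and the brute-force enumeration are all hypothetical choices made for this example.

```python
from itertools import permutations

# Hypothetical records: name -> (lowest possible score, highest possible score).
# Representing an uncertain score as an interval is an assumption of this sketch.
records = {
    "a": (6.0, 6.0),  # score known exactly
    "b": (4.0, 8.0),  # score uncertain, overlaps with a
    "c": (1.0, 3.0),  # score uncertain, but below both a and b
}


def dominates(recs, x, y):
    """Return True if x certainly outscores y.

    Assumed rule: x's lower bound is strictly above y's upper bound.
    """
    return recs[x][0] > recs[y][1]


def partial_order(recs):
    """Return the set of (x, y) pairs where x must be ranked above y."""
    return {
        (x, y)
        for x in recs
        for y in recs
        if x != y and dominates(recs, x, y)
    }


def linear_extensions(recs):
    """Enumerate every total ranking consistent with the partial order.

    Brute force over all permutations, so only usable for tiny inputs.
    """
    order = partial_order(recs)
    result = []
    for perm in permutations(sorted(recs)):
        position = {name: i for i, name in enumerate(perm)}
        if all(position[x] < position[y] for x, y in order):
            result.append(perm)
    return result


if __name__ == "__main__":
    print(sorted(partial_order(records)))
    for ranking in linear_extensions(records):
        print(ranking)
```

In this toy input, records a and b have overlapping score ranges, so neither dominates the other and both relative orders remain possible, while c is certainly ranked last. The number of linear extensions can grow factorially with the number of incomparable records, which is why exhaustive enumeration does not scale and approximate, sampling-based evaluation becomes attractive.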

Author information

Correspondence to Mohamed A. Soliman.

About this article

Cite this article

Soliman, M.A., Ilyas, I.F. & Ben-David, S. Supporting ranking queries on uncertain and incomplete data. The VLDB Journal 19, 477–501 (2010). https://doi.org/10.1007/s00778-009-0176-8

  • DOI: https://doi.org/10.1007/s00778-009-0176-8
