Supporting ranking queries on uncertain and incomplete data

Soliman, Mohamed A.; Ilyas, Ihab F.; Ben-David, Shalev

doi:10.1007/s00778-009-0176-8

Regular Paper
Published: 10 February 2010

Volume 19, pages 477–501 (2010)

The VLDB Journal

Mohamed A. Soliman¹,
Ihab F. Ilyas¹ &
Shalev Ben-David¹

Abstract

Large databases with uncertain information are becoming more common in many applications including data integration, location tracking, and Web search. In these applications, ranking records with uncertain attributes introduces new problems that are fundamentally different from conventional ranking. Specifically, uncertainty in records’ scores induces a partial order over records, as opposed to the total order that is assumed in the conventional ranking settings. In this paper, we present a new probabilistic model, based on partial orders, to encapsulate the space of possible rankings originating from score uncertainty. Under this model, we formulate several ranking query types with different semantics. We describe and analyze a set of efficient query evaluation algorithms. We show that our techniques can be used to solve the problem of rank aggregation in partial orders under two widely adopted distance metrics. In addition, we design sampling techniques based on Markov chains to compute approximate query answers. Our experimental evaluation uses both real and synthetic data. The experimental study demonstrates the efficiency and effectiveness of our techniques under various configurations.
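
To make the central idea concrete, the following minimal sketch shows how score uncertainty can induce a partial order and a space of possible rankings. It is an illustration under stated assumptions, not the paper's algorithms: the records, the interval representation of uncertain scores, the dominance rule (one record's lowest score exceeds another's highest), the independent uniform score distributions, and all function names are hypothetical. The brute-force enumeration is only practical for a handful of records, and the plain Monte Carlo estimate stands in for, and is not, the Markov chain sampling mentioned above.

```python
import random
from collections import Counter
from itertools import permutations

# Hypothetical records: name -> (lowest possible score, highest possible score).
RECORDS = {
    "a": (6.0, 6.0),  # a certain score
    "b": (4.0, 8.0),
    "c": (3.0, 5.0),
    "d": (1.0, 2.0),
}


def dominates(x, y, records=RECORDS):
    """True if x outranks y under every possible score assignment (assumed rule)."""
    return records[x][0] > records[y][1]


def possible_rankings(records=RECORDS):
    """Enumerate the linear extensions of the dominance partial order by brute force."""
    names = sorted(records)
    result = []
    for perm in permutations(names):
        position = {name: i for i, name in enumerate(perm)}
        # Keep the permutation only if every dominator precedes what it dominates.
        if all(position[x] < position[y]
               for x in names for y in names if dominates(x, y, records)):
            result.append(perm)
    return result


def estimate_ranking_probabilities(records=RECORDS, samples=20000, seed=0):
    """Estimate each ranking's probability, assuming independent uniform scores."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(samples):
        scores = {name: rng.uniform(lo, hi) for name, (lo, hi) in records.items()}
        # Sort by descending score; break the (rare) ties by name.
        ranking = tuple(sorted(scores, key=lambda n: (-scores[n], n)))
        counts[ranking] += 1
    return {ranking: count / samples for ranking, count in counts.items()}


if __name__ == "__main__":
    for ranking in possible_rankings():
        print(" > ".join(ranking))
    for ranking, p in sorted(estimate_ranking_probabilities().items()):
        print(ranking, round(p, 3))
```

In this toy input, records a and b are incomparable, as are b and c, so three of the 24 permutations are valid rankings rather than one total order. Under the uniform assumption, these rankings are not equally likely, which is why a probabilistic model over the linear extensions is needed rather than a simple count.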

Author information

Authors and Affiliations

School of Computer Science, University of Waterloo, Waterloo, Canada
Mohamed A. Soliman, Ihab F. Ilyas & Shalev Ben-David

Corresponding author

Correspondence to Mohamed A. Soliman.

About this article

Cite this article

Soliman, M.A., Ilyas, I.F. & Ben-David, S. Supporting ranking queries on uncertain and incomplete data. The VLDB Journal 19, 477–501 (2010). https://doi.org/10.1007/s00778-009-0176-8

Received: 12 May 2009
Revised: 28 November 2009
Accepted: 12 December 2009
Published: 10 February 2010
Issue Date: August 2010
DOI: https://doi.org/10.1007/s00778-009-0176-8
