ABSTRACT
This paper investigates the problem of efficiently computing the confidences of distinct tuples in the answers to conjunctive queries with inequalities (<) on tuple-independent probabilistic databases. This problem is fundamental to probabilistic databases and was recently stated as open.
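As a reminder of the standard possible-worlds semantics behind "confidence", here is a short formula. The notation is illustrative and not taken from the paper: T is the set of tuples, each tuple t is present independently with probability p_t, and Φ_a is the lineage (a Boolean formula over tuple variables) of answer tuple a. The confidence of a is the probability of its lineage. For the hypothetical Boolean query Q() :- R(x), S(y), x < y, the lineage is the disjunction shown on the second line.

```latex
\[
\Pr(\Phi_a) \;=\; \sum_{W \subseteq T,\; W \models \Phi_a} \;\prod_{t \in W} p_t \prod_{t \in T \setminus W} (1 - p_t)
\]
\[
\Phi_Q \;=\; \bigvee_{r \in R,\; s \in S,\; r.A < s.A} (r \wedge s)
\]
```

The sum ranges over exponentially many worlds, which is why identifying tractable query classes matters.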
Our contributions are of both theoretical and practical importance. We define a class of tractable queries with inequalities and generalize existing #P-hardness results for query evaluation to queries with inequalities.
For the tractable queries, we introduce a confidence computation technique based on efficient compilation of the lineage of the query answer into Ordered Binary Decision Diagrams (OBDDs), whose sizes are linear in the number of variables of the lineage.
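To make the OBDD idea concrete, the sketch below compiles a DNF lineage into a reduced OBDD by generic Shannon expansion along a fixed variable order. It shares isomorphic nodes and skips redundant tests, then computes the probability bottom-up. This is a textbook construction, not the paper's compilation technique, and in general it does not guarantee linear size. All function names, the lineage, and the probabilities are hypothetical. The example lineage is that of Q() :- R(x), S(y), x < y, with variables ordered by their attribute values.

```python
# Minimal sketch: compile a DNF lineage over independent Boolean variables
# into a reduced OBDD (Shannon expansion along a fixed order), then compute
# its probability. Generic construction; hypothetical names and data.

TRUE, FALSE = "1", "0"  # terminal nodes


def restrict(dnf, var, value):
    """Condition a DNF (frozenset of frozenset clauses) on var := value."""
    if value:
        return frozenset(clause - {var} for clause in dnf)
    return frozenset(clause for clause in dnf if var not in clause)


def compile_obdd(clauses, order):
    """Return (root, nodes) where nodes maps node id -> (var, lo, hi)."""
    nodes, unique, memo = {}, {}, {}

    def build(level, dnf):
        if frozenset() in dnf:      # an empty clause is satisfied: true
            return TRUE
        if not dnf:                 # no clauses left: false
            return FALSE
        key = (level, dnf)
        if key in memo:             # reuse the sub-OBDD of this cofactor
            return memo[key]
        var = order[level]
        lo = build(level + 1, restrict(dnf, var, False))
        hi = build(level + 1, restrict(dnf, var, True))
        if lo == hi:                # var is irrelevant here: skip the test
            node = lo
        else:                       # hash-cons to share isomorphic nodes
            node = unique.setdefault((var, lo, hi), len(unique))
            nodes[node] = (var, lo, hi)
        memo[key] = node
        return node

    root = build(0, frozenset(frozenset(c) for c in clauses))
    return root, nodes


def obdd_probability(root, nodes, prob):
    """P(node) = p(var) * P(hi) + (1 - p(var)) * P(lo), memoized per node."""
    cache = {TRUE: 1.0, FALSE: 0.0}

    def p(n):
        if n not in cache:
            var, lo, hi = nodes[n]
            cache[n] = prob[var] * p(hi) + (1 - prob[var]) * p(lo)
        return cache[n]

    return p(root)


# Hypothetical lineage of Q() :- R(x), S(y), x < y with R-values r1=1, r3=3
# and S-values s2=2, s4=4; the variable order follows the values.
lineage = [{"r1", "s2"}, {"r1", "s4"}, {"r3", "s4"}]
order = ["r1", "s2", "r3", "s4"]
prob = {"r1": 0.5, "s2": 0.4, "r3": 0.3, "s4": 0.2}

root, nodes = compile_obdd(lineage, order)
print(len(nodes), obdd_probability(root, nodes, prob))  # 4 nodes, about 0.29
```

Ordering variables by the inequality attribute keeps this small example at four internal nodes. A poor order can make OBDDs blow up, which is why the choice of order matters.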
We implemented a secondary-storage variant of our technique in PostgreSQL. This variant does not need to materialize the OBDD, but computes, in one scan over the lineage, the probabilities of OBDD fragments and combines them on the fly. Experiments with probabilistic TPC-H data show improvements of up to two orders of magnitude over state-of-the-art approaches.
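The following sketch illustrates the "one scan, no materialized OBDD" idea for a single hypothetical Boolean query, Q() :- R(x), S(y), x < y, over tuple-independent relations. It is not the authors' PostgreSQL implementation. It assumes the tuples can be streamed sorted by the compared attribute (for example via an ORDER BY). Two running probabilities stand in for the two OBDD states at each level: "no R tuple true yet" and "some R tuple true, but no satisfying pair yet". The function name and input format are illustrative.

```python
# Minimal sketch (assumption: Q() :- R(x), S(y), x < y, tuple-independent).
# One pass over tuples sorted by value; the OBDD is never materialized.

def scan_probability(tuples):
    """tuples: iterable of (value, relation, probability), relation "R" or "S".
    Returns P(exists r in R, s in S with r.value < s.value)."""
    p_none = 1.0  # P(no satisfying pair yet and no R tuple true yet)
    p_some = 0.0  # P(no satisfying pair yet but some R tuple true)
    # Sort by value; at equal values visit S before R because x < y is strict.
    for _, rel, p in sorted(tuples, key=lambda t: (t[0], t[1] != "S")):
        if rel == "R":
            # r true moves mass from p_none to p_some; r false keeps it.
            p_none, p_some = p_none * (1 - p), p_some + p_none * p
        else:
            # s true after a true r completes a pair, so only worlds where
            # s is false stay in p_some; p_none is unaffected.
            p_some *= 1 - p
    # Mass that never completed a pair is the complement of the answer.
    return 1.0 - (p_none + p_some)


data = [(1, "R", 0.5), (2, "S", 0.4), (3, "R", 0.3), (4, "S", 0.2)]
print(scan_probability(data))  # about 0.29
```

The memory footprint is constant regardless of input size, which is the property that makes a scan-based, secondary-storage evaluation attractive. Richer tractable queries would need more state per step.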
Index Terms
- Secondary-storage confidence computation for conjunctive queries with inequalities