DOI: 10.1145/1739041.1739082
research-article

Bridging the gap between intensional and extensional query evaluation in probabilistic databases

Published: 22 March 2010

Abstract

There are two broad approaches to query evaluation over probabilistic databases: (1) Intensional Methods proceed by manipulating expressions over symbolic events associated with uncertain tuples. This approach is very general and can be applied to any query, but requires an expensive postprocessing phase, which involves some general-purpose probabilistic inference. (2) Extensional Methods, on the other hand, evaluate the query by translating operations over symbolic events to a query plan; extensional methods scale well, but they are restricted to safe queries.
In this paper, we bridge this gap by proposing an approach that can translate the evaluation of any query into extensional operators, followed by some post-processing that requires probabilistic inference. Our approach uses characteristics of the data to adapt smoothly between the two evaluation strategies. If the query is safe or becomes safe because of the data instance, then the evaluation is completely extensional and inside the database. If the query/data combination departs from the ideal setting of a safe query, then some intensional processing is performed, whose complexity depends only on the distance from the ideal setting.
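The contrast between the two strategies can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy tables `R` and `S`, the safe Boolean query `q() :- R(x), S(x, y)`, and the function names do not come from the paper. The extensional function multiplies probabilities the way a safe plan would (independent project, then independent join), while the intensional function enumerates every possible world and checks the query's lineage in each one.

```python
from itertools import product

# Hypothetical tuple-independent tables: each tuple maps to its marginal probability.
R = {"a": 0.5, "b": 0.4}
S = {("a", 1): 0.8, ("a", 2): 0.5, ("b", 1): 0.9}


def extensional(R, S):
    """Evaluate q() :- R(x), S(x, y) with a safe plan: arithmetic on probabilities only."""
    p_no_answer = 1.0
    for x, p_r in R.items():
        # Independent project on y: probability that no S(x, y) tuple is present.
        p_no_s = 1.0
        for (sx, _y), p_s in S.items():
            if sx == x:
                p_no_s *= 1.0 - p_s
        # Independent join of R(x) with "some S(x, y) exists".
        p_x = p_r * (1.0 - p_no_s)
        # Independent project on x.
        p_no_answer *= 1.0 - p_x
    return 1.0 - p_no_answer


def intensional(R, S):
    """Evaluate the same query by summing the weights of all worlds that satisfy it."""
    facts = [("R", k) for k in R] + [("S", k) for k in S]
    probs = [R[k] for k in R] + [S[k] for k in S]
    total = 0.0
    for world in product([False, True], repeat=len(facts)):
        weight = 1.0
        for p, present in zip(probs, world):
            weight *= p if present else 1.0 - p
        alive = {f for f, present in zip(facts, world) if present}
        # The lineage holds if some S(x, y) is present together with R(x).
        holds = any(rel == "S" and ("R", key[0]) in alive for rel, key in alive)
        if holds:
            total += weight
    return total
```

On this safe query both functions agree, but the extensional one does work linear in the data, whereas the brute-force enumeration is exponential in the number of tuples. Real intensional systems use far better inference than enumeration; the enumeration here only serves as a reference semantics.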

Reviews

Giuseppina Carla Gini

In this paper, Jha, Olteanu, and Suciu investigate efficient query evaluation in probabilistic databases. They give an algorithmic solution to the evaluation of queries over probabilistic databases that integrates extensional methods with intensional ones. The authors directly address the difficult problem of handling conjunctive queries in probabilistic databases. Such queries are either tractable (safe), in which case they can be answered using an extensional plan, or #P-hard (unsafe), in which case no extensional plan exists. In the latter case, the usual solution is to use intensional techniques that rely on general probabilistic inference or Bayesian network representations. To fill this gap, the authors propose a new safety criterion: in practice, any plan is considered safe, except for some possible offending tuples. They introduce partial lineage as a way to combine numeric expressions with symbols, where the symbols represent the offending tuples in the data. Extensional evaluation thus handles only the nonoffending tuples. This improves on the existing approach, which treats the entire unsafe plan with intensional inference. The paper includes definitions, proofs of a few theorems, and extensive examples. It provides experimental results for the method's implementation and compares it to MayBMS, a state-of-the-art database management system (DBMS) for probabilistic databases. The comparison shows a clear improvement in execution times. This clear and interesting paper suggests many possible extensions.

Online Computing Reviews Service
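The idea of isolating offending tuples can be sketched on a toy instance. This is not the paper's partial-lineage algorithm: it is a simplified stand-in that conditions on the offending tuples by brute force, and the tables, the query `q() :- R(x), S(x, y), T(y)`, the definition of "offending" used here (a `T` tuple reachable from more than one `x`), and all function names are assumptions made for illustration.

```python
from itertools import product

# Hypothetical tuple-independent tables (tuple -> marginal probability).
R = {"a": 0.5, "b": 0.4}
S = {("a", 1): 0.8, ("a", 2): 0.5, ("b", 2): 0.9, ("b", 3): 0.7}
T = {1: 0.6, 2: 0.3, 3: 0.9}


def plan(R, S, T):
    """Extensional plan; exact only if no T(y) is shared between different x values."""
    p_no_answer = 1.0
    for x, p_r in R.items():
        p_no_y = 1.0
        for (sx, y), p_s in S.items():
            if sx == x and y in T:
                p_no_y *= 1.0 - p_s * T[y]
        p_no_answer *= 1.0 - p_r * (1.0 - p_no_y)
    return 1.0 - p_no_answer


def offending(R, S, T):
    """T tuples joined with more than one x; these break the plan's independence."""
    xs_per_y = {}
    for (x, y) in S:
        if x in R and y in T:
            xs_per_y.setdefault(y, set()).add(x)
    return sorted(y for y, xs in xs_per_y.items() if len(xs) > 1)


def hybrid(R, S, T):
    """Condition on each truth assignment of the offending tuples, then run the plan."""
    bad = offending(R, S, T)
    total = 0.0
    for bits in product([0.0, 1.0], repeat=len(bad)):
        weight = 1.0
        T_cond = dict(T)
        for y, b in zip(bad, bits):
            weight *= T[y] if b else 1.0 - T[y]
            T_cond[y] = b  # the tuple is now certainly present or certainly absent
        total += weight * plan(R, S, T_cond)
    return total


def brute_force(R, S, T):
    """Reference answer: enumerate all possible worlds."""
    facts = [("R", k) for k in R] + [("S", k) for k in S] + [("T", k) for k in T]
    probs = [R[k] for k in R] + [S[k] for k in S] + [T[k] for k in T]
    total = 0.0
    for world in product([False, True], repeat=len(facts)):
        weight = 1.0
        for p, present in zip(probs, world):
            weight *= p if present else 1.0 - p
        alive = {f for f, present in zip(facts, world) if present}
        holds = any(
            rel == "S" and ("R", key[0]) in alive and ("T", key[1]) in alive
            for rel, key in alive
        )
        if holds:
            total += weight
    return total
```

In this instance only `T(2)` is offending, so `hybrid` runs the extensional plan twice and matches the brute-force answer, while `plan` alone does not. The cost of the conditioning step grows with the number of offending tuples rather than with the size of the database, which mirrors the review's point that only the offending part needs intensional treatment.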

Published In

EDBT '10: Proceedings of the 13th International Conference on Extending Database Technology
March 2010
741 pages
ISBN:9781605589459
DOI:10.1145/1739041
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. probabilistic databases
  2. query processing

Qualifiers

  • Research-article

Conference

EDBT/ICDT '10
EDBT/ICDT '10: EDBT/ICDT '10 joint conference
March 22 - 26, 2010
Lausanne, Switzerland

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%
