ABSTRACT
In probabilistic databases, the data is uncertain and is modeled by a probability distribution. The central problem in probabilistic databases is query evaluation, which requires not only traditional data processing such as joins, projections, and unions, but also probabilistic inference to compute the probability of each item in the answer. At their core, probabilistic databases are a proposal to integrate logic with probability theory. This paper accompanies a talk given as part of the Gems of PODS series, and describes several results in probabilistic databases, explaining their significance in the broader context of model counting, probabilistic inference, and Statistical Relational Models.
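To make the notion of query evaluation concrete, the following is a minimal sketch, not the paper's method. It assumes a tuple-independent model in which every tuple is present independently with a given probability. The relations `R` and `S`, their probabilities, and the function names are hypothetical and chosen only for illustration. The sketch computes the probability of a Boolean join query by enumerating all possible worlds, which takes time exponential in the number of tuples and is practical only for tiny inputs.

```python
from itertools import product

# Hypothetical tuple-independent tables: each tuple maps to its marginal probability.
R = {("a",): 0.5, ("b",): 0.8}
S = {("a", "x"): 0.4, ("b", "y"): 0.5}


def query_holds(r_world, s_world):
    """Boolean query Q = exists x, y: R(x) AND S(x, y), evaluated on one possible world."""
    return any((x,) in r_world for (x, _y) in s_world)


def query_probability(r_table, s_table):
    """Sum the probabilities of all possible worlds in which Q holds (brute force)."""
    facts = [("R", t, p) for t, p in r_table.items()]
    facts += [("S", t, p) for t, p in s_table.items()]
    total = 0.0
    # Each possible world is a choice of present/absent for every tuple.
    for present in product([False, True], repeat=len(facts)):
        weight = 1.0
        r_world, s_world = set(), set()
        for (rel, tup, p), keep in zip(facts, present):
            # Independence: the world's probability is a product over tuples.
            weight *= p if keep else 1.0 - p
            if keep:
                (r_world if rel == "R" else s_world).add(tup)
        if query_holds(r_world, s_world):
            total += weight
    return total


if __name__ == "__main__":
    print(round(query_probability(R, S), 6))
```

For this toy instance the two ways of satisfying the query use disjoint tuples, so the result can be checked by hand as 1 - (1 - 0.5·0.4)(1 - 0.8·0.5) = 0.52.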