10 Years of Probabilistic Querying – What Next?

Theobald, Martin; De Raedt, Luc; Dylla, Maximilian; Kimmig, Angelika; Miliaraki, Iris

doi:10.1007/978-3-642-40683-6_1

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8133)

Included in the following conference series:

East European Conference on Advances in Databases and Information Systems

Abstract

Over the past decade, the two research areas of probabilistic databases and probabilistic programming have intensively studied the problem of making structured probabilistic inference scalable, but—so far—both areas developed almost independently of one another. While probabilistic databases have focused on describing tractable query classes based on the structure of query plans and data lineage, probabilistic programming has contributed sophisticated inference techniques based on knowledge compilation and lifted (first-order) inference. Both fields have developed their own variants of—both exact and approximate—top-k algorithms for query evaluation, and both investigate query optimization techniques known from SQL, Datalog, and Prolog, which all calls for a more intensive study of the commonalities and integration of the two fields. Moreover, we believe that natural-language processing and information extraction will remain a driving factor and in fact a longstanding challenge for developing expressive representation models which can be combined with structured probabilistic inference—also for the next decades to come.

Author information

Authors and Affiliations

Universiteit Antwerpen, Middelheimlaan 1, 2020, Antwerp, Belgium
Martin Theobald
Katholieke Universiteit Leuven, Celestijnenlaan 200A, 3001, Heverlee, Belgium
Luc De Raedt & Angelika Kimmig
Max Planck Institut Informatik, Campus E1.4, 66123, Saarbrücken, Germany
Maximilian Dylla & Iris Miliaraki

Editor information

Editors and Affiliations

Università di Genova, Italy
Barbara Catania
DIBRIS, Università di Genova, Italy
Giovanna Guerrini
Department of Software Engineering Faculty of Mathematics and Physics, Charles University, Malostranské nám. 25, 11800, Prague 1, Czech Republic
Jaroslav Pokorný

Cite this paper

Theobald, M., De Raedt, L., Dylla, M., Kimmig, A., Miliaraki, I. (2013). 10 Years of Probabilistic Querying – What Next? In: Catania, B., Guerrini, G., Pokorný, J. (eds) Advances in Databases and Information Systems. ADBIS 2013. Lecture Notes in Computer Science, vol 8133. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40683-6_1

DOI: https://doi.org/10.1007/978-3-642-40683-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40682-9
Online ISBN: 978-3-642-40683-6
eBook Packages: Computer Science, Computer Science (R0)
