Abstract
Over the past decade, the research areas of probabilistic databases and probabilistic programming have both intensively studied the problem of making structured probabilistic inference scalable, yet so far the two areas have developed almost independently of one another. While probabilistic databases have focused on describing tractable query classes based on the structure of query plans and data lineage, probabilistic programming has contributed sophisticated inference techniques based on knowledge compilation and lifted (first-order) inference. Both fields have developed their own variants of exact and approximate top-k algorithms for query evaluation, and both investigate query optimization techniques known from SQL, Datalog, and Prolog. These parallels call for a more intensive study of the commonalities between the two fields and of their integration. Moreover, we believe that natural-language processing and information extraction will remain a driving factor, and in fact a longstanding challenge, for developing expressive representation models that can be combined with structured probabilistic inference, also in the decades to come.
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Theobald, M., De Raedt, L., Dylla, M., Kimmig, A., Miliaraki, I. (2013). 10 Years of Probabilistic Querying – What Next?. In: Catania, B., Guerrini, G., Pokorný, J. (eds) Advances in Databases and Information Systems. ADBIS 2013. Lecture Notes in Computer Science, vol 8133. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40683-6_1
DOI: https://doi.org/10.1007/978-3-642-40683-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40682-9
Online ISBN: 978-3-642-40683-6
eBook Packages: Computer Science (R0)