10 Years of Probabilistic Querying – What Next?

  • Conference paper
Advances in Databases and Information Systems (ADBIS 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8133)

Abstract

Over the past decade, the two research areas of probabilistic databases and probabilistic programming have intensively studied the problem of making structured probabilistic inference scalable, but so far both areas have developed almost independently of one another. While probabilistic databases have focused on describing tractable query classes based on the structure of query plans and data lineage, probabilistic programming has contributed sophisticated inference techniques based on knowledge compilation and lifted (first-order) inference. Both fields have developed their own variants of exact and approximate top-k algorithms for query evaluation, and both investigate query optimization techniques known from SQL, Datalog, and Prolog. These parallels call for a more intensive study of the commonalities between the two fields and of their integration. Moreover, we believe that natural-language processing and information extraction will remain a driving factor, and in fact a longstanding challenge, for developing expressive representation models that can be combined with structured probabilistic inference, also for the decades to come.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Theobald, M., De Raedt, L., Dylla, M., Kimmig, A., Miliaraki, I. (2013). 10 Years of Probabilistic Querying – What Next?. In: Catania, B., Guerrini, G., Pokorný, J. (eds) Advances in Databases and Information Systems. ADBIS 2013. Lecture Notes in Computer Science, vol 8133. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40683-6_1

  • DOI: https://doi.org/10.1007/978-3-642-40683-6_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40682-9

  • Online ISBN: 978-3-642-40683-6

  • eBook Packages: Computer Science (R0)
