Efficient query evaluation on probabilistic databases

  • Regular Paper
  • The VLDB Journal

Abstract

We describe a framework for supporting arbitrarily complex SQL queries with “uncertain” predicates. The query semantics is based on a probabilistic model, and the results are ranked, much as in information retrieval. Our main focus is query evaluation. We describe an optimization algorithm that can efficiently compute most queries. We show, however, that the data complexity of some queries is #P-complete, which implies that these queries are unlikely to admit any efficient evaluation method. For these queries we describe both an approximation algorithm and a Monte-Carlo simulation algorithm.
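To make the idea of Monte-Carlo evaluation concrete, the following is a minimal sketch, not the algorithm the paper describes. It assumes a tuple-independent probabilistic database, in which each tuple is present independently with its own probability, and estimates the probability of a Boolean join query by sampling possible worlds. The tables `R` and `S`, their probabilities, and all function names are hypothetical and chosen only for illustration. A brute-force enumeration of all possible worlds is included for comparison; it is exponential in the number of tuples, which is why sampling becomes attractive for hard queries.

```python
import itertools
import random

# Hypothetical tuple-independent tables: each tuple maps to its own probability.
R = {("a",): 0.5, ("b",): 0.8}            # R(x)
S = {("a", "c"): 0.4, ("b", "c"): 0.5}    # S(x, y)


def query_holds(world_r, world_s):
    """Boolean query q :- R(x), S(x, y): true if some S tuple joins an R tuple."""
    return any((x,) in world_r for (x, _y) in world_s)


def exact_probability():
    """Sum the weights of all possible worlds in which the query holds."""
    tuples = [("R", t, p) for t, p in R.items()]
    tuples += [("S", t, p) for t, p in S.items()]
    total = 0.0
    # Each mask says which tuples are present in one possible world.
    for mask in itertools.product([False, True], repeat=len(tuples)):
        weight = 1.0
        world_r, world_s = set(), set()
        for present, (rel, t, p) in zip(mask, tuples):
            weight *= p if present else 1.0 - p
            if present:
                (world_r if rel == "R" else world_s).add(t)
        if query_holds(world_r, world_s):
            total += weight
    return total


def monte_carlo_probability(samples, seed=0):
    """Estimate the query probability as the fraction of sampled worlds where it holds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Draw one possible world: keep each tuple independently with its probability.
        world_r = {t for t, p in R.items() if rng.random() < p}
        world_s = {t for t, p in S.items() if rng.random() < p}
        if query_holds(world_r, world_s):
            hits += 1
    return hits / samples


if __name__ == "__main__":
    print("exact:   ", exact_probability())
    print("estimate:", monte_carlo_probability(20000))
```

For this toy instance the query is true when either the `a` pair or the `b` pair of tuples is present, so the exact probability is 1 − (1 − 0.5·0.4)(1 − 0.8·0.5) = 0.52, and the sampled estimate should approach that value as the number of samples grows. Naive sampling of this kind needs many samples when the answer probability is very small; more refined estimators exist for that case.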


Author information

Corresponding author

Correspondence to Nilesh Dalvi.

About this article

Cite this article

Dalvi, N., Suciu, D. Efficient query evaluation on probabilistic databases. The VLDB Journal 16, 523–544 (2007). https://doi.org/10.1007/s00778-006-0004-3

  • DOI: https://doi.org/10.1007/s00778-006-0004-3
