Efficient query evaluation on probabilistic databases

Dalvi, Nilesh; Suciu, Dan

doi:10.1007/s00778-006-0004-3

Regular Paper
Published: 10 June 2006

Volume 16, pages 523–544 (2007)

The VLDB Journal

Nilesh Dalvi¹ & Dan Suciu¹

Abstract

We describe a framework for supporting arbitrarily complex SQL queries with “uncertain” predicates. The query semantics is based on a probabilistic model, and the results are ranked, much as in information retrieval. Our main focus is query evaluation. We describe an optimization algorithm that can compute most queries efficiently. We show, however, that the data complexity of some queries is #P-complete, which implies that these queries do not admit any efficient evaluation methods. For these queries we describe both an approximation algorithm and a Monte-Carlo simulation algorithm.
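
As a rough illustration of the Monte-Carlo idea mentioned in the abstract, the sketch below estimates the probability of a Boolean query by sampling possible worlds. It is a naive sampler written for this page, not necessarily the algorithm the paper describes. The relations `R` and `S`, their tuple probabilities, the assumption that tuples are independent, and all function names are hypothetical.

```python
import itertools
import random

# Hypothetical toy data: each tuple is present independently with the given probability.
R = {("a",): 0.5, ("b",): 0.8}
S = {("a", 1): 0.4, ("b", 2): 0.5}


def query(r_world, s_world):
    """Boolean query: do x and y exist such that R(x) and S(x, y) both hold?"""
    return any((x,) in r_world for (x, _y) in s_world)


def exact_probability():
    """Sum the weights of all possible worlds in which the query is true.

    The number of worlds is exponential in the number of tuples, so this
    only works for tiny inputs; it serves as a reference value here.
    """
    tuples = [("R", t, p) for t, p in R.items()] + [("S", t, p) for t, p in S.items()]
    total = 0.0
    for bits in itertools.product([False, True], repeat=len(tuples)):
        weight = 1.0
        r_world, s_world = set(), set()
        for present, (rel, t, p) in zip(bits, tuples):
            weight *= p if present else 1.0 - p
            if present:
                (r_world if rel == "R" else s_world).add(t)
        if query(r_world, s_world):
            total += weight
    return total


def monte_carlo_probability(samples, seed=0):
    """Estimate the query probability as the fraction of sampled worlds satisfying it."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        r_world = {t for t, p in R.items() if rng.random() < p}
        s_world = {t for t, p in S.items() if rng.random() < p}
        hits += query(r_world, s_world)
    return hits / samples
```

For this toy instance the two derivations of the answer are independent, so the exact value is 1 - (1 - 0.5 · 0.4)(1 - 0.8 · 0.5) = 0.52, and the sampled estimate approaches it as the number of samples grows. Exact enumeration grows exponentially with the number of tuples, which is why sampling becomes attractive for queries that have no efficient exact plan.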

Author information

Authors and Affiliations

University of Washington, Seattle, WA, USA
Nilesh Dalvi & Dan Suciu

Corresponding author

Correspondence to Nilesh Dalvi.

About this article

Cite this article

Dalvi, N., Suciu, D. Efficient query evaluation on probabilistic databases. The VLDB Journal 16, 523–544 (2007). https://doi.org/10.1007/s00778-006-0004-3

Received: 29 November 2002
Revised: 11 July 2005
Accepted: 25 October 2005
Published: 10 June 2006
Issue Date: October 2007
DOI: https://doi.org/10.1007/s00778-006-0004-3
