Abstract
We describe a framework for supporting arbitrarily complex SQL queries with “uncertain” predicates. The query semantics is based on a probabilistic model, and the results are ranked, much as in Information Retrieval. Our main focus is query evaluation. We describe an optimization algorithm that can efficiently compute most queries. We show, however, that the data complexity of some queries is #P-complete, which implies that these queries are unlikely to admit any efficient evaluation method. For these queries we describe both an approximation algorithm and a Monte-Carlo simulation algorithm.
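To make the idea of a probabilistic query semantics concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes a toy database in which every tuple is present independently with a given probability. The relations R and S, their probabilities, and the function names are hypothetical and chosen only for illustration. The sketch computes the probability of a Boolean query exactly by enumerating all possible worlds, and then estimates the same probability by naive sampling, which is a much simpler technique than the Monte-Carlo simulation algorithm the abstract refers to.

```python
import itertools
import random

# Hypothetical toy data (an assumption for illustration): each tuple is
# present independently with the probability shown.
R = {"a1": 0.8, "a2": 0.5}                   # R(a)
S = {("a1", "b1"): 0.6, ("a2", "b1"): 0.9}   # S(a, b)


def query_holds(r_world, s_world):
    """Boolean query: does some pair (a, b) satisfy R(a) AND S(a, b)?"""
    return any(a in r_world for (a, _b) in s_world)


def exact_probability():
    """Sum the weights of all possible worlds in which the query holds.

    This enumerates 2^n worlds, so it is only feasible for tiny inputs.
    """
    facts = [("R", k, p) for k, p in R.items()]
    facts += [("S", k, p) for k, p in S.items()]
    total = 0.0
    for bits in itertools.product([False, True], repeat=len(facts)):
        weight = 1.0
        r_world, s_world = set(), set()
        for present, (rel, key, p) in zip(bits, facts):
            weight *= p if present else 1.0 - p
            if present:
                (r_world if rel == "R" else s_world).add(key)
        if query_holds(r_world, s_world):
            total += weight
    return total


def monte_carlo_probability(samples, seed=0):
    """Estimate the query probability by sampling worlds at random."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        r_world = {k for k, p in R.items() if rng.random() < p}
        s_world = {k for k, p in S.items() if rng.random() < p}
        hits += query_holds(r_world, s_world)
    return hits / samples


if __name__ == "__main__":
    print("exact:", exact_probability())
    print("estimate:", monte_carlo_probability(20000))
```

For this toy data the two ways of satisfying the query use disjoint tuples, so the exact value is 1 - (1 - 0.8 * 0.6)(1 - 0.5 * 0.9) = 0.714, and the sampled estimate should land close to it. Enumeration grows exponentially with the number of tuples, which is one way to see why efficient evaluation strategies and approximation matter.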
About this article
Cite this article
Dalvi, N., Suciu, D. Efficient query evaluation on probabilistic databases. The VLDB Journal 16, 523–544 (2007). https://doi.org/10.1007/s00778-006-0004-3
DOI: https://doi.org/10.1007/s00778-006-0004-3