Representing uncertain data: models, properties, and algorithms

  • Special Issue Paper
The VLDB Journal

Abstract

In general terms, an uncertain relation encodes a set of possible certain relations. There are many ways to represent uncertainty, ranging from alternative values for attributes to rich constraint languages. Among the possible models for uncertain data, there is a tension between simple and intuitive models, which tend to be incomplete, and complete models, which tend to be nonintuitive and more complex than necessary for many applications. We present a space of models for representing uncertain data based on a variety of uncertainty constructs and tuple-existence constraints. We explore a number of properties and results for these models. We study completeness of the models, as well as closure under relational operations, and we give results relating closure and completeness. We then examine whether different models guarantee unique representations of uncertain data, and for those models that do not, we provide complexity results and algorithms for testing equivalence of representations. The next problem we consider is that of minimizing the size of representation of models, showing that minimizing the number of tuples also minimizes the size of constraints. We show that minimization is intractable in general and study the more restricted problem of maintaining minimality incrementally when performing operations. Finally, we present several results on the problem of approximating uncertain data in an insufficiently expressive model.
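The notion of an uncertain relation as a set of possible certain relations can be made concrete with a small sketch. The code below is illustrative only and is not the representation defined in the paper: it assumes a simple hypothetical model in which each tuple carries a list of alternative values and an optional "maybe" flag, with choices for different tuples treated as independent. The function name `possible_worlds` and the sample `sightings` data are invented for the example.

```python
from itertools import product


def possible_worlds(uncertain_relation):
    """Enumerate the certain relations encoded by an uncertain relation.

    Each uncertain tuple is a pair (alternatives, maybe):
    `alternatives` is a list of candidate certain tuples, and `maybe`
    is True when the tuple may be absent altogether. Choices made for
    different tuples are assumed to be independent (an assumption of
    this sketch, not a claim about the paper's models).
    """
    per_tuple_choices = []
    for alternatives, maybe in uncertain_relation:
        choices = list(alternatives)
        if maybe:
            choices.append(None)  # None stands for "tuple absent"
        per_tuple_choices.append(choices)

    worlds = set()
    for combination in product(*per_tuple_choices):
        # A world is the set of tuples actually chosen in this combination.
        worlds.add(frozenset(t for t in combination if t is not None))
    return worlds


# Hypothetical data: Amy saw a crow or a raven; Bob may have seen a finch.
sightings = [
    ([("Amy", "crow"), ("Amy", "raven")], False),
    ([("Bob", "finch")], True),
]

worlds = possible_worlds(sightings)
for world in sorted(worlds, key=lambda w: sorted(w)):
    print(sorted(world))
```

A simple model like this one illustrates the tension described in the abstract. It is easy to read, but because tuples are chosen independently, this encoding cannot directly express a correlated set of worlds such as "both tuples are present or neither is". Constraints across tuples are one way to capture such cases.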

Author information

Corresponding author

Correspondence to Anish Das Sarma.

Additional information

This work was supported by the National Science Foundation under grants IIS-0324431 and IIS-0414762, by grants from the Boeing and Hewlett-Packard Corporations, by a Microsoft Graduate Fellowship, and by a Stanford Graduate Fellowship from Sequoia Capital.

About this article

Cite this article

Sarma, A.D., Benjelloun, O., Halevy, A. et al. Representing uncertain data: models, properties, and algorithms. The VLDB Journal 18, 989–1019 (2009). https://doi.org/10.1007/s00778-009-0147-0

  • DOI: https://doi.org/10.1007/s00778-009-0147-0
