Uncertain Data Lineage

Encyclopedia of Database Systems

Synonyms

Provenance in probabilistic databases

Definition

Lineage, also called Boolean provenance, event expression, or why-provenance, is a form of provenance or origin of the answer(s) to a query executed on a database. Lineage is expressed as a Boolean formula with variables assigned to the tuples in the database, where joint usage of the tuples (by the database join operation) is captured by Boolean conjunction (AND, ∧) and alternative usage (projection or union) by Boolean disjunction (OR, ∨). Uncertain data is typically expressed in the form of a probabilistic database, which is a compact representation of a probability distribution over a set of deterministic database instances (called possible worlds). When an input query is evaluated on such a probabilistic database, instead of a deterministic set of tuples representing the answer, the output is a distribution on possible answers for the possible worlds. The query evaluation problem on uncertain data aims to compute this...
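The correspondence between lineage and possible worlds can be illustrated with a small sketch. It assumes a tuple-independent model (each tuple is present independently with a given marginal probability), which is only one common way to represent a probabilistic database; the tuple names `r1`, `r2`, `s1`, `s2`, their probabilities, and the helper functions are hypothetical. The lineage is that of a Boolean query joining two relations and projecting away all attributes: each join result contributes a conjunction, and the projection combines them by disjunction. The probability of the answer is then the total weight of the possible worlds in which the lineage is true.

```python
from itertools import product

# Hypothetical marginal probabilities of four tuples, assumed independent.
prob = {"r1": 0.5, "r2": 0.5, "s1": 0.5, "s2": 0.5}

# Lineage (r1 AND s1) OR (r2 AND s2), stored as a list of conjunctions.
# Each set is one join result; the list is the disjunction from projection.
lineage = [{"r1", "s1"}, {"r2", "s2"}]


def holds(lineage, world):
    """Return True if some conjunction has all of its tuples in the world."""
    return any(clause <= world for clause in lineage)


def answer_probability(lineage, prob):
    """Sum the weights of all possible worlds that satisfy the lineage.

    Brute-force enumeration over 2^n worlds; for illustration only.
    """
    variables = sorted(prob)
    total = 0.0
    for bits in product([False, True], repeat=len(variables)):
        world = {v for v, present in zip(variables, bits) if present}
        weight = 1.0
        for v, present in zip(variables, bits):
            weight *= prob[v] if present else 1.0 - prob[v]
        if holds(lineage, world):
            total += weight
    return total


print(answer_probability(lineage, prob))  # 0.4375
```

Here 7 of the 16 equally weighted worlds satisfy the lineage, giving 7/16 = 0.4375, which agrees with the closed form 1 − (1 − 0.25)² for two independent conjunctions. Enumeration is exponential in the number of tuples, so it serves only to make the definition concrete.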

Correspondence to Sudeepa Roy.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Roy, S. (2018). Uncertain Data Lineage. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80759
