Synonyms
History; Lineage; Origin; Pedigree; Source
Definition
Let t be a data element in the result of a query Q applied to a dataset D. The provenance of t is the set of all proofs for t according to Q and D. A proof for t according to Q and D is a subset D′ of the data elements in D such that t is in the result of applying Q to D′. In some cases, a proof also details the process by which t is derived from Q and D′.
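To make the definition concrete, here is a minimal Python sketch that enumerates proofs by brute force. It makes several assumptions:

- D is a toy set of tuples tagged with a relation name.
- Q is a join-then-project query written as an ordinary Python function.
- The names D, Q, proofs and minimal_proofs are hypothetical, chosen only for illustration.

Because the sketch tests every subset of D, it takes exponential time and is only meant for tiny examples.

```python
from itertools import combinations

# Hypothetical toy dataset D: tuples of R(a, b) and S(b, c), tagged by relation name.
D = {
    ("R", 1, 2),
    ("R", 1, 3),
    ("S", 2, 5),
    ("S", 3, 5),
    ("S", 4, 5),
}

def Q(db):
    """Hypothetical query Q(a, c) :- R(a, b), S(b, c): a join projected onto (a, c)."""
    return {
        (r[1], s[2])
        for r in db if r[0] == "R"
        for s in db if s[0] == "S"
        if r[2] == s[1]
    }

def proofs(t, query, db):
    """Every subset D' of db such that t is in query(D'); exponential, illustration only."""
    elems = sorted(db)
    return [
        frozenset(sub)
        for k in range(len(elems) + 1)
        for sub in combinations(elems, k)
        if t in query(set(sub))
    ]

def minimal_proofs(t, query, db):
    """Proofs that contain no strictly smaller proof (minimal witnesses)."""
    ps = proofs(t, query, db)
    return [p for p in ps if not any(q < p for q in ps)]

if __name__ == "__main__":
    for w in minimal_proofs((1, 5), Q, D):
        print(sorted(w))
```

Q is monotone, so every superset of a proof is again a proof. The minimal proofs therefore characterize the whole set. Here (1, 5) has 14 proofs but only two minimal ones: {R(1, 2), S(2, 5)} and {R(1, 3), S(3, 5)}.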
Most work on provenance in databases has focused on finding minimal subsets of D that witness the existence of t in the result, and on identifying the parts of D from which t is copied. More general forms of provenance based on annotations (e.g., elements of algebraic structures such as semirings) have also been investigated. Provenance is also important for understanding how data in databases has evolved as a result of updates over time, particularly in curated scientific databases.
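The annotation-based view can be illustrated with a small Python sketch in the style of semiring-annotated relations. Every tuple carries an annotation from a commutative semiring. A join multiplies the annotations of the tuples it combines, and a projection adds the annotations of tuples that collapse into one.

A few assumptions apply:

- The Semiring class, the join, project and q helpers, and the toy relations are all hypothetical, written only for this illustration.
- Two semirings are shown. The first is the counting semiring (bag multiplicities). The second is a why-provenance semiring whose elements are sets of witness sets, with union as addition and pairwise union as multiplication.
- This is not a full query engine: selection, union and difference are omitted.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

@dataclass(frozen=True)
class Semiring:
    """Hypothetical container for a commutative semiring (K, +, *, 0, 1)."""
    zero: Any
    one: Any
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]

# A K-relation maps each tuple to its annotation in semiring K.
KRel = Dict[Tuple, Any]

def join(K: Semiring, r: KRel, s: KRel, r_key: int, s_key: int) -> KRel:
    """Equi-join on r[r_key] == s[s_key]; annotations of joined tuples are multiplied."""
    out: KRel = {}
    for tr, ar in r.items():
        for ts, as_ in s.items():
            if tr[r_key] == ts[s_key]:
                t = tr + ts
                out[t] = K.plus(out.get(t, K.zero), K.times(ar, as_))
    return out

def project(K: Semiring, r: KRel, cols: Tuple[int, ...]) -> KRel:
    """Projection; annotations of tuples that become equal are added."""
    out: KRel = {}
    for t, a in r.items():
        p = tuple(t[i] for i in cols)
        out[p] = K.plus(out.get(p, K.zero), a)
    return out

def q(K: Semiring, R: KRel, S: KRel) -> KRel:
    """Q(a, c) :- R(a, b), S(b, c), written once for any semiring K."""
    return project(K, join(K, R, S, r_key=1, s_key=0), cols=(0, 3))

# Counting semiring (natural numbers): bag semantics.
COUNTING = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)

# Why-provenance semiring: sets of witness sets.
WHY = Semiring(
    zero=frozenset(),
    one=frozenset({frozenset()}),
    plus=lambda a, b: a | b,
    times=lambda a, b: frozenset(x | y for x in a for y in b),
)

def tag(name: str) -> frozenset:
    """Annotate a source tuple with its own identifier as a single witness."""
    return frozenset({frozenset({name})})

R_count = {(1, 2): 1, (1, 3): 2}
S_count = {(2, 5): 3, (3, 5): 1}
R_why = {(1, 2): tag("r1"), (1, 3): tag("r2")}
S_why = {(2, 5): tag("s1"), (3, 5): tag("s2")}

if __name__ == "__main__":
    print(q(COUNTING, R_count, S_count))
    print(q(WHY, R_why, S_why))
```

The query is written once against the abstract operations, so changing the semiring changes what is computed. Under counting, (1, 5) gets multiplicity 1·3 + 2·1 = 5. Under why-provenance, it gets the witness sets {r1, s1} and {r2, s2}.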
Historical Background
Data provenance (or fine-grained provenance) is an account of the derivation of a piece...
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Cheney, J., Tan, WC. (2018). Provenance in Databases. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_283
DOI: https://doi.org/10.1007/978-1-4614-8265-9_283
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering