Abstract
Provenance is information recording the source, derivation, or history of some information. Provenance tracking has been studied in a variety of settings; however, although many design points have been explored, the mathematical or semantic foundations of data provenance have received comparatively little attention. In this paper, we argue that dependency analysis techniques familiar from program analysis and program slicing provide a formal foundation for forms of provenance that are intended to show how (part of) the output of a query depends on (parts of) its input. We introduce a semantic characterization of such dependency provenance, show that this form of provenance is not computable, and provide dynamic and static approximation techniques.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Abadi, M., Banerjee, A., Heintze, N., Riecke, J.G.: A core calculus of dependency. In: POPL, pp. 147–160. ACM Press, New York (1999)
Abadi, M., Lampson, B., Lévy, J.-J.: Analysis and caching of dependencies. In: ICFP, pp. 83–91. ACM Press, New York (1996)
Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Reading (1995)
Acar, U.A., Blelloch, G.E., Harper, R.: Selective memoization. In: Proceedings of the 30th Annual ACM Symposium on Principles of Programming Languages, ACM Press, New York (2003)
Benjelloun, O., Sarma, A.D., Halevy, A.Y., Widom, J.: ULDBs: Databases with uncertainty and lineage. In: VLDB, pp. 953–964 (2006)
Bhagwat, D., Chiticariu, L., Tan, W.-C., Vijayvargiya, G.: An annotation management system for relational databases. VLDB Journal 14(4), 373–396 (2005)
Biswas, S.: Dynamic Slicing in Higher-Order Programming Languages. PhD thesis, University of Pennsylvania (1997)
Bose, R., Frew, J.: Lineage retrieval for scientific data processing: a survey. ACM Comput. Surv. 37(1), 1–28 (2005)
Buneman, P., Chapman, A., Cheney, J.: Provenance management in curated databases. In: SIGMOD 2006, pp. 539–550 (2006)
Buneman, P., Cheney, J., Vansummeren, S.: On the expressiveness of implicit provenance in query and update languages. In: Schwentick, T., Suciu, D. (eds.) ICDT 2007. LNCS, vol. 4353, pp. 209–223. Springer, Heidelberg (2006)
Buneman, P., Khanna, S., Tan, W.-C.: Why and where: A characterization of data provenance. In: Van den Bussche, J., Vianu, V. (eds.) ICDT 2001. LNCS, vol. 1973, pp. 316–330. Springer, Heidelberg (2000)
Buneman, P., Khanna, S., Tan, W.-C.: On propagation of deletions and annotations through views. In: PODS, pp. 150–158 (2002)
Buneman, P., Naqvi, S.A., Tannen, V., Wong, L.: Principles of programming with complex objects and collection types. Theor. Comp. Sci. 149(1), 3–48 (1995)
Cheney, J., Ahmed, A., Acar, U.: Provenance as dependency analysis. Technical Report arXiv:0708.2173v1, arXiv.org e-Print archive (2007)
Cui, Y., Widom, J., Wiener, J.L.: Tracing the lineage of view data in a warehousing environment. ACM Trans. Database Syst. 25(2), 179–227 (2000)
Field, J., Tip, F.: Dynamic dependence in term rewriting systems and its application to program slicing. Information and Software Technology 40(11–12), 609–636 (1998)
Moreau, L., Foster, I. (eds.): IPAW 2006. LNCS, vol. 4145. Springer, Heidelberg (2006)
Geerts, F., Kementsietsidis, A., Milano, D.: Mondrian: Annotating and querying databases through colors and blocks. In: ICDE 2006, p. 82 (2006)
Green, T.J., Karvounarakis, G., Tannen, V.: Provenance semirings. In: PODS, pp. 31–40. ACM Press, New York (2007)
Sabelfeld, A., Myers, A.: Language-based information-flow security. IEEE Journal on Selected Areas in Communications 21(1), 5–19 (2003)
Simmhan, Y., Plale, B., Gannon, D.: A survey of data provenance in e-science. SIGMOD Record 34(3), 31–36 (2005)
Wadler, P.: Comprehending monads. Mathematical Structures in Computer Science 2, 461–493 (1992)
Wang, Y.R., Madnick, S.E.: A polygen model for heterogeneous database systems: The source tagging perspective. In: VLDB, pp. 519–538 (1990)
Weiser, M.: Program slicing. In: ICSE, pp. 439–449. IEEE Press, Piscataway, NJ, USA (1981)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cheney, J., Ahmed, A., Acar, U.A. (2007). Provenance as Dependency Analysis. In: Arenas, M., Schwartzbach, M.I. (eds) Database Programming Languages. DBPL 2007. Lecture Notes in Computer Science, vol 4797. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75987-4_10
Download citation
DOI: https://doi.org/10.1007/978-3-540-75987-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75986-7
Online ISBN: 978-3-540-75987-4
eBook Packages: Computer ScienceComputer Science (R0)