Abstract
In applications such as data warehousing or data exchange, the ability to efficiently generate and query provenance information is crucial to understand the origin of data. In this chapter, we review some of the main contributions of Perm, a DBMS that generates different types of provenance information for complex SQL queries (including nested and correlated subqueries and aggregation). The two key ideas behind Perm are representing data and its provenance together in a single relation and relying on query rewrites to generate this representation. Through this, Perm supports fully integrated, on-demand provenance generation and querying using SQL. Since Perm rewrites a query requesting provenance into a regular SQL query and generates easily optimizable SQL code, its performance greatly benefits from the query optimization techniques provided by the underlying DBMS.
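The two ideas named in the abstract, keeping data and provenance in one relation and producing that relation through a query rewrite, can be illustrated with a small sketch. The example below is not Perm's implementation or syntax: it uses SQLite from Python, a hypothetical `sales` table, a hand-written rewrite of one aggregation query, and a `prov_` column naming scheme chosen only for illustration. It shows an ordinary query, a rewritten version whose rows pair each result with the input tuples that contributed to it, and a follow-up SQL query over that combined relation.

```python
import sqlite3

# Hypothetical example data; the table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (shop TEXT, item TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('north', 'pen', 3),
        ('north', 'ink', 5),
        ('south', 'pen', 2);
""")

# The query whose provenance we want.
ORIGINAL = "SELECT shop, SUM(amount) AS total FROM sales GROUP BY shop"

# Hand-written rewrite (an illustration, not Perm's rewrite rules):
# join the aggregate result back to its input on the group-by attribute,
# so every result row is extended with one contributing input tuple.
REWRITTEN = f"""
    SELECT q.shop   AS shop,
           q.total  AS total,
           s.shop   AS prov_sales_shop,
           s.item   AS prov_sales_item,
           s.amount AS prov_sales_amount
    FROM ({ORIGINAL}) AS q
    JOIN sales AS s ON s.shop = q.shop
"""

original_rows = conn.execute(ORIGINAL + " ORDER BY shop").fetchall()
provenance_rows = conn.execute(REWRITTEN + " ORDER BY q.shop, s.item").fetchall()

# Because the rewritten query is plain SQL, its result can be queried
# like any other relation: which totals depend on a sale of 'ink'?
ink_dependent = conn.execute(
    f"SELECT DISTINCT shop, total FROM ({REWRITTEN}) AS p "
    "WHERE prov_sales_item = 'ink'"
).fetchall()

for row in provenance_rows:
    print(row)
```

Since the rewritten statement is a regular query, the database's own optimizer plans it like any other, which is the property the abstract credits for Perm's performance.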
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Glavic, B., Miller, R.J., Alonso, G. (2013). Using SQL for Efficient Generation and Querying of Provenance Information. In: Tannen, V., Wong, L., Libkin, L., Fan, W., Tan, WC., Fourman, M. (eds) In Search of Elegance in the Theory and Practice of Computation. Lecture Notes in Computer Science, vol 8000. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41660-6_16
DOI: https://doi.org/10.1007/978-3-642-41660-6_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41659-0
Online ISBN: 978-3-642-41660-6
eBook Packages: Computer Science, Computer Science (R0)