Using SQL for Efficient Generation and Querying of Provenance Information

  • Chapter

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8000)

Abstract

In applications such as data warehousing or data exchange, the ability to efficiently generate and query provenance information is crucial for understanding the origin of data. In this chapter, we review some of the main contributions of Perm, a DBMS that generates different types of provenance information for complex SQL queries (including nested and correlated subqueries and aggregation). The two key ideas behind Perm are representing data and its provenance together in a single relation and relying on query rewrites to generate this representation. Together, these ideas let Perm support fully integrated, on-demand provenance generation and querying using SQL. Since Perm rewrites a query that requests provenance into a regular, easily optimizable SQL query, its performance greatly benefits from the query optimization techniques provided by the underlying DBMS.
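The two ideas above can be illustrated on a single aggregation query. The sketch below is hand-written and runs on Python's built-in `sqlite3` module, not on Perm. The `sales` table, its rows, and the `prov_` column prefix are hypothetical, and Perm's actual rewrite rules and naming conventions may differ. The rewrite joins the aggregation result back to its input on the group-by attribute. Each result row is then stored alongside every input tuple that contributed to it, all in one ordinary relation.

```python
import sqlite3

# Hypothetical example data: a single input relation `sales`.
SCHEMA = """
CREATE TABLE sales (shop TEXT, item TEXT, price INTEGER);
INSERT INTO sales VALUES
    ('Merdies', 'apple', 3),
    ('Merdies', 'pear', 5),
    ('Joba', 'apple', 3);
"""

# Original query: total price of items sold per shop.
ORIGINAL = "SELECT shop, SUM(price) AS total FROM sales GROUP BY shop"

# Hand-written provenance rewrite (illustrative only, not generated by Perm):
# join the aggregation result back to its input on the group-by attribute,
# so each result row is repeated once per contributing input tuple. The
# contributing tuple's values appear as extra columns with a hypothetical
# prov_ prefix. SQLite's IS operator is a NULL-safe equality, so a NULL
# group would also be matched to its inputs.
REWRITTEN = f"""
SELECT agg.shop AS shop, agg.total AS total,
       s.shop AS prov_shop, s.item AS prov_item, s.price AS prov_price
FROM ({ORIGINAL}) AS agg
JOIN sales AS s ON agg.shop IS s.shop
ORDER BY agg.shop, s.item
"""

# Because the rewritten query is plain SQL, its provenance can be queried
# with plain SQL too: which result rows depend on a 'pear' sale?
PROVENANCE_QUERY = f"""
SELECT DISTINCT shop, total
FROM ({REWRITTEN}) AS p
WHERE prov_item = 'pear'
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database holding the hypothetical `sales` table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def run(conn: sqlite3.Connection, sql: str) -> list:
    """Execute a query and return all result rows as tuples."""
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_db()
    for row in run(conn, REWRITTEN):
        print(row)
    print(run(conn, PROVENANCE_QUERY))
```

The rewritten query pairs the `Merdies` total of 8 with both of its input tuples, and the `Joba` total of 3 with its single input tuple. The provenance query returns only `('Merdies', 8)`. The rewrite produces an ordinary SQL query, so the database's optimizer handles it like any other query.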

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Glavic, B., Miller, R.J., Alonso, G. (2013). Using SQL for Efficient Generation and Querying of Provenance Information. In: Tannen, V., Wong, L., Libkin, L., Fan, W., Tan, WC., Fourman, M. (eds) In Search of Elegance in the Theory and Practice of Computation. Lecture Notes in Computer Science, vol 8000. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41660-6_16

  • DOI: https://doi.org/10.1007/978-3-642-41660-6_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41659-0

  • Online ISBN: 978-3-642-41660-6

  • eBook Packages: Computer Science (R0)