research-article

A declarative approach to customize workflow provenance

Published: 18 March 2013
DOI: 10.1145/2457317.2457320

Abstract

Provenance describes the origin, context, derivation, and ownership of data products and is becoming increasingly important in scientific applications. This information can be used, e.g., to explain, debug, and reproduce the results of computational experiments, or to determine the validity and quality of data products. On the other hand, it may be infeasible or undesirable to share the complete provenance of a scientific experiment. To balance these competing requirements, we develop a framework and a system that allow scientists to declaratively specify their requirements for publishing and customizing provenance data. Using this system, scientists can specify which parts of the provenance data are to be included in the result and which parts should be hidden or anonymized. However, applying these specifications arbitrarily may not preserve the integrity of the provenance data. Thus, we also allow scientists to specify provenance integrity requirements, in the form of provenance policies, along with their publication and customization requirements. Our system then systematically applies all publication and customization requirements to the provenance data while ensuring that every provenance policy specified by the scientist is satisfied.
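The abstract combines three ingredients: customization requests (hide or anonymize parts of a trace), provenance policies that capture integrity requirements, and a system that applies the former while enforcing the latter. The Python sketch below illustrates that interplay on a toy trace. It is not the authors' system or specification language: the graph encoding, the function names (`hide`, `anonymize`, `violations`), the node names, and the single policy checked (retained nodes must keep exactly their original dependencies, none lost and none invented) are hypothetical choices made for illustration.

```python
# Hypothetical sketch: not the authors' system or specification language.
# A provenance trace is a directed graph; an edge u -> v means data flows
# from u to v (v depends on u). All node names below are made up.
Graph = dict[str, set[str]]


def reachable(g: Graph, src: str) -> set[str]:
    """Nodes reachable from src via one or more edges (stack-based traversal)."""
    seen: set[str] = set()
    stack = list(g.get(src, ()))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(g.get(n, ()))
    return seen


def hide(g: Graph, node: str, contract: bool) -> Graph:
    """Drop `node`; if contract is True, link its predecessors to its successors."""
    out = {u: vs - {node} for u, vs in g.items() if u != node}
    if contract:
        for u, vs in g.items():
            if u != node and node in vs:
                out[u] |= g.get(node, set())
    return out


def anonymize(g: Graph, node: str, alias: str) -> Graph:
    """Replace the identifier of `node` with an opaque alias; keep all edges."""
    def rename(n: str) -> str:
        return alias if n == node else n
    return {rename(u): {rename(v) for v in vs} for u, vs in g.items()}


def violations(orig: Graph, pub: Graph, names: dict[str, str]) -> list[tuple[str, str]]:
    """Hypothetical policy: every retained pair keeps exactly its original dependency.

    `names` maps each retained original node to its published identifier.
    Returns (u, v) pairs, in original names, whose dependency was lost or invented.
    """
    inverse = {p: o for o, p in names.items()}
    retained = set(inverse)
    bad = []
    for u, pu in names.items():
        before = {names[v] for v in reachable(orig, u) if v in names}
        after = reachable(pub, pu) & retained
        bad.extend((u, inverse[p]) for p in sorted(before ^ after))
    return bad


trace: Graph = {
    "input.fasta": {"align"},
    "align": {"aligned.aln"},
    "aligned.aln": {"build_tree"},
    "build_tree": {"tree.nwk"},
}
# Customization request: hide the intermediate file, anonymize the aligner.
names = {"input.fasta": "input.fasta", "align": "_:p1",
         "build_tree": "build_tree", "tree.nwk": "tree.nwk"}
naive = anonymize(hide(trace, "aligned.aln", contract=False), "align", "_:p1")
contracted = anonymize(hide(trace, "aligned.aln", contract=True), "align", "_:p1")
print(violations(trace, naive, names))
# [('input.fasta', 'build_tree'), ('input.fasta', 'tree.nwk'), ('align', 'build_tree'), ('align', 'tree.nwk')]
print(violations(trace, contracted, names))  # []
```

Hiding the intermediate file by plain deletion disconnects the input from the final tree, and the policy check reports every lost dependency. Reconnecting the hidden node's predecessors to its successors (contraction) preserves exactly the original dependencies among the retained nodes. Contraction has its own cost: if a provenance model requires edges to alternate between data and process nodes, the new edge from `align` to `build_tree` joins two processes and would violate that requirement, one example of how customization and integrity requirements can conflict.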


Cited By

  • (2015) Obscuring Provenance Confidential Information via Graph Transformation. In Trust Management IX, pages 109-125. DOI: 10.1007/978-3-319-18491-3_8. Online publication date: 30-Apr-2015

      Published In

      ACM Other conferences
      EDBT '13: Proceedings of the Joint EDBT/ICDT 2013 Workshops
      March 2013
      423 pages
      ISBN:9781450315999
      DOI:10.1145/2457317
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Conference

      EDBT/ICDT '13

      Acceptance Rates

      EDBT '13 Paper Acceptance Rate: 7 of 10 submissions, 70%
      Overall Acceptance Rate: 7 of 10 submissions, 70%

      Article Metrics

      • Downloads (last 12 months): 4
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 01 Mar 2025
