Abstract
Data provenance, i.e., the lineage and processing history of data, is becoming increasingly important in scientific applications. Provenance information can be used, e.g., to explain, debug, and reproduce the results of computational experiments, or to determine the validity and quality of data products. In collaborative science settings, it may be infeasible or undesirable to publish the complete provenance of a data product. We develop a framework that allows data publishers to “customize” provenance data prior to exporting it. For example, users can specify which parts of the provenance graph are to be included in the result and which parts should be hidden, anonymized, or abstracted. However, such user-defined provenance customization needs to be carefully counterbalanced with the need to faithfully report all relevant data and process dependencies. To this end, we propose ProPub (Provenance Publisher), a framework and system which allows the user (i) to state provenance publication and customization requests, (ii) to specify provenance policies that should be obeyed, (iii) to check whether the policies are satisfied, and (iv) to repair policy violations and reconcile conflicts between user requests and provenance policies should they occur. In the ProPub approach, policies as well as customization requests are expressed as logic rules. By using a declarative, logic-based framework, ProPub can first check and then enforce integrity constraints (ICs), e.g., by rejecting inconsistent user requests, or by repairing violated ICs according to a given conflict resolution strategy.
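The check-then-repair cycle described above can be illustrated with a minimal sketch. It is not the ProPub implementation: the abstract states only that requests and policies are logic rules, so the edge relation, the single "preserve visible dependencies" policy, and the repair strategy below are assumptions chosen for illustration, and all names (`EDGES`, `hide`, `violations`, `repair`) are hypothetical.

```python
# Hypothetical provenance facts: (x, y) means "x depends on y".
# Chain: d3 -> p2 -> d2 -> p1 -> d1 (data items d*, process invocations p*).
EDGES = {("d3", "p2"), ("p2", "d2"), ("d2", "p1"), ("p1", "d1")}


def reachable(edges):
    """Transitive closure of a dependency relation, computed as a naive fixpoint."""
    closure = set(edges)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


def hide(edges, hidden):
    """Customization request (assumed form): drop every edge touching a hidden node."""
    return {(a, b) for (a, b) in edges if a not in hidden and b not in hidden}


def violations(original, published, hidden):
    """Assumed policy: every dependency between two visible nodes must still be
    derivable from the published graph. Returns the dependencies that were lost."""
    visible_deps = {
        (a, b)
        for (a, b) in reachable(original)
        if a not in hidden and b not in hidden
    }
    return visible_deps - reachable(published)


def repair(published, missing):
    """One possible conflict resolution strategy: add an abstracted edge for each
    lost dependency (this may add more edges than strictly necessary)."""
    return published | missing


hidden = {"p2"}                                   # user asks to hide process p2
published = hide(EDGES, hidden)                   # apply the request
missing = violations(EDGES, published, hidden)    # check the policy
repaired = repair(published, missing)             # repair the violation
```

Hiding `p2` cuts `d3` off from everything upstream of it, so the check reports the lost dependencies of `d3` on `d2`, `p1`, and `d1`; after the repair, the same check finds nothing. A system in the spirit of the abstract could instead reject the request outright, which corresponds to refusing to publish whenever `missing` is non-empty.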
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dey, S.C., Zinn, D., Ludäscher, B. (2011). ProPub: Towards a Declarative Approach for Publishing Customized, Policy-Aware Provenance. In: Bayard Cushing, J., French, J., Bowers, S. (eds) Scientific and Statistical Database Management. SSDBM 2011. Lecture Notes in Computer Science, vol 6809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22351-8_13
DOI: https://doi.org/10.1007/978-3-642-22351-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22350-1
Online ISBN: 978-3-642-22351-8
eBook Packages: Computer Science, Computer Science (R0)