Tracking and Analyzing the Evolution of Provenance from Scripts

  • Conference paper
Provenance and Annotation of Data and Processes (IPAW 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9672)

Abstract

Scripting languages are powerful tools for scientists. Scientists use them to process data, invoke programs, and connect the outputs of some programs to the inputs of others. During the life cycle of a scientific experiment, scientists compose scripts, execute them, and analyze the results. Depending on the results, they modify their scripts to gather more data that confirms the original hypothesis or to test a new hypothesis, thereby evolving the experiment. While some tools capture provenance from script executions, most approaches focus on a single execution, leaving out the possibility of analyzing the provenance evolution of the experiment as a whole. This work enables tracking and analyzing the evolution of provenance gathered from scripts. Tracking this evolution also helps to reconstruct the environment of previous executions for reproduction. Provenance evolution analysis allows comparing executions to understand what has changed and supports deciding which execution provides better results.
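As an illustration only, comparing two executions of an evolving script can be pictured as diffing a few recorded facets of each run, and tracking evolution as following links from each run to the run it was derived from. The sketch below assumes a hypothetical `Trial` record holding a script hash, imported module versions, environment variables, and hashes of accessed files. These names and the chosen facets are illustrative assumptions. They are not taken from the paper or from any tool's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Trial:
    """Hypothetical record of one script execution (illustrative only)."""
    trial_id: int
    parent_id: Optional[int]  # execution this one evolved from, if any
    script_hash: str          # hash of the script source code
    modules: Dict[str, str] = field(default_factory=dict)      # module -> version
    environment: Dict[str, str] = field(default_factory=dict)  # env var -> value
    file_hashes: Dict[str, str] = field(default_factory=dict)  # path -> content hash


def diff_dicts(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify keys as added, removed, or changed between two mappings."""
    return {
        "added": sorted(k for k in new if k not in old),
        "removed": sorted(k for k in old if k not in new),
        "changed": sorted(k for k in old if k in new and old[k] != new[k]),
    }


def diff_trials(a: Trial, b: Trial) -> Dict[str, object]:
    """Summarize what changed when moving from trial a to trial b."""
    return {
        "script_changed": a.script_hash != b.script_hash,
        "modules": diff_dicts(a.modules, b.modules),
        "environment": diff_dicts(a.environment, b.environment),
        "files": diff_dicts(a.file_hashes, b.file_hashes),
    }


def lineage(trials: Dict[int, Trial], trial_id: int) -> List[int]:
    """Follow parent links back to the first trial, oldest first."""
    chain: List[int] = []
    current: Optional[int] = trial_id
    while current is not None:
        chain.append(current)
        current = trials[current].parent_id
    return list(reversed(chain))


if __name__ == "__main__":
    t1 = Trial(1, None, "abc", {"numpy": "1.10"}, {"LANG": "C"}, {"data.csv": "h1"})
    t2 = Trial(2, 1, "abd", {"numpy": "1.11", "scipy": "0.17"}, {"LANG": "C"},
               {"data.csv": "h1", "out.png": "h2"})
    print(diff_trials(t1, t2))
    print(lineage({1: t1, 2: t2}, 2))
```

In this toy run, the diff reports that the script changed, `scipy` was added, `numpy` changed version, and `out.png` is a new output, while the lineage of trial 2 is `[1, 2]`. Keeping each facet as a flat mapping keeps the comparison simple; a real provenance store would record far richer, graph-shaped information.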

Author information

Corresponding author

Correspondence to João Felipe Pimentel.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Pimentel, J.F., Freire, J., Braganholo, V., Murta, L. (2016). Tracking and Analyzing the Evolution of Provenance from Scripts. In: Mattoso, M., Glavic, B. (eds) Provenance and Annotation of Data and Processes. IPAW 2016. Lecture Notes in Computer Science, vol 9672. Springer, Cham. https://doi.org/10.1007/978-3-319-40593-3_2

  • DOI: https://doi.org/10.1007/978-3-319-40593-3_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-40592-6

  • Online ISBN: 978-3-319-40593-3

  • eBook Packages: Computer Science (R0)
