Abstract
Collecting provenance from scripts is often useful for scientists to explain and reproduce their scientific experiments. However, most existing automatic approaches capture provenance at a coarse grain, for example, the trace of user-defined functions. These approaches lack information about variable dependencies. Without this information, users may struggle to identify which functions really influenced the results, which leads to false-positive provenance links. To address this problem, we propose an approach that uses dynamic program slicing to gather provenance from Python scripts. By capturing dependencies among variables, it is possible to expose execution paths inside functions and, consequently, to create a provenance graph that accurately represents the function activations and the results they affect.
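The idea of slicing on variable dependencies can be illustrated with a minimal sketch. It is not the implementation described in the paper: the trace format, the `backward_slice` helper, and the script it models (`load`, `calibrate`, `clean`, `summarize`) are all hypothetical, and only data dependencies are followed, with control dependencies ignored. Each trace event records which variable was defined, which variables were read to define it, and which function activation performed the assignment.

```python
from typing import List, Set, Tuple

# One trace event: (variable defined, variables read, function activation).
# This format is an assumption made for illustration only.
Event = Tuple[str, Tuple[str, ...], str]


def backward_slice(trace: List[Event], criterion: str) -> Tuple[Set[str], Set[str]]:
    """Walk an execution trace backwards from `criterion`.

    Returns the variables and the function activations that the
    criterion depends on through data dependencies.
    """
    pending = {criterion}       # variables whose reaching definition is still needed
    variables: Set[str] = set()
    activations: Set[str] = set()
    for defined, used, activation in reversed(trace):
        if defined in pending:
            pending.discard(defined)  # this event is the reaching definition
            pending.update(used)      # now its inputs must be explained
            variables.add(defined)
            activations.add(activation)
    return variables, activations


# Hypothetical run: `noise` is passed to clean(), but on this execution
# path clean() only reads `data`, so `noise` never reaches `cleaned`.
TRACE: List[Event] = [
    ("data", (), "load"),
    ("noise", (), "calibrate"),
    ("cleaned", ("data",), "clean"),
    ("result", ("cleaned",), "summarize"),
]

variables, activations = backward_slice(TRACE, "result")
print(sorted(variables))
print(sorted(activations))
```

In this sketch, a function-level trace would link `calibrate` to `result` because its output was passed to `clean`, whereas the slice leaves it out. That difference is the kind of false-positive link the abstract refers to.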
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Pimentel, J.F., Freire, J., Murta, L., Braganholo, V. (2016). Fine-Grained Provenance Collection over Scripts Through Program Slicing. In: Mattoso, M., Glavic, B. (eds) Provenance and Annotation of Data and Processes. IPAW 2016. Lecture Notes in Computer Science, vol 9672. Springer, Cham. https://doi.org/10.1007/978-3-319-40593-3_21
DOI: https://doi.org/10.1007/978-3-319-40593-3_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40592-6
Online ISBN: 978-3-319-40593-3
eBook Packages: Computer Science, Computer Science (R0)