Abstract
Large scientific experiments and simulations produce vast quantities of data. Though smaller in volume, the corresponding metadata describing the production, pedigree, and ontology of that data is just as important to the scientific discovery process as the raw data itself. Driven by the application needs of a number of large-scale distributed workflows, we develop a metadata capture and analysis system called MPO (short for Metadata, Provenance, Ontology). It integrates seamlessly with most data analysis environments and requires minimal changes to users’ existing analysis programs. Users have full control over how they instrument their programs, capturing as much or as little information as they desire. Once captured in a database system, the workflows can be visualized and studied through a set of web-based tools. In large scientific collaborations where workflows have been built up over decades, this ability to instrument complex existing workflows and visualize the key interactions among software components is tremendously useful.
This work was supported by the US DOE, Office of Advanced Scientific Computing Research and the Office of Fusion Energy Sciences under DE-SC0008697, DE-AC02-05CH11231, and DE-SC0008736.
The rights of this work are transferred to the extent transferable according to title 17 § 105 U.S.C.
Notes
1. MPO project documentation and software are available at https://mpo.psfc.mit.edu/.
2. SQLAlchemy is available at http://www.sqlalchemy.org/.
© 2016 Springer International Publishing Switzerland (outside the US)
Cite this paper
Wu, K. et al. (2016). MPO: A System to Document and Analyze Distributed Heterogeneous Workflows. In: Mattoso, M., Glavic, B. (eds) Provenance and Annotation of Data and Processes. IPAW 2016. Lecture Notes in Computer Science, vol 9672. Springer, Cham. https://doi.org/10.1007/978-3-319-40593-3_14
DOI: https://doi.org/10.1007/978-3-319-40593-3_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40592-6
Online ISBN: 978-3-319-40593-3
eBook Packages: Computer Science, Computer Science (R0)