ABSTRACT
The value of workflows to the scientific community spans time and space. Not only the results of a workflow but also its performance and resource consumption need to be replayed over time and in varying environments. Achieving such repeatability in practice is challenging because software and infrastructure change over time. In this work, we introduce a new abstraction that builds on the concept of the virtual appliance to enable workflow repeatability. We have also developed a novel architecture that leverages this abstraction and realized it in a system implementation that supports a popular workflow management system and builds on a federated, in-production environment. We demonstrate the effectiveness of our approach by examining various aspects of workflow repeatability. Our results show that workflows can be replayed with 2% fidelity when walltime is considered as the performance metric.