ABSTRACT
The workflow paradigm can provide the means to describe the complete functional pipeline of a scientific experiment, thereby exposing the underlying scientific processes and enabling the reproducibility of results. However, current means of exposing such information are tied closely to individual workflow engines, and no existing method provides a common way to share it.
In this paper, we discuss a lightweight approach to exposing such information that uses the Open Archives Initiative Object Reuse and Exchange (ORE) standard to provide a common format for representing and sharing workflows together with the metadata required for their execution.
We describe how workflows can be mapped to the ORE format using RDF and how they can be stored as bundles for sharing with others. We discuss tooling we have developed that lets existing workflow engines conveniently export workflows as ORE bundles. We present three use cases, for Triana, ASKALON and MOTEUR, where such integration has already been undertaken, and conclude with a short study showing that the overhead of adopting the proposed ORE bundling format is minimal.
Index Terms
- Object reuse and exchange for publishing and sharing workflows