Abstract
Many scientific experiments deal with data-intensive applications and the orchestration of computational workflow activities. These can benefit from data parallelism exploited in parallel systems to minimize execution time. Due to their complexity, robustness and efficiency in exploiting data parallelism, grid infrastructures are widely used in some e-Science areas such as bioinformatics. Workflow techniques are very important to in-silico bioinformatics experiments, allowing the e-scientist to describe and enact experimental processes in a structured, repeatable and verifiable way. The main purpose of this paper is to describe our experience with the Taverna Workbench and PeDRo, which are part of the myGrid project. Taverna provides a workflow toolset and enactor, allowing the specification of processing units, data transfer and execution constraints. As a data entry tool, PeDRo provides a model, a controlled vocabulary and field validations for Web Services descriptions, leveraging the knowledge associated with the workflows. The main contribution of this work is a summary of considerations drawn from our experience with these tools, emphasizing their advantages and drawbacks, together with proposals for future improvements.
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Ruberg, N. et al. (2007). Experiencing Data Grids. In: Daydé, M., Palma, J.M.L.M., Coutinho, Á.L.G.A., Pacitti, E., Lopes, J.C. (eds) High Performance Computing for Computational Science - VECPAR 2006. VECPAR 2006. Lecture Notes in Computer Science, vol 4395. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71351-7_56
DOI: https://doi.org/10.1007/978-3-540-71351-7_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71350-0
Online ISBN: 978-3-540-71351-7
eBook Packages: Computer Science, Computer Science (R0)