Abstract
The goal of the Encyclopedia of Life (EOL) Project is to predict structural information for all proteins in all organisms. This calculation presents challenges both in the scale of the computational resources required (approximately 1.8 million CPU hours) and in data and workflow management. Although existing tools solve some subsets of these problems, we had to build software to integrate and manage execution of the overall Grid application. In this paper, we present this workflow system, detail its components, and report on the performance of our initial prototype implementation in runs on a large-scale Grid platform during the SC’03 conference.
This research was supported in part by the National Science Foundation under the NPACI Cooperative Agreement No. ACI-9619020 and under award No. ACI-0086092. W.W. Li is also supported in part by PRAGMA, funded by NSF Grant No. INT-0314015, and by Systematic Protein Annotation and Modeling, funded by the National Institutes of Health (NIH) Grant No. GM63208-01A1S1.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Birnbaum, A. et al. (2005). Grid Workflow Software for a High-Throughput Proteome Annotation Pipeline. In: Konagaya, A., Satou, K. (eds) Grid Computing in Life Science. LSGRID 2004. Lecture Notes in Computer Science, vol 3370. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-32251-1_7
DOI: https://doi.org/10.1007/978-3-540-32251-1_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25208-5
Online ISBN: 978-3-540-32251-1
eBook Packages: Computer Science, Computer Science (R0)