Grid Workflow Software for a High-Throughput Proteome Annotation Pipeline

  • Conference paper
Grid Computing in Life Science (LSGRID 2004)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3370)

Abstract

The goal of the Encyclopedia of Life (EOL) Project is to predict structural information for all proteins in all organisms. This calculation presents challenges both in the scale of the computational resources required (approximately 1.8 million CPU hours) and in data and workflow management. While available tools solve some subsets of these problems, we had to build software to integrate and manage the overall Grid application execution. In this paper, we present this workflow system, detail its components, and report on the performance of our initial prototype implementation in runs over a large-scale Grid platform during the SC’03 conference.

This research was supported in part by the National Science Foundation under the NPACI Cooperative Agreement No. ACI-9619020 and under award No. ACI-0086092. W.W. Li is also supported in part by PRAGMA, funded by NSF Grant No. INT-0314015, and by Systematic Protein Annotation and Modeling, funded by the National Institutes of Health (NIH) Grant No. GM63208-01A1S1.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Birnbaum, A. et al. (2005). Grid Workflow Software for a High-Throughput Proteome Annotation Pipeline. In: Konagaya, A., Satou, K. (eds) Grid Computing in Life Science. LSGRID 2004. Lecture Notes in Computer Science, vol 3370. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-32251-1_7

  • DOI: https://doi.org/10.1007/978-3-540-32251-1_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25208-5

  • Online ISBN: 978-3-540-32251-1

  • eBook Packages: Computer Science (R0)
