DOI: 10.1145/2616498.2616551

Evaluating Distributed Platforms for Protein-Guided Scientific Workflow

Published: 13 July 2014

Abstract

Complex and large-scale applications in different scientific disciplines are often represented as sets of independent tasks, known as workflows. Many scientific workflows have intensive resource requirements, and different distributed platforms, including campus clusters, grids, and clouds, are therefore used to execute them efficiently. In this paper we examine the performance and cost of running the Pegasus Workflow Management System (Pegasus WMS) implementation of blast2cap3, a protein-guided assembly approach, on three execution platforms: Sandhills, the University of Nebraska Campus Cluster; the academic grid Open Science Grid (OSG); and the commercial cloud Amazon EC2. Furthermore, the behavior of the blast2cap3 workflow was tested with different numbers of tasks. For each workflow and execution platform, we performed multiple runs to compare the total workflow running time as well as the resource availability over time. Additionally, for the most interesting runs, the number of running versus idle jobs over time was analyzed for each platform. The experiments show that running the Pegasus WMS implementation of blast2cap3 with more than 100 tasks significantly reduces the running time on all execution platforms. In general, for our workflow, better performance and resource usage were achieved when Amazon EC2 was used as the execution platform. However, because of the Amazon EC2 cost, the academic distributed systems can be a good alternative with excellent performance, especially when plenty of resources are available.
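
To make the workflow structure concrete, the sketch below shows how a blast2cap3-style workflow of many independent tasks could be described with the Pegasus 4.x Python DAX API, which Pegasus WMS uses to generate abstract workflows. This is an illustrative assumption rather than the authors' actual code: the executable name, file naming scheme, arguments, and the task count of 100 are hypothetical.

    # A minimal sketch, not the authors' code: express a blast2cap3-style
    # workflow of N independent tasks with the Pegasus 4.x Python DAX API.
    # Executable name, file names, arguments, and NUM_TASKS are assumptions.
    from Pegasus.DAX3 import ADAG, Job, File, Link

    NUM_TASKS = 100  # hypothetical partition count; the paper varies this

    dax = ADAG("blast2cap3")

    for i in range(NUM_TASKS):
        # each input partition is processed by its own independent task
        contigs = File("contigs_part_%03d.fasta" % i)
        assembled = File("assembled_part_%03d.fasta" % i)

        job = Job(name="blast2cap3")               # logical transformation name
        job.addArguments(contigs, "-o", assembled)
        job.uses(contigs, link=Link.INPUT)
        job.uses(assembled, link=Link.OUTPUT, transfer=True)
        dax.addJob(job)

    # write the abstract workflow (DAX) that pegasus-plan later maps
    # onto a concrete site such as Sandhills, OSG, or Amazon EC2
    with open("blast2cap3.dax", "w") as f:
        dax.writeXML(f)

Because the DAX is an abstract description, the same workflow can be planned and submitted unchanged to the campus cluster, OSG, or EC2; only the site configuration changes.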

Published In

XSEDE '14: Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment
July 2014
445 pages
ISBN:9781450328937
DOI:10.1145/2616498
  • General Chair: Scott Lathrop
  • Program Chair: Jay Alameda
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

  • NSF: National Science Foundation
  • Drexel University
  • Indiana University

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Amazon EC2
  2. Blast2cap3
  3. Campus Cluster
  4. Open Science Grid
  5. Pegasus Workflow Management System
  6. Scientific Workflow

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

XSEDE '14

Acceptance Rates

XSEDE '14 Paper Acceptance Rate: 80 of 120 submissions, 67%
Overall Acceptance Rate: 129 of 190 submissions, 68%
