Exploring Molecular Evolution Reconstruction Using a Parallel Cloud Based Scientific Workflow

  • Conference paper
Advances in Bioinformatics and Computational Biology (BSB 2012)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7409)

Abstract

Recent studies of evolution at the molecular level address two important issues: reconstructing the evolutionary relationships between species and investigating the forces that drive the evolutionary process. Both areas have grown explosively over the last two decades, driven by the massive generation of genomic data and by novel statistical methods and computational approaches for processing and analyzing this large volume of data. Most experiments in molecular evolution are based on compute-intensive simulations that are preceded by other computational tools and post-processed by validators. All these tools can be modeled as scientific workflows to improve experiment management while capturing provenance data. However, these evolutionary analysis experiments are very complex and may execute for weeks. These workflows need to be executed in parallel in High Performance Computing (HPC) environments such as clouds. Clouds are increasingly adopted for bioinformatics experiments because of characteristics such as elasticity and availability, and they are evolving into HPC environments. In this paper, we introduce SciEvol, a bioinformatics scientific workflow for molecular evolution reconstruction that aims at inferring evolutionary relationships (i.e., detecting positive Darwinian selection) on genomic data. SciEvol is designed and implemented to execute in parallel on clouds using the SciCumulus workflow engine. Our experiments show that SciEvol can help scientists by enabling the reconstruction of evolutionary relationships in the cloud environment. Results show performance improvements of up to 94.64% in execution time compared with sequential execution, with the running time dropping from around 10 days to 12 hours.
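As a rough sanity check on the figures quoted in the abstract, the short sketch below relates a percentage improvement to the sequential and parallel running times. It assumes "improvement" means the relative reduction in execution time, and it uses the rounded values from the abstract (about 10 days and 12 hours) rather than the measured timings, which are not reproduced on this page. The function names are illustrative only and do not come from SciEvol or SciCumulus.

```python
def percent_improvement(sequential_hours: float, parallel_hours: float) -> float:
    """Relative reduction in execution time, as a percentage of the sequential time."""
    return (sequential_hours - parallel_hours) / sequential_hours * 100.0


def speedup(sequential_hours: float, parallel_hours: float) -> float:
    """Classic speedup: how many times faster the parallel run is."""
    return sequential_hours / parallel_hours


def implied_sequential_hours(parallel_hours: float, improvement_percent: float) -> float:
    """Sequential time that would yield the given improvement for a known parallel time."""
    return parallel_hours / (1.0 - improvement_percent / 100.0)


if __name__ == "__main__":
    # Rounded figures taken from the abstract (assumed, not measured values).
    sequential = 10 * 24  # "around 10 days", in hours
    parallel = 12         # "12 hours"

    print(f"improvement: {percent_improvement(sequential, parallel):.2f}%")
    print(f"speedup:     {speedup(sequential, parallel):.1f}x")

    # Working backwards from the reported 94.64% and 12 hours.
    implied = implied_sequential_hours(parallel, 94.64)
    print(f"implied sequential time: {implied / 24:.2f} days")
```

With the rounded inputs the improvement comes out close to, but not exactly, the reported 94.64%, which is consistent with "around 10 days" being an approximation of a somewhat shorter sequential run.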

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ocaña, K.A.C.S., de Oliveira, D., Horta, F., Dias, J., Ogasawara, E., Mattoso, M. (2012). Exploring Molecular Evolution Reconstruction Using a Parallel Cloud Based Scientific Workflow. In: de Souto, M.C., Kann, M.G. (eds) Advances in Bioinformatics and Computational Biology. BSB 2012. Lecture Notes in Computer Science, vol 7409. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31927-3_16

  • DOI: https://doi.org/10.1007/978-3-642-31927-3_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31926-6

  • Online ISBN: 978-3-642-31927-3

  • eBook Packages: Computer Science, Computer Science (R0)
