A Grid-Enabled Modular Framework for Efficient Sequence Analysis Workflows

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 517)

Abstract

In the era of Big Data in the Life Sciences, efficient processing and analysis of vast amounts of sequence data is becoming an ever more daunting challenge. Among such analyses, sequence alignment is one of the most commonly used procedures, as it provides useful insights into the functionality and relationships of the involved entities; it is also one of the most common computational bottlenecks in bioinformatics workflows. We have designed and implemented a time-efficient, distributed, modular application for sequence alignment, phylogenetic profiling and clustering of protein sequences that utilizes the European Grid Infrastructure. Optimal utilization of the Grid with regard to the respective modules allowed us to achieve significant speedups on the order of 1400%.

Author information

Corresponding author

Correspondence to Fotis E. Psomopoulos.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Vrousgou, O.T., Psomopoulos, F.E., Mitkas, P.A. (2015). A Grid-Enabled Modular Framework for Efficient Sequence Analysis Workflows. In: Iliadis, L., Jayne, C. (eds) Engineering Applications of Neural Networks. EANN 2015. Communications in Computer and Information Science, vol 517. Springer, Cham. https://doi.org/10.1007/978-3-319-23983-5_5

  • DOI: https://doi.org/10.1007/978-3-319-23983-5_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23981-1

  • Online ISBN: 978-3-319-23983-5

  • eBook Packages: Computer Science, Computer Science (R0)