Development of a virtualized supercomputing environment for genomic analysis

The Journal of Supercomputing

Abstract

The importance of genomic data analysis has grown in recent years as the need for personalized treatment of human cancers has become clear. Next-generation sequencing (NGS) is a cost-effective way to obtain data sets for cancer analysis, and most bioinformatics research groups now rely on it. The volume of NGS data is huge and growing rapidly, so supercomputing systems are needed to process it within a reasonable time. Bioinformatics researchers analyze these data sets with NGS applications such as BWA and BowTie, but those legacy applications have limited scalability and resource utilization on supercomputing systems.

To resolve this situation, we developed a virtualization technique that improves the resource utilization and scalability of NGS applications. First, to improve resource utilization, we build a virtualized system architecture that allocates virtual machines with the applications' limited resource utilization in mind. Second, to improve scalability, we present a virtualized system architecture that takes data locality into account. Experimental results show that our virtualized system achieved approximately 30% better performance than native systems. In addition, the system that considers data locality achieved twice the speedup of a system that uses a single storage server.
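The abstract does not describe how work is placed near its data, so the following is only a minimal sketch of the general idea of locality-aware placement, not the authors' implementation. It assumes each input chunk is stored on one named host and each virtual machine runs on one named host; the function name `assign_chunks` and the dictionary layouts are hypothetical.

```python
from collections import defaultdict


def assign_chunks(chunk_hosts, vm_hosts):
    """Map each chunk to a VM, preferring a VM on the host that stores the chunk.

    chunk_hosts: dict of chunk id -> host holding that chunk (assumed layout)
    vm_hosts:    dict of VM id -> host the VM runs on (assumed layout)
    """
    if not vm_hosts:
        raise ValueError("at least one VM is required")

    # Number of chunks already given to each VM.
    load = {vm: 0 for vm in vm_hosts}

    # Index VMs by the physical host they run on.
    vms_on_host = defaultdict(list)
    for vm, host in vm_hosts.items():
        vms_on_host[host].append(vm)

    assignment = {}
    for chunk in sorted(chunk_hosts):
        # Prefer VMs co-located with the chunk; otherwise consider every VM.
        local = vms_on_host.get(chunk_hosts[chunk])
        candidates = local if local else list(load)
        # Pick the least-loaded candidate, breaking ties by VM id.
        vm = min(candidates, key=lambda v: (load[v], v))
        assignment[chunk] = vm
        load[vm] += 1
    return assignment


if __name__ == "__main__":
    chunks = {"c0": "nodeA", "c1": "nodeA", "c2": "nodeB", "c3": "nodeC"}
    vms = {"vm0": "nodeA", "vm1": "nodeA", "vm2": "nodeB"}
    print(assign_chunks(chunks, vms))
```

In this sketch a chunk is read remotely only when no VM runs on the host that stores it, which is the situation a single shared storage server forces on every chunk.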

Notes

  1. http://www.kobic.re.kr/.

  2. http://bio-bwa.sourceforge.net/.

  3. http://en.wikipedia.org/wiki/David_Bowie.

  4. http://sourceforge.net/apps/mediawiki/cloudburst-bio/.

  5. http://xen.org/.

  6. http://opennebula.org/.

  7. http://www.linux-kvm.org/.

  8. http://www.openstack.org/.

  9. http://www.eucalyptus.com/.

  10. http://hadoop.apache.org/.

  11. http://opennebula.org/documentation:rel3.2:java.

  12. http://www.oracle.com/technetwork/oem/grid-engine-support-215299.html.

  13. http://www.1000genomes.org/.

Author information

Corresponding author

Correspondence to Tai-hoon Kim.

Additional information

Both authors J.-h. Um and H. Choi contributed equally to this work.

About this article

Cite this article

Um, Jh., Choi, H., Song, Sk. et al. Development of a virtualized supercomputing environment for genomic analysis. J Supercomput 65, 71–85 (2013). https://doi.org/10.1007/s11227-012-0752-3

  • DOI: https://doi.org/10.1007/s11227-012-0752-3
