Development of a virtualized supercomputing environment for genomic analysis

Um, Jung-ho; Choi, Hoon; Song, Sa-kwang; Choi, Sung-pil; Yoon, Hwa mook; Jung, Hanmin; Kim, Tai-hoon

doi:10.1007/s11227-012-0752-3

Published: 30 March 2012

The Journal of Supercomputing, Volume 65, pages 71–85 (2013)

Abstract

The importance of genomic data analysis has been growing as the need for personalized treatment of human cancers is recognized. Next-generation sequencing (NGS) is a cost-effective way to obtain data sets for cancer data analysis, and most bioinformatics research groups therefore use it. The volume of NGS data is huge and growing rapidly, so supercomputing systems are required to process it within a reasonable time. Bioinformatics researchers analyze these data sets with NGS applications such as BWA and Bowtie, but those legacy applications have limited scalability and resource utilization on supercomputing systems.

To resolve this situation, we developed a virtualization technique that improves the resource utilization and scalability of NGS applications. First, to improve resource utilization, we built a virtualized system architecture that allocates virtual machines with the applications' resource-utilization limits in mind. Second, to improve scalability, we present a virtualized system architecture that takes data locality into account. Experimental results show that our virtualized system achieved approximately 30% better performance than native systems. In addition, the system that considers data locality achieved twice the speedup of a system using a single storage server.
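The abstract does not describe how data locality is taken into account, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes input data is split into chunks stored on specific hosts and greedily assigns each chunk to the least-loaded virtual machine on the same host, falling back to any virtual machine when none is co-located. The names `assign_chunks`, `local_fraction`, and the chunk, host, and VM identifiers are all hypothetical.

```python
from collections import defaultdict


def assign_chunks(chunk_locations, vm_hosts):
    """Assign each chunk to a VM, preferring a VM on the host that stores it.

    chunk_locations: dict of chunk id -> host holding the chunk (hypothetical)
    vm_hosts: dict of VM name -> host the VM runs on (hypothetical)
    Returns a dict of VM name -> list of assigned chunk ids.
    """
    vms_by_host = defaultdict(list)
    for vm, host in vm_hosts.items():
        vms_by_host[host].append(vm)

    load = {vm: 0 for vm in vm_hosts}
    plan = {vm: [] for vm in vm_hosts}
    for chunk, host in sorted(chunk_locations.items()):
        # Prefer VMs co-located with the chunk; otherwise consider every VM.
        candidates = vms_by_host.get(host) or list(vm_hosts)
        # Pick the least-loaded candidate, breaking ties by VM name.
        target = min(candidates, key=lambda vm: (load[vm], vm))
        plan[target].append(chunk)
        load[target] += 1
    return plan


def local_fraction(plan, chunk_locations, vm_hosts):
    """Return the fraction of chunks processed on the host that stores them."""
    total = sum(len(chunks) for chunks in plan.values())
    local = sum(
        1
        for vm, chunks in plan.items()
        for chunk in chunks
        if chunk_locations[chunk] == vm_hosts[vm]
    )
    return local / total if total else 0.0


# Toy input: four chunks on three hosts, three VMs on two hosts.
chunk_locations = {"c0": "nodeA", "c1": "nodeA", "c2": "nodeB", "c3": "nodeC"}
vm_hosts = {"vm1": "nodeA", "vm2": "nodeA", "vm3": "nodeB"}
plan = assign_chunks(chunk_locations, vm_hosts)
fraction = local_fraction(plan, chunk_locations, vm_hosts)
```

In this toy input, only the chunk on `nodeC` has no co-located virtual machine and must be read remotely. A single storage server corresponds to the case where every read is remote, which is the configuration the abstract compares against.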

Author information

Authors and Affiliations

Korea Institute of Science and Technology Information, 245 Daehangno, Yuseong-gu, Daejeon, 305-806, Korea
Jung-ho Um, Hoon Choi, Sa-kwang Song, Sung-pil Choi, Hwa mook Yoon & Hanmin Jung
GVSA and University of Tasmania, 20 Virginia Court, Sandy Bay, Tasmania, Australia
Tai-hoon Kim

Corresponding author

Correspondence to Tai-hoon Kim.

Additional information

J.-h. Um and H. Choi contributed equally to this work.

About this article

Cite this article

Um, Jh., Choi, H., Song, Sk. et al. Development of a virtualized supercomputing environment for genomic analysis. J Supercomput 65, 71–85 (2013). https://doi.org/10.1007/s11227-012-0752-3

Issue Date: July 2013
