Abstract
The importance of genomic data analysis has been growing as the need for personalized treatment of human cancers has become apparent. Next-generation sequencing (NGS) is a cost-effective way to obtain data sets for cancer analysis, and most bioinformatics research groups therefore rely on it. The volume of NGS data is huge and growing rapidly, so supercomputing systems are required to process it within a reasonable time. Bioinformatics researchers analyze these data sets with NGS applications such as BWA and Bowtie, but those legacy applications have limited scalability and resource utilization on supercomputing systems.
To address this problem, we developed a virtualization technique that improves the resource utilization and scalability of NGS applications. First, to improve resource utilization, we build a virtualized system architecture that allocates virtual machines with the applications' limited resource utilization taken into account. Second, to improve scalability, we present a virtualized system architecture that takes data locality into account. Experimental results show that our virtualized system achieves approximately 30% better performance than native systems, and that the data-locality-aware system achieves twice the speedup of a system using a single storage server.
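The idea of taking data locality into account can be illustrated with a minimal sketch. It is not the scheduler used in the article, whose details the abstract does not give. It assumes the input reads are split into chunks stored on different hosts and that each virtual machine runs on a known host. The function `assign_chunks`, the dictionaries `chunk_locations` and `vm_hosts`, and all host and VM names are hypothetical.

```python
from collections import defaultdict


def assign_chunks(chunk_locations, vm_hosts):
    """Assign each input chunk to a VM, preferring VMs that hold the chunk locally.

    chunk_locations: dict mapping chunk id -> set of host names storing that chunk
                     (hypothetical storage layout, not taken from the article)
    vm_hosts:        dict mapping VM name -> name of the host it runs on
    Returns a dict mapping VM name -> list of chunk ids assigned to it.
    """
    load = {vm: 0 for vm in vm_hosts}  # number of chunks given to each VM so far
    plan = defaultdict(list)
    for chunk in sorted(chunk_locations):
        hosts = chunk_locations[chunk]
        # VMs whose host already stores this chunk can read it without a remote fetch.
        local = [vm for vm, host in vm_hosts.items() if host in hosts]
        # Fall back to every VM when no local copy exists.
        candidates = local or list(vm_hosts)
        # Among the candidates, pick the least-loaded VM (ties broken by name).
        vm = min(candidates, key=lambda v: (load[v], v))
        plan[vm].append(chunk)
        load[vm] += 1
    return dict(plan)


if __name__ == "__main__":
    chunk_locations = {"c0": {"h1"}, "c1": {"h2"}, "c2": {"h1"}, "c3": {"h3"}}
    vm_hosts = {"vm-a": "h1", "vm-b": "h2"}
    print(assign_chunks(chunk_locations, vm_hosts))
```

In this toy layout, chunks stored on `h1` and `h2` go to the VMs running on those hosts, and only the chunk stored on `h3` has to be read remotely. A single storage server corresponds to the case in which no VM holds a local copy, so every read is remote.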
Additional information
Both authors J.-h. Um and H. Choi contributed equally to this work.
About this article
Cite this article
Um, Jh., Choi, H., Song, Sk. et al. Development of a virtualized supercomputing environment for genomic analysis. J Supercomput 65, 71–85 (2013). https://doi.org/10.1007/s11227-012-0752-3
DOI: https://doi.org/10.1007/s11227-012-0752-3