Abstract
The introduction of next-generation sequencing technologies brought not only huge amounts of biological data but also highly sophisticated and versatile analysis workflows and systems. These new challenges require reliable and fast deployment methods on high-performance servers, whether in the local infrastructure or in the cloud. Virtualization technology has provided an efficient way to overcome the complexity of deployment procedures and to provide a safe, personalized execution environment. However, applications running in virtual machines perform worse than those running on the native infrastructure. Docker is a lightweight alternative to conventional virtualization technology that achieves notably better performance. In this paper, we explore use-case scenarios for deploying and executing sophisticated bioinformatics tools and workflows with Docker, with a focus on the sequence analysis domain. We also introduce an efficient implementation of the package elasticHPC-Docker, which enables the creation of a Docker-based computer cluster in a private cloud and in commercial clouds such as Amazon and Google. We demonstrate experimentally that elasticHPC-Docker is efficient and reliable in both private and commercial clouds.
A Appendix
The following Dockerfile is used to build the Docker image for the variant detection workflow:
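A minimal sketch of such a Dockerfile, not the authors' actual listing, might look like the following. It is written as a small shell script that generates the file. The base image (`ubuntu:22.04`), the tool selection (`bwa`, `samtools`, `bcftools` from the distribution's package archive), the `/data` working directory, and the `variant-calling-image` directory name are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: generate an illustrative Dockerfile for a variant
# detection image. Base image, package list, and paths are assumptions,
# not the configuration used in the paper.
set -eu

# Use a dedicated directory so the Dockerfile is the only build input.
BUILD_DIR="${BUILD_DIR:-./variant-calling-image}"
mkdir -p "$BUILD_DIR"

# The quoted 'EOF' delimiter keeps the shell from expanding anything
# inside the Dockerfile text.
cat > "$BUILD_DIR/Dockerfile" <<'EOF'
# Assumed base image.
FROM ubuntu:22.04

# Install an assumed set of sequence analysis tools from the package archive,
# then remove the package index to keep the image small.
RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
       bwa samtools bcftools \
    && rm -rf /var/lib/apt/lists/*

# Hypothetical location for input reads and results.
WORKDIR /data

# Start an interactive shell by default.
CMD ["/bin/bash"]
EOF

echo "Wrote $BUILD_DIR/Dockerfile"
```

Tools that are not packaged by the distribution would need their own download and build steps in additional `RUN` instructions.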
Build Docker Image: To build a Docker image, first install Docker Engine on your local host, as explained at http://docs.docker.com/engine/installation/. Once it is installed, save the Dockerfile shown above in a file named Dockerfile in the directory where you will build the image. Finally, run the following command to build the variant detection Docker image:
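The build step can be sketched as below. The image name `variant-calling:latest` is a hypothetical tag, and the script assumes the Dockerfile sits in the current directory unless `BUILD_DIR` is set. The guard around the call is an addition for convenience: it only invokes `docker build` when the Docker CLI and a Dockerfile are actually present.

```shell
#!/bin/sh
# Hypothetical sketch of the build step; the image tag is an assumption.
set -eu

IMAGE_TAG="${IMAGE_TAG:-variant-calling:latest}"  # hypothetical image name
BUILD_DIR="${BUILD_DIR:-.}"                       # directory holding the Dockerfile

# Human-readable form of the command, for logging.
BUILD_CMD="docker build -t $IMAGE_TAG $BUILD_DIR"

# Build only if the Docker CLI is installed and a Dockerfile is present.
if command -v docker >/dev/null 2>&1 && [ -f "$BUILD_DIR/Dockerfile" ]; then
    docker build -t "$IMAGE_TAG" "$BUILD_DIR"
else
    echo "Skipping build; would run: $BUILD_CMD"
fi
```

Here `-t` assigns a name and tag to the resulting image, and the final argument is the build context, that is, the directory whose contents are sent to the Docker daemon.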
Start Docker Container: To start a container using Docker Engine, run the following command:
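A minimal sketch of starting such a container is given below. The image name `variant-calling:latest`, the host directory `./data`, and the container mount point `/data` are assumptions for illustration. The script checks that the Docker CLI and the image are available before running anything.

```shell
#!/bin/sh
# Hypothetical sketch of starting a container; names and paths are assumptions.
set -eu

IMAGE_TAG="${IMAGE_TAG:-variant-calling:latest}"  # hypothetical image name
HOST_DATA="${HOST_DATA:-$PWD/data}"               # hypothetical host directory with input files
CONTAINER_DATA="/data"                            # hypothetical mount point inside the container

# Human-readable form of the command, for logging.
RUN_CMD="docker run -it --rm -v $HOST_DATA:$CONTAINER_DATA $IMAGE_TAG /bin/bash"

# Start an interactive shell only if the Docker CLI and the image exist locally.
if command -v docker >/dev/null 2>&1 \
   && docker image inspect "$IMAGE_TAG" >/dev/null 2>&1; then
    docker run -it --rm -v "$HOST_DATA:$CONTAINER_DATA" "$IMAGE_TAG" /bin/bash
else
    echo "Image not available; would run: $RUN_CMD"
fi
```

The `-it` flags attach an interactive terminal, `--rm` removes the container when the shell exits, and `-v` mounts the host directory so that input files and results persist outside the container.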
The user is now ready to call any program in the variant detection workflow.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Ali, A.A., El-Kalioby, M., Abouelhoda, M. (2016). The Case for Docker in Multicloud Enabled Bioinformatics Applications. In: Ortuño, F., Rojas, I. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2016. Lecture Notes in Computer Science, vol 9656. Springer, Cham. https://doi.org/10.1007/978-3-319-31744-1_52
DOI: https://doi.org/10.1007/978-3-319-31744-1_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31743-4
Online ISBN: 978-3-319-31744-1
eBook Packages: Computer Science (R0)