The Case for Docker in Multicloud Enabled Bioinformatics Applications

Ali, Ahmed Abdullah; El-Kalioby, Mohamed; Abouelhoda, Mohamed

doi:10.1007/978-3-319-31744-1_52

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9656)

Included in the following conference series:

International Conference on Bioinformatics and Biomedical Engineering

Abstract

The introduction of next-generation sequencing technologies brought not only huge amounts of biological data but also highly sophisticated and versatile analysis workflows and systems. These new challenges require reliable and fast deployment methods on high-performance servers, whether in the local infrastructure or in the cloud. Virtualization technology has provided an efficient way to overcome the complexity of deployment procedures and to provide a safe, personalized execution box. However, applications running in virtual machines perform worse than those running on the native infrastructure. Docker is a lightweight alternative to conventional virtualization technology that achieves notably better performance. In this paper, we explore use case scenarios for deploying and executing sophisticated bioinformatics tools and workflows with Docker, with a focus on the sequence analysis domain. We also introduce an efficient implementation of the package elasticHPC-Docker, which enables the creation of a Docker-based computer cluster in a private cloud and in commercial clouds such as Amazon and Google. We demonstrate experimentally that elasticHPC-Docker is efficient and reliable in both private and commercial clouds.

Author information

Authors and Affiliations

Faculty of Engineering, Cairo University, Giza, Egypt
Mohamed Abouelhoda
Center for Informatics Sciences, Nile University, Sheikh Zaid City, Egypt
Ahmed Abdullah Ali, Mohamed El-Kalioby & Mohamed Abouelhoda

Corresponding author

Correspondence to Mohamed Abouelhoda.

Editor information

Editors and Affiliations

Universidad de Granada, Granada, Spain
Francisco Ortuño
Universidad de Granada, Granada, Spain
Ignacio Rojas

A Appendix

The following Dockerfile is used to build the Docker image for the variant calling workflow:
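The original listing is not reproduced here. As an illustration only, the sketch below writes a minimal Dockerfile of this kind from the shell. The base image (ubuntu:22.04), the package set (bwa, samtools, bcftools), and the /data working directory are assumptions chosen for the example, not the authors' actual configuration.

```shell
# Write an illustrative Dockerfile into the current directory.
# ASSUMPTION: base image, packages, and paths are hypothetical examples,
# not the configuration used in the paper.
cat > Dockerfile <<'EOF'
# Base image (assumed: a recent Ubuntu LTS release)
FROM ubuntu:22.04

# Avoid interactive prompts during package installation
ENV DEBIAN_FRONTEND=noninteractive

# Install example alignment and variant calling tools from the Ubuntu repositories
RUN apt-get update && \
    apt-get install -y --no-install-recommends bwa samtools bcftools && \
    rm -rf /var/lib/apt/lists/*

# Directory where input data is expected to be mounted (assumed)
WORKDIR /data

# Start a shell so that any installed tool can be called interactively
CMD ["/bin/bash"]
EOF
```

Installing every tool in a single RUN instruction and removing the apt package lists in the same step keeps the resulting image layer small.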

Build Docker Image: To build a Docker image, first install Docker Engine on your local host, as explained at http://docs.docker.com/engine/installation/. Once it is installed, save the code shown above in a file called Dockerfile in the directory where you will build your image. Finally, run the following command to build the variant calling Docker image.
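A minimal sketch of the build step, assuming the Dockerfile sits in the current directory. The image tag variant-calling is a hypothetical name, not one taken from the paper.

```shell
# ASSUMPTION: hypothetical image tag, chosen for this example only.
IMAGE_TAG="variant-calling"

# Build only if Docker Engine is installed on this host.
if command -v docker >/dev/null 2>&1; then
    # "-t" names the image; "." uses the current directory as the build context.
    docker build -t "$IMAGE_TAG" .
else
    echo "Docker Engine not found; install it before building." >&2
fi
```

After a successful build, `docker images` should list the new image under the chosen tag.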

Start Docker Container: To start a container using Docker Engine, run the following command:
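A minimal sketch of the start step, assuming the image was tagged variant-calling and that input files live in a data directory under the current directory. Both names are hypothetical and not taken from the paper.

```shell
# ASSUMPTIONS: hypothetical image tag and host data directory.
IMAGE_TAG="variant-calling"
HOST_DATA_DIR="$(pwd)/data"

# Start the container only if Docker Engine is installed on this host.
if command -v docker >/dev/null 2>&1; then
    # -i -t : interactive shell with a terminal attached
    # --rm  : remove the container when the shell exits
    # -v    : mount the host data directory at /data inside the container
    docker run -i -t --rm -v "$HOST_DATA_DIR:/data" "$IMAGE_TAG" /bin/bash
else
    echo "Docker Engine not found; install it before starting a container." >&2
fi
```

Mounting the data directory as a volume keeps sequencing files outside the image, so the same image can be reused across data sets.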

The user is now ready to call any program in the variant detection workflow.

About this paper

Cite this paper

Ali, A.A., El-Kalioby, M., Abouelhoda, M. (2016). The Case for Docker in Multicloud Enabled Bioinformatics Applications. In: Ortuño, F., Rojas, I. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2016. Lecture Notes in Computer Science, vol 9656. Springer, Cham. https://doi.org/10.1007/978-3-319-31744-1_52

DOI: https://doi.org/10.1007/978-3-319-31744-1_52
Published: 25 March 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31743-4
Online ISBN: 978-3-319-31744-1
eBook Packages: Computer Science, Computer Science (R0)
