The pegi3s Bioinformatics Docker Images Project

  • Conference paper

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 325)

Abstract

Among the available Linux container technologies, Docker is one of the most popular. Docker images can provide ready-to-use software packages in which all required dependencies are already installed, and they can be deployed on any operating system where Docker is installed. They are also a convenient way to store immutable, working software packages, thus contributing to reproducibility. Moreover, Docker images greatly ease the development of complex pipelines, of standalone software applications with graphical user interfaces that require external software, and even of databases. It is therefore not surprising that Docker images are now ubiquitous in computational biology and bioinformatics. Here, we present the pegi3s Bioinformatics Docker Images Project (https://pegi3s.github.io/dockerfiles/), a constantly growing collection of more than 70 Docker images for commonly used software in the fields of genomics, transcriptomics, proteomics, phylogenetics, and sequence handling, among others. Several features distinguish this project from much larger projects: 1) tools are classified into broad categories, which makes it easier to find the most adequate tool(s) for a given project; 2) hyperlinks to the software manuals are provided, which makes it easier to find the parameter combinations best suited for a given processing step; 3) most importantly, we provide clear instructions on how to run the images, test data that can be used to quickly evaluate each Docker image, and full details on how each Docker image was built. We routinely use all images ourselves in our research and teaching activities, which means that they have been extensively tested. Therefore, we believe that this project, which is offered as a service in the context of the European ELIXIR program, is of interest to many researchers, whether or not they have a background in informatics.
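As an illustration of how such images are typically run, the following Python sketch builds (and by default only prints) a `docker run` command that mounts a local directory into a container, so that the tool inside can read input files from the host and write results back to it. It is a minimal sketch under stated assumptions, not the project's own tooling: the image name `pegi3s/example-tool`, its tag, the tool arguments, and the `/data` mount point are hypothetical placeholders, and the exact command for each real image should be taken from the instructions on the project website. Pinning an explicit version tag, rather than relying on `latest`, helps keep the software version fixed; referencing a content digest (`image@sha256:...`) pins it exactly.

```python
import shlex
import subprocess
from pathlib import Path


def build_docker_run(image: str, tag: str, host_dir: Path, command: list[str],
                     container_dir: str = "/data") -> list[str]:
    """Return a `docker run` argument list that bind-mounts host_dir into the container.

    The container_dir default ("/data") is an assumption for illustration only.
    """
    return [
        "docker", "run",
        "--rm",                                         # delete the container when it exits
        "-v", f"{host_dir.resolve()}:{container_dir}",  # share input/output files with the host
        f"{image}:{tag}",                               # explicit tag instead of implicit "latest"
        *command,                                       # command executed inside the container
    ]


def run(args: list[str], dry_run: bool = True) -> int:
    """Print the shell-quoted command; execute it only when dry_run is False (needs Docker)."""
    print(shlex.join(args))
    if dry_run:
        return 0
    return subprocess.run(args, check=False).returncode


if __name__ == "__main__":
    # Hypothetical image, tag and tool arguments: consult the project catalogue
    # for real image names and the command documented for each image.
    example = build_docker_run(
        image="pegi3s/example-tool",
        tag="1.0",
        host_dir=Path("."),
        command=["example-tool", "--in", "/data/input.fasta", "--out", "/data/output.txt"],
    )
    run(example)
```

With `dry_run=False`, the same argument list is passed to `subprocess.run`, which requires a local Docker installation and permission to use it.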

H. López-Fernández and P. Ferreira contributed equally to this work.




Acknowledgments

This work was financed by national funds through FCT (Fundação para a Ciência e a Tecnologia, I.P.) under the project UIDB/04293/2020 and through the individual scientific employment program-contract with Hugo López-Fernández (2020.00515.CEECIND), and also by BioData.pt (project 22231/01/SAICT/2016). This work was also partially supported by the Consellería de Educación, Universidades e Formación Profesional (Xunta de Galicia) under the scope of the strategic funding ED431C2018/55-GRC Competitive Reference Group.

Author information

Corresponding author

Correspondence to Hugo López-Fernández.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

López-Fernández, H., Ferreira, P., Reboiro-Jato, M., Vieira, C.P., Vieira, J. (2022). The pegi3s Bioinformatics Docker Images Project. In: Rocha, M., Fdez-Riverola, F., Mohamad, M.S., Casado-Vara, R. (eds) Practical Applications of Computational Biology & Bioinformatics, 15th International Conference (PACBB 2021). PACBB 2021. Lecture Notes in Networks and Systems, vol 325. Springer, Cham. https://doi.org/10.1007/978-3-030-86258-9_4
