
Towards Integration of Containers into Scientific Workflows: A Preliminary Experience with Cached Dependencies

  • Conference paper
Computational Science and Its Applications – ICCSA 2020 (ICCSA 2020)

Abstract

A scientific workflow consists of a large number of tasks that are typically performed on distributed systems. For each node in the distributed computing system, an entire environment must be configured to be able to run a set of tasks, which depend on libraries, binaries, etc. Containers can provide lightweight virtualization capable of isolating the application and its dependencies, ensuring flexibility to run the same environment on different hosts. In this work, we investigate the integration of containers into scientific workflows, by combining Docker and Makeflow technologies. We focus on performance issues and propose a cache system adapted to containers, considering that data transfers can represent a potential bottleneck on iterative executions typically found in scientific workflows. We use Docker Volumes to circumvent the volatility of containers, allowing files stored in the cache to be available when a new container is instantiated. Our experimental results from the execution of two real-world bioinformatics workflows (Blast and Hecil) show that using our cache system effectively decreases execution times. We also show that the number of workers per host impacts the workflow execution time in different ways.
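The idea of keeping cached dependencies in a Docker Volume so that they outlive individual containers can be illustrated with a short Python sketch. This is not the authors' implementation: the function names, the volume name `wf-cache`, the mount point `/cache`, and the choice to key cache entries by file name are all assumptions made for illustration. Only the `docker run --rm -v <volume>:<path>` form is standard Docker CLI usage.

```python
import shutil
from pathlib import Path


def docker_run_with_cache(image, task_cmd, volume="wf-cache", mount="/cache"):
    """Build a `docker run` argv that mounts a named volume at `mount`.

    A named volume persists after the container exits, so files written
    under `mount` remain visible to the next container that mounts it.
    The volume name and mount path are hypothetical defaults.
    """
    return ["docker", "run", "--rm", "-v", f"{volume}:{mount}", image, *task_cmd]


def fetch_cached(source, cache_dir):
    """Return (path, hit) for `source`, copying it into the cache on a miss.

    Entries are keyed by file name only (an assumption of this sketch);
    the copy stands in for whatever data transfer the workflow performs.
    """
    source = Path(source)
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / source.name
    if cached.exists():
        return cached, True   # hit: no transfer needed
    shutil.copy2(source, cached)
    return cached, False      # miss: transferred and stored for later runs
```

In an iterative workflow, the first task that needs a given input would take the miss path and pay for the transfer, while later iterations that mount the same volume would find the file already present.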


Notes

  1. Docker Hub: https://hub.docker.com/.

  2. Makeflow Repository: http://github.com/cooperative-computing-lab/makeflow-examples.


Acknowledgements

This research has been partially supported by the GREEN-CLOUD project (http://www.inf.ufrgs.br/greencloud/) (#16/2551-0000 488-9), from FAPERGS and CNPq Brazil, program PRONEX 12/2014.

Author information

Corresponding authors

Correspondence to Bruno da Silva Alves or Andrea Schwertner Charão.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

da Silva Alves, B., Charão, A.S. (2020). Towards Integration of Containers into Scientific Workflows: A Preliminary Experience with Cached Dependencies. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2020. ICCSA 2020. Lecture Notes in Computer Science, vol 12253. Springer, Cham. https://doi.org/10.1007/978-3-030-58814-4_61

  • DOI: https://doi.org/10.1007/978-3-030-58814-4_61

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58813-7

  • Online ISBN: 978-3-030-58814-4

  • eBook Packages: Computer Science, Computer Science (R0)
