Abstract
A scientific workflow consists of a large number of tasks that are typically executed on distributed systems. Each node in the distributed system must have a complete environment configured before it can run its tasks, which depend on libraries, binaries, and other files. Containers provide lightweight virtualization that isolates an application together with its dependencies, so the same environment can run on different hosts. In this work, we investigate the integration of containers into scientific workflows by combining Docker and Makeflow. We focus on performance and propose a cache system adapted to containers, since data transfers are a potential bottleneck in the iterative executions typically found in scientific workflows. We use Docker Volumes to circumvent the volatility of containers, so that files stored in the cache remain available when a new container is instantiated. Our experimental results from the execution of two real-world bioinformatics workflows (Blast and Hecil) show that our cache system effectively decreases execution times. We also show that the number of workers per host affects the workflow execution time in different ways.
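The caching idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `cached_fetch`, the `fetch` callback, and the cache layout are all hypothetical. It assumes only that the cache directory is a path backed by a Docker volume (for example, one mounted with `docker run -v`), so its contents outlive any single container.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def cached_fetch(name: str, fetch: Callable[[Path], None], cache_dir: Path) -> Path:
    """Return the cached copy of `name`, calling `fetch` only on a cache miss.

    `cache_dir` is assumed to live on a persistent volume, so a file stored
    by one container is still present when the next container starts.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if target.exists():
        return target  # cache hit: the transfer is skipped entirely

    # Cache miss: write to a temporary file in the same directory, then
    # rename it into place so a partial transfer never looks like a hit.
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir)
    os.close(fd)
    tmp = Path(tmp_name)
    try:
        fetch(tmp)                # hypothetical transfer of the dependency
        os.replace(tmp, target)   # atomic rename within one filesystem
    finally:
        if tmp.exists():
            tmp.unlink()          # clean up if the transfer failed
    return target


if __name__ == "__main__":
    transfers = []

    def fake_transfer(dest: Path) -> None:
        # Stand-in for copying an input file or database from a remote host.
        transfers.append(dest)
        dest.write_text("ACGT\n")

    with tempfile.TemporaryDirectory() as root:
        cache = Path(root) / "cache"
        cached_fetch("db.fasta", fake_transfer, cache)  # miss: transfers the file
        cached_fetch("db.fasta", fake_transfer, cache)  # hit: reuses the cached file
        print(f"transfers performed: {len(transfers)}")
```

In this sketch the second request for the same file triggers no transfer, which is the effect the cache is meant to have on iterative workflow executions.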
Notes
1. Docker Hub: https://hub.docker.com/.
2. Makeflow Repository: http://github.com/cooperative-computing-lab/makeflow-examples.
Acknowledgements
This research has been partially supported by the GREEN-CLOUD project (http://www.inf.ufrgs.br/greencloud/) (#16/2551-0000 488-9), from FAPERGS and CNPq Brazil, program PRONEX 12/2014.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
da Silva Alves, B., Charão, A.S. (2020). Towards Integration of Containers into Scientific Workflows: A Preliminary Experience with Cached Dependencies. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2020. ICCSA 2020. Lecture Notes in Computer Science, vol 12253. Springer, Cham. https://doi.org/10.1007/978-3-030-58814-4_61
DOI: https://doi.org/10.1007/978-3-030-58814-4_61
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58813-7
Online ISBN: 978-3-030-58814-4
eBook Packages: Computer Science, Computer Science (R0)