Abstract
The use of software containers and services in science is growing, but this demand is poorly served by the HPC resources typically available in research contexts. We propose a method to grow Kubernetes clusters onto transient nodes allocated through the Grid Engine batch workload manager. The method is being used to run a mix of data-intensive service applications and bursty HPC-style workflows on an OpenStack-based Kubernetes deployment, while keeping the job management, logging, monitoring, and storage infrastructure homogeneous. Moreover, the implementation is relatively straightforward to adapt to other workload managers.
Notes
1. Workaround for a DNS bug. In our tests, fresh deployments created with the most recent version of KubeSpray had an erroneous cluster DNS setting. It must be corrected by editing the config map kubelet-config-X.Y (where X.Y is the Kubernetes version) in the kube-system namespace; otherwise, new nodes will not work correctly.
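The workaround in this note can be sketched in Python as follows. This is a minimal illustration and not the authors' tooling. It assumes that the erroneous setting is the `clusterDNS` list of the kubelet configuration stored in the config map, and that `kubectl` is available to edit it. The function names and the example address are hypothetical; the correct DNS address depends on the deployment.

```python
import re


def kubelet_configmap_name(k8s_version: str) -> str:
    """Return the config map name kubelet-config-X.Y for a version such as 'v1.13.5'."""
    major, minor = k8s_version.lstrip("v").split(".")[:2]
    return f"kubelet-config-{major}.{minor}"


def edit_command(k8s_version: str) -> list:
    """Build the kubectl command that opens the config map for manual editing."""
    return [
        "kubectl", "--namespace", "kube-system",
        "edit", "configmap", kubelet_configmap_name(k8s_version),
    ]


def set_cluster_dns(kubelet_yaml: str, dns_ip: str) -> str:
    """Replace the entries under a top-level 'clusterDNS:' key with one address.

    Assumption: the setting to fix is the clusterDNS list; the text is left
    unchanged if no such key is present.
    """
    pattern = re.compile(r"(^clusterDNS:\n)(?:[ \t]*-[ \t]*\S+\n)+", re.MULTILINE)
    return pattern.sub(lambda m: f"{m.group(1)}- {dns_ip}\n", kubelet_yaml)
```

For example, `edit_command("v1.13.5")` yields the arguments for `kubectl --namespace kube-system edit configmap kubelet-config-1.13`, which could be passed to `subprocess.run` on a machine with cluster access.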
Acknowledgements
This work was partially supported by the TDM project funded by Sardinian Regional Authorities under grant agreement POR FESR 2014-2020 Azione 1.2 (D. 66/14 13.12.2016 S3-ICT).
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Piras, M.E., Pireddu, L., Moro, M., Zanetti, G. (2019). Container Orchestration on HPC Clusters. In: Weiland, M., Juckeland, G., Alam, S., Jagode, H. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_3
DOI: https://doi.org/10.1007/978-3-030-34356-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34355-2
Online ISBN: 978-3-030-34356-9
eBook Packages: Computer Science, Computer Science (R0)