
Container Orchestration on HPC Clusters

  • Conference paper
High Performance Computing (ISC High Performance 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11887)


Abstract

The use of software containers and services in science is a rising trend, but the HPC resources often available in research contexts do not accommodate it. We propose a method to grow Kubernetes clusters onto transient nodes allocated through the Grid Engine batch workload manager. The method is being used to run a mix of data-intensive service applications and bursty HPC-style workflows on an OpenStack-based Kubernetes deployment, while keeping job management, logging, monitoring, and storage infrastructure homogeneous. Moreover, the implementation can be adapted to other workload managers with relatively little effort.

Notes

  1. Workaround for a DNS bug. In our tests, fresh deployments created with the most recent version of KubeSpray had an erroneous cluster DNS setting. It must be corrected by editing the config map kubelet-config-X.Y (where X.Y is the Kubernetes version) in the kube-system namespace; otherwise, new nodes will not work correctly.
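The correction described in this note can be sketched as a small text transformation. This is a minimal illustration, not the authors' procedure, and it rests on assumptions: that the config map holds a KubeletConfiguration YAML document (under the `kubelet` key, as in kubeadm-based deployments) whose `clusterDNS` field is a simple list, and that the correct DNS address for the cluster is already known. The function name `fix_cluster_dns` and the IP addresses in the usage example are hypothetical placeholders. The rewritten text would then be saved back into the config map, for example via `kubectl -n kube-system edit configmap kubelet-config-X.Y`.

```python
def _is_list_item(line: str, indent: str) -> bool:
    """True if `line` is a YAML block-list entry at or below `indent`."""
    return line.startswith(indent) and line[len(indent):].lstrip(" ").startswith("- ")


def fix_cluster_dns(kubelet_yaml: str, dns_ip: str) -> str:
    """Return `kubelet_yaml` with its clusterDNS list replaced by [dns_ip].

    Hypothetical helper: a line-based rewrite that assumes simple,
    kubeadm-style YAML (no comments or anchors around clusterDNS).
    """
    lines = kubelet_yaml.splitlines()
    out = []
    found = False
    i = 0
    while i < len(lines):
        line = lines[i]
        stripped = line.lstrip(" ")
        if not stripped.startswith("clusterDNS:"):
            out.append(line)
            i += 1
            continue
        found = True
        indent = line[: len(line) - len(stripped)]
        # Emit the key followed by a single-entry block list.
        out.append(f"{indent}clusterDNS:")
        out.append(f"{indent}- {dns_ip}")
        i += 1
        # Skip the old block-list entries that followed the key
        # (an inline "[...]" value was already dropped with the key line).
        while i < len(lines) and _is_list_item(lines[i], indent):
            i += 1
    if not found:
        raise ValueError("kubelet config has no clusterDNS field")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Placeholder addresses; the correct value depends on the deployment.
    sample = (
        "apiVersion: kubelet.config.k8s.io/v1beta1\n"
        "clusterDNS:\n"
        "- 10.0.0.10\n"
        "clusterDomain: cluster.local\n"
        "kind: KubeletConfiguration\n"
    )
    print(fix_cluster_dns(sample, "10.0.0.3"))
```

Operating on the text rather than through a YAML library keeps the sketch dependency-free and leaves the rest of the kubelet configuration byte-for-byte unchanged, at the cost of handling only the simple layout that kubeadm typically generates.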

Acknowledgements

This work was partially supported by the TDM project funded by Sardinian Regional Authorities under grant agreement POR FESR 2014-2020 Azione 1.2 (D. 66/14 13.12.2016 S3-ICT).

Author information

Corresponding author

Correspondence to Luca Pireddu.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Piras, M.E., Pireddu, L., Moro, M., Zanetti, G. (2019). Container Orchestration on HPC Clusters. In: Weiland, M., Juckeland, G., Alam, S., Jagode, H. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_3

  • DOI: https://doi.org/10.1007/978-3-030-34356-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34355-2

  • Online ISBN: 978-3-030-34356-9

  • eBook Packages: Computer Science, Computer Science (R0)
