DOI: 10.1145/3491418.3535150
Early Experiences with Tight Integration of Kubernetes in an HPC Environment

Published: 08 July 2022

Abstract

The Ohio Supercomputer Center has deployed a Kubernetes cluster that is tightly integrated with a high-performance computing (HPC) environment. The deployment reuses the existing file systems to share data between HPC systems and Kubernetes objects, and it also reuses the existing monitoring, account management, resource management, and accounting systems. This paper describes the motivation and overall design, the novel implementation methods, and the applications supported by this new resource. It also presents a short description of future work and some of the questions this design raises.

Published In

PEARC '22: Practice and Experience in Advanced Research Computing 2022: Revolutionary: Computing, Connections, You
July 2022
455 pages
ISBN:9781450391610
DOI:10.1145/3491418
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cyberinfrastructure
  2. distributed computing
  3. high performance computing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PEARC '22

Acceptance Rates

Overall Acceptance Rate 133 of 202 submissions, 66%

Cited By

  • (2024) Stable Diffusion in the Classroom: Deploying interactive GPU-enabled ML workloads with Open OnDemand and Kubernetes. Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, pp. 1-8. DOI: 10.1145/3626203.3670526. Online publication date: 17-Jul-2024
  • (2024) Aggregate Monitoring for Geo-Distributed Kubernetes Cluster Federations. IEEE Transactions on Cloud Computing 12:4, pp. 1449-1462. DOI: 10.1109/TCC.2024.3482574. Online publication date: Oct-2024
  • (2024) Compliance Validation in the Service Mesh Architecture. 2024 IEEE Symposium on Product Compliance Engineering - (SPCE Bloomington), pp. 1-5. DOI: 10.1109/IEEECONF63668.2024.10739633. Online publication date: 8-Oct-2024
  • (2023) AdapPF: Self-Adaptive Scrape Interval for Monitoring in Geo-Distributed Cluster Federations. 2023 IEEE Symposium on Computers and Communications (ISCC), pp. 417-423. DOI: 10.1109/ISCC58397.2023.10218080. Online publication date: 9-Jul-2023
  • (2023) MyKSC: Disaggregated Containerized Supercomputer Platform. Web Services – ICWS 2023, pp. 83-91. DOI: 10.1007/978-3-031-44836-2_6. Online publication date: 23-Sep-2023
