DOI: 10.1145/3569951.3597589
short-paper

Rebuilding Bridges: The tools used to deploy and maintain Bridges-2

Published: 10 September 2023

Abstract

Bridges-2 is an NSF-funded heterogeneous supercomputing cluster at the Pittsburgh Supercomputing Center. The successor to the Bridges system (2014-2021), Bridges-2 builds on the flexibility demonstrated by its predecessor to support a wide variety of scientific workflows. This paper, building on a 2017 overview of the infrastructure supporting the original Bridges, is intended as a mid-cycle overview of the infrastructure developed to support the Bridges-2 project. It covers the lessons learned from the predecessor system, the initial design and development of the support infrastructure, modifications and improvements made over the last two years of production operations, and how those improvements have been shared with other systems and projects across the Pittsburgh Supercomputing Center.

Published In

PEARC '23: Practice and Experience in Advanced Research Computing 2023: Computing for the Common Good
July 2023
519 pages
ISBN:9781450399852
DOI:10.1145/3569951
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Configuration Management
  2. High Performance Computing
  3. System Administration
  4. Version Control

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

PEARC '23

Acceptance Rates

Overall Acceptance Rate 133 of 202 submissions, 66%
