DOI: 10.1145/3569951.3597597
PEARC '23 Conference Proceedings · Short Paper

Voyager – An Innovative Computational Resource for Artificial Intelligence & Machine Learning Applications in Science and Engineering

Published: 10 September 2023

ABSTRACT

Voyager is an innovative computational resource designed by the San Diego Supercomputer Center in collaboration with technology partners to accelerate the development and performance of artificial intelligence and machine learning applications in science and engineering. Based on Intel’s Habana Labs first-generation deep learning (Gaudi) training and (Goya) inference processors, Voyager is funded by the National Science Foundation’s Advanced Computing Systems & Services Program as a Category II system and will be operated for 5 years, starting with an initial 3-year exploratory test-bed phase that will be followed by a 2-year allocated production phase for the national research community. Its AI-focused hardware features several innovative components, including fully-programmable tensor processing cores, high-bandwidth memory, and integrated, on-chip RDMA over Converged Ethernet network interfaces. In addition, Habana’s SynapseAI software suite provides seamless integration to popular machine learning frameworks like PyTorch and TensorFlow for end users. Here, we describe the design motivation for Voyager, its system architecture, software and user environment, initial benchmarking results, and the early science use cases and applications currently being ported to and deployed on the system.
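
To illustrate the user environment described above, the following is a minimal, hypothetical sketch of how a standard PyTorch training step is typically adapted to run on a Gaudi (HPU) device through the SynapseAI PyTorch bridge. The habana_frameworks.torch module path, the "hpu" device name, and the mark_step() calls reflect Habana's documented lazy-execution workflow, but exact usage may vary across SynapseAI releases; this is a sketch, not the specific setup deployed on Voyager.

    # Minimal sketch: one PyTorch training step on a Gaudi (HPU) device.
    # Assumes a SynapseAI release with the habana_frameworks.torch bridge installed.
    import torch
    import habana_frameworks.torch.core as htcore

    device = torch.device("hpu")               # Gaudi device, analogous to "cuda"

    model = torch.nn.Linear(1024, 10).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    # Dummy batch; in practice this would come from a standard PyTorch DataLoader.
    x = torch.randn(64, 1024, device=device)
    y = torch.randint(0, 10, (64,), device=device)

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    htcore.mark_step()                         # flush the lazily accumulated graph
    optimizer.step()
    htcore.mark_step()
    print(f"loss: {loss.item():.4f}")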


Published in

      PEARC '23: Practice and Experience in Advanced Research Computing
      July 2023
      519 pages
      ISBN: 9781450399852
      DOI: 10.1145/3569951

      Copyright © 2023 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States

