DOI: 10.1145/3624062.3624235
research-article

RDARuntime: An OS for AI Accelerators

Published: 12 November 2023

ABSTRACT

Today’s supercomputers are more heterogeneous than ever before. As the share of AI workloads in data centers continues to grow, the share of GPUs and AI-specific hardware grows with it. AI accelerators are different from traditional hardware, affecting all aspects of system design, from data-center scale to single-chip scale. AI accelerators are much more efficient than CPUs or GPUs for some HPC workloads, especially in AI for Science. They also add complexity to system architecture, management, and programming. Although runtime frameworks are critical to reducing system complexity, there is little literature describing AI accelerator runtimes. In this paper, we introduce RDARuntime, an AI-specific OS tailored for the development and operation of SambaNova’s reconfigurable dataflow architecture. We discuss the architecture, our design decisions, and some of the results we have achieved, along with some lessons we have learned while helping to deploy the Reconfigurable Dataflow Unit (RDU) to production environments.

Published in

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
November 2023
2180 pages
ISBN: 9798400707858
DOI: 10.1145/3624062

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

• research-article
• Research
• Refereed limited