ABSTRACT
Paged memory systems for GPUs, such as NVIDIA's Unified Virtual Memory, offer programmers a simple way to write out-of-core GPU programs. Storage-backed approaches can even handle working sets larger than host memory, as NVMe storage backs GPU memory through RDMA. However, paged memory systems can struggle with irregular access patterns. In this work, we analyze the limitations of paged, RDMA-backed GPU memory for out-of-core, irregular workloads through a case study of GNN training. We highlight the key limitations of these systems that must be overcome before the full potential of RDMA-backed GPU memory can be realized in a paged memory architecture.
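To make the programming model concrete, the sketch below is a minimal illustration (not code from the paper) using plain CUDA Unified Memory: `cudaMallocManaged` allocations may exceed device memory, and an irregular, GNN-style gather kernel faults non-resident pages in on demand. The allocation sizes and the gather kernel are assumptions chosen for illustration only.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Irregular gather: each thread reads a feature row selected by an index
// array, the access pattern typical of GNN neighbor aggregation. With
// managed memory, any row not resident on the GPU triggers a page fault
// and on-demand migration.
__global__ void gather(const float *features, const int *neighbors,
                       float *out, int num_samples, int feat_dim) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= num_samples) return;
    const float *row = features + (size_t)neighbors[i] * feat_dim;
    float acc = 0.0f;
    for (int d = 0; d < feat_dim; ++d)
        acc += row[d];
    out[i] = acc;
}

int main() {
    const size_t num_nodes = 1ull << 26;  // ~32 GB of features: sized to
    const int feat_dim = 128;             // oversubscribe GPU memory (assumption)
    const int num_samples = 1 << 20;

    float *features, *out;
    int *neighbors;
    // Managed allocations may exceed device memory; pages migrate on demand.
    // Features are left uninitialized here for brevity.
    cudaMallocManaged(&features, num_nodes * feat_dim * sizeof(float));
    cudaMallocManaged(&neighbors, num_samples * sizeof(int));
    cudaMallocManaged(&out, num_samples * sizeof(float));

    // Pseudo-random neighbor indices produce an irregular access pattern.
    for (int i = 0; i < num_samples; ++i)
        neighbors[i] = (int)((i * 2654435761u) % num_nodes);

    gather<<<(num_samples + 255) / 256, 256>>>(features, neighbors, out,
                                               num_samples, feat_dim);
    cudaDeviceSynchronize();
    printf("done: %f\n", out[0]);
    return 0;
}
```

Each warp in this kernel touches feature rows scattered across the managed allocation, so the page-fault handler sees little spatial locality; this is the kind of irregular pattern the case study examines.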