ABSTRACT
Recent GPU architectures support unified virtual memory (UVM), which makes it possible to solve larger problems through memory oversubscription. Although several studies have raised concerns about performance degradation under UVM oversubscription, the reasons behind workloads' diverse sensitivities to oversubscription remain unclear. In this work, we take the first step: we select a variety of benchmark applications and conduct rigorous experiments on their performance under different oversubscription ratios. Specifically, we take into account the variety of memory access patterns and explain applications' diverse sensitivities to oversubscription. We also consider prefetching and UVM hints, and uncover their complex impact under different oversubscription ratios. Moreover, we discuss the strengths and pitfalls of UVM's multi-GPU support. We expect this paper to provide useful experience and insights for UVM system design.
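For context, the prefetching and hint mechanisms studied here are exposed through the CUDA runtime API. The following minimal sketch (array size, device ID, and kernel are illustrative assumptions, not the paper's benchmark code) shows how a UVM allocation can be advised and prefetched around a kernel launch; increasing `n` beyond device memory capacity is what triggers the oversubscription behavior the paper measures:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float *data, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const size_t n = 1 << 26;               // illustrative size; raise it past
    const size_t bytes = n * sizeof(float); // device capacity to oversubscribe
    int dev = 0;                            // illustrative: first GPU
    float *data;

    cudaMallocManaged(&data, bytes);        // UVM allocation, migrated on demand
    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;  // first touch on the CPU

    // UVM hint: pin the region's preferred location to the GPU so the
    // driver avoids evicting it back to host memory eagerly.
    cudaMemAdvise(data, bytes, cudaMemAdviseSetPreferredLocation, dev);

    // Explicit prefetch migrates pages in bulk before the kernel runs,
    // instead of paying per-page fault costs inside the kernel.
    cudaMemPrefetchAsync(data, bytes, dev);

    scale<<<(unsigned)((n + 255) / 256), 256>>>(data, n);

    // Prefetch back to the CPU before host-side access.
    cudaMemPrefetchAsync(data, bytes, cudaCpuDeviceId);
    cudaDeviceSynchronize();

    printf("data[0] = %f\n", data[0]);
    cudaFree(data);
    return 0;
}
```

With hints and prefetches removed, the same program still runs correctly under UVM's demand paging, which is exactly why the two mechanisms can be toggled independently when studying sensitivity to oversubscription.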
Index Terms
- Oversubscribing GPU Unified Virtual Memory: Implications and Suggestions