DOI: 10.1145/3489525.3511691
Research article · Best Paper

Oversubscribing GPU Unified Virtual Memory: Implications and Suggestions

Published: 9 April 2022

ABSTRACT

Recent GPU architectures support unified virtual memory (UVM), which offers great opportunities to solve larger problems through memory oversubscription. Although some studies have raised concerns about performance degradation under UVM oversubscription, the reasons behind workloads' diverse sensitivities to oversubscription are still unclear. In this work, we take the first step of selecting various benchmark applications and conducting rigorous experiments on their performance under different oversubscription ratios. Specifically, we take into account the variety of memory access patterns and explain applications' diverse sensitivities to oversubscription. We also consider prefetching and UVM hints, and discover their complex impact under different oversubscription ratios. Moreover, we discuss the strengths and pitfalls of UVM's multi-GPU support. We expect that this paper will provide useful experience and insights for UVM system design.
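To make the mechanisms discussed in the abstract concrete, the sketch below (not from the paper itself) shows the standard CUDA UVM pattern the study exercises: a `cudaMallocManaged` allocation that may exceed device memory, `cudaMemAdvise` hints (`cudaMemAdviseSetPreferredLocation`, `cudaMemAdviseSetAccessedBy`), and an explicit `cudaMemPrefetchAsync` that bypasses fault-driven migration. The kernel and buffer size are illustrative assumptions; all API calls are from the CUDA runtime.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *data, size_t n, float factor) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    // A managed allocation may exceed physical GPU memory; under
    // oversubscription, UVM migrates pages on demand and evicts
    // cold pages back to host memory.
    size_t n = size_t(1) << 28;  // 1 GiB of floats (illustrative size)
    float *data;
    cudaMallocManaged(&data, n * sizeof(float));
    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;  // first touch on CPU

    int dev = 0;
    cudaGetDevice(&dev);

    // UVM hints: prefer GPU residency, and keep a CPU mapping so
    // evicted pages remain cheaply accessible from the host.
    cudaMemAdvise(data, n * sizeof(float),
                  cudaMemAdviseSetPreferredLocation, dev);
    cudaMemAdvise(data, n * sizeof(float),
                  cudaMemAdviseSetAccessedBy, cudaCpuDeviceId);

    // Explicit prefetch moves pages in bulk, avoiding the
    // page-fault-driven migration path on first kernel access.
    cudaMemPrefetchAsync(data, n * sizeof(float), dev);

    scale<<<(unsigned)((n + 255) / 256), 256>>>(data, n, 2.0f);
    cudaDeviceSynchronize();

    printf("data[0] = %f\n", data[0]);
    cudaFree(data);
    return 0;
}
```

Whether these hints help or hurt depends on the oversubscription ratio and the access pattern, which is precisely the trade-off the paper measures.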


Published in

ICPE '22: Proceedings of the 2022 ACM/SPEC International Conference on Performance Engineering
April 2022, 242 pages
ISBN: 978-1-4503-9143-6
DOI: 10.1145/3489525

Copyright © 2022 ACM
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

        Acceptance Rates

ICPE '22 paper acceptance rate: 14 of 58 submissions, 24%. Overall acceptance rate: 252 of 851 submissions, 30%.
