Research article · DOI: 10.1145/3366428.3380767

GPGPU performance estimation for frequency scaling using cross-benchmarking

Published: 23 February 2020

Abstract

Dynamic Voltage and Frequency Scaling (DVFS) on General-Purpose Graphics Processing Units (GPGPUs) has become one of the most important techniques for balancing computational performance and energy consumption. However, fast and accurate models for predicting GPU kernel execution time under different core and memory frequency settings remain scarce, even though such predictions are essential for determining the most energy-efficient frequency configuration. Accordingly, a novel GPGPU performance estimation model covering both core and memory frequency scaling is proposed. We design a cross-benchmarking suite that generates synthetic kernels spanning a wide range of instruction distributions; these kernels can be used for model pre-training or as supplementary training samples. We then apply two machine learning algorithms, Support Vector Regression (SVR) and Gradient Boosting Decision Tree (GBDT), to learn the correlation between kernel performance counters and kernel performance. Models trained only on our cross-benchmarking suite achieve satisfactory accuracy (16%–22% mean absolute error) on 24 unseen real application kernels. Validated on three modern GPUs with wide frequency scaling ranges, using the same collection of 24 real application kernels, the proposed model achieves accurate results (5.1%, 2.8%, and 6.5% mean absolute error) on the target GPUs (GTX 980, Titan X Pascal, and Tesla P100).
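The modeling step the abstract describes — regressing kernel execution time on performance counters plus core/memory frequency settings, using SVR and GBDT — can be sketched as follows. This is a minimal illustration with scikit-learn on synthetic placeholder data, not the paper's feature set or dataset: the four features and the toy timing function below are assumptions for demonstration only.

```python
# Hedged sketch of the paper's modeling idea: learn execution time from an
# instruction-mix profile plus normalized core/memory frequencies, with the
# two learners the paper names (SVR and GBDT). All features and the ground
# truth below are synthetic placeholders, not the paper's profiler data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.uniform(0.1, 0.9, n),   # placeholder: fraction of compute instructions
    rng.uniform(0.1, 0.9, n),   # placeholder: fraction of memory instructions
    rng.uniform(0.5, 1.5, n),   # normalized core frequency
    rng.uniform(0.5, 1.5, n),   # normalized memory frequency
])
# Toy ground truth: the compute-bound part of the runtime scales with
# 1/f_core, the memory-bound part with 1/f_mem, plus measurement noise.
y = X[:, 0] / X[:, 2] + X[:, 1] / X[:, 3] + rng.normal(0, 0.02, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "SVR": make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.01)),
    "GBDT": GradientBoostingRegressor(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    mape = np.mean(np.abs(pred - y_te) / np.abs(y_te)) * 100
    print(f"{name}: mean absolute percentage error = {mape:.1f}%")
```

In the paper's setting, the synthetic-kernel suite plays the role of this generated training set, and real profiler counters replace the placeholder features; the learner choice and the error metric follow the abstract.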


Cited By

  • (2025) Using Analytical Performance/Power Model and Fine-Grained DVFS to Enhance AI Accelerator Energy Efficiency. In Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, 1118-1132. DOI: 10.1145/3669940.3707231. Online publication date: 30 March 2025.
  • (2024) Improving GPU Energy Efficiency through an Application-transparent Frequency Scaling Policy with Performance Assurance. In Proceedings of the Nineteenth European Conference on Computer Systems, 769-785. DOI: 10.1145/3627703.3629584. Online publication date: 22 April 2024.
  • (2024) DSO: A GPU Energy Efficiency Optimizer by Fusing Dynamic and Static Information. In 2024 IEEE/ACM 32nd International Symposium on Quality of Service (IWQoS), 1-6. DOI: 10.1109/IWQoS61813.2024.10682917. Online publication date: 19 June 2024.
  • (2021) Efficiency Near the Edge: Increasing the Energy Efficiency of FFTs on GPUs for Real-Time Edge Computing. IEEE Access 9, 18167-18182. DOI: 10.1109/ACCESS.2021.3053409. Online publication date: 2021.


Published In

GPGPU '20: Proceedings of the 13th Annual Workshop on General Purpose Processing using Graphics Processing Unit
February 2020
77 pages
ISBN:9781450370257
DOI:10.1145/3366428

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. GPU performance modeling
  2. dynamic voltage and frequency scaling
  3. graphics processing units
  4. machine learning

Funding Sources

  • Hong Kong RGC

Conference

PPoPP '20

Acceptance Rates

GPGPU '20 paper acceptance rate: 7 of 12 submissions (58%)
Overall acceptance rate: 57 of 129 submissions (44%)
