
Program Analysis and Machine Learning–based Approach to Predict Power Consumption of CUDA Kernel

Published: 24 July 2023

Abstract

The General-Purpose Graphics Processing Unit (GPGPU) has secured a prominent position in the High-Performance Computing world due to its performance gains and programmability. Understanding the relationship between Graphics Processing Unit (GPU) power consumption and program features can help developers build energy-efficient, sustainable applications. In this work, we propose a static-analysis-based power model built using machine learning techniques. We investigated six machine learning techniques across three NVIDIA GPU architectures (Kepler, Maxwell, and Volta), with Random Forest, Extra Trees, Gradient Boosting, CatBoost, and XGBoost reporting favorable results. We observed that the XGBoost-based prediction model is the most accurate, with an R² value of 0.9646 on the Volta architecture. The dataset used for these techniques includes kernels from different benchmark suites, spanning a range of sizes, natures (e.g., compute-bound, memory-bound), and complexities (e.g., control divergence, memory access patterns). Experimental results suggest that the proposed solution can help developers accurately predict the power consumption of GPU applications using program analysis across GPU architectures, and then refactor their code to build energy-efficient GPU applications.
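The workflow the abstract describes, extracting static program features from a CUDA kernel and fitting a boosted-tree regressor to measured power, can be sketched as follows. This is a minimal illustrative sketch: the feature names (`flops`, `gmem`, `div`), the synthetic power data, and the from-scratch gradient-boosted stumps are all assumptions for demonstration, not the paper's actual feature set or model. In practice one would extract features via static analysis of the kernel and train with a library such as XGBoost.

```python
import random

# Hypothetical static kernel features (illustrative, not the paper's feature set):
# arithmetic-instruction fraction, global-memory intensity, divergence estimate,
# paired with a synthetic average power in watts.
def make_sample(rng):
    flops = rng.uniform(0, 1)   # fraction of arithmetic instructions
    gmem = rng.uniform(0, 1)    # global-memory access intensity
    div = rng.uniform(0, 1)     # control-divergence estimate
    # Synthetic ground truth: power grows with compute and memory pressure.
    power = 60 + 80 * flops + 50 * gmem + 10 * div
    return [flops, gmem, div], power

def fit_stump(X, residuals):
    """Find the (feature, threshold) split minimizing squared error on residuals."""
    best = None
    for f in range(len(X[0])):
        for t in (0.25, 0.5, 0.75):
            left = [r for x, r in zip(X, residuals) if x[f] <= t]
            right = [r for x, r in zip(X, residuals) if x[f] > t]
            if not left or not right:
                continue
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, f, t, lmean, rmean)
    return best[1:]

def boost(X, y, rounds=100, lr=0.1):
    """Gradient boosting for squared error: each round fits a stump to residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        f, t, lmean, rmean = fit_stump(X, residuals)
        stumps.append((f, t, lmean, rmean))
        preds = [p + lr * (lmean if x[f] <= t else rmean)
                 for x, p in zip(X, preds)]
    return base, stumps

def predict(model, x, lr=0.1):
    base, stumps = model
    return base + sum(lr * (lm if x[f] <= t else rm)
                      for f, t, lm, rm in stumps)

rng = random.Random(0)
data = [make_sample(rng) for _ in range(400)]
X, y = [d[0] for d in data], [d[1] for d in data]
model = boost(X, y)

# R² on training data as a sanity check of the fit.
mean_y = sum(y) / len(y)
ss_res = sum((yi - predict(model, xi)) ** 2 for xi, yi in zip(X, y))
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot
```

The stump-plus-residual loop is the same additive-model idea underlying the Gradient Boosting, XGBoost, and CatBoost techniques compared in the paper; the libraries add regularization, deeper trees, and careful split finding.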

Supplementary Material

TOMPECS-2022-0022-SUPP (tompecs-2022-0022-supp.zip)


Cited By

  • (2025) A Comprehensive Analysis of Process Energy Consumption on Multi-socket Systems with GPUs. High Performance Computing, 52–67. DOI: 10.1007/978-3-031-80084-9_4. Online publication date: 14-Feb-2025.
  • (2024) Estimating Power Consumption of GPU Application Using Machine Learning Tool. 2024 IEEE 36th International Conference on Tools with Artificial Intelligence (ICTAI), 734–739. DOI: 10.1109/ICTAI62512.2024.00109. Online publication date: 28-Oct-2024.
  • (2024) Analyzing GPU Energy Consumption in Data Movement and Storage. 2024 IEEE 35th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 143–151. DOI: 10.1109/ASAP61560.2024.00038. Online publication date: 24-Jul-2024.


Published In

ACM Transactions on Modeling and Performance Evaluation of Computing Systems, Volume 8, Issue 4
December 2023
119 pages
ISSN: 2376-3639
EISSN: 2376-3647
DOI: 10.1145/3609794

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 July 2023
Online AM: 05 June 2023
Accepted: 24 May 2023
Revised: 02 April 2023
Received: 17 July 2022
Published in TOMPECS Volume 8, Issue 4


Author Tags

  1. GPGPU computing
  2. XGBoost
  3. CatBoost
  4. CUDA
  5. static analysis
  6. sustainable computing

Qualifiers

  • Research-article
