
Program Analysis and Machine Learning–based Approach to Predict Power Consumption of CUDA Kernel

Published: 24 July 2023

Abstract

The General-Purpose Graphics Processing Unit (GPGPU) has secured a prominent position in the High-Performance Computing world due to its performance gains and programmability. Understanding the relationship between Graphics Processing Unit (GPU) power consumption and program features can help developers build energy-efficient, sustainable applications. In this work, we propose a static-analysis-based power model built using machine learning techniques. We investigated six machine learning techniques across three NVIDIA GPU architectures (Kepler, Maxwell, and Volta), with Random Forest, Extra Trees, Gradient Boosting, CatBoost, and XGBoost reporting favorable results. We observed that the XGBoost-based prediction model is the most accurate, with an R² value of 0.9646 on the Volta architecture. The dataset used for these techniques includes kernels from different benchmark suites, spanning a range of sizes, natures (e.g., compute-bound, memory-bound), and complexities (e.g., control divergence, memory access patterns). Experimental results suggest that the proposed solution can help developers accurately predict the power consumption of GPU applications using program analysis across GPU architectures, and then refactor their code to build energy-efficient GPU applications.
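The workflow the abstract describes, extracting static program features from a CUDA kernel and fitting a boosted-tree regressor to measured power, can be sketched as follows. This is a minimal illustrative sketch: the feature names (`flops`, `gmem`, `div`), the synthetic power data, and the from-scratch gradient-boosted stumps are all assumptions for demonstration, not the paper's actual feature set or model. In practice one would extract features via static analysis of the kernel and train with a library such as XGBoost.

```python
import random

# Hypothetical static kernel features (illustrative, not the paper's feature set):
# arithmetic-instruction fraction, global-memory intensity, divergence estimate,
# paired with a synthetic average power in watts.
def make_sample(rng):
    flops = rng.uniform(0, 1)   # fraction of arithmetic instructions
    gmem = rng.uniform(0, 1)    # global-memory access intensity
    div = rng.uniform(0, 1)     # control-divergence estimate
    # Synthetic ground truth: power grows with compute and memory pressure.
    power = 60 + 80 * flops + 50 * gmem + 10 * div
    return [flops, gmem, div], power

def fit_stump(X, residuals):
    """Find the (feature, threshold) split minimizing squared error on residuals."""
    best = None
    for f in range(len(X[0])):
        for t in (0.25, 0.5, 0.75):
            left = [r for x, r in zip(X, residuals) if x[f] <= t]
            right = [r for x, r in zip(X, residuals) if x[f] > t]
            if not left or not right:
                continue
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, f, t, lmean, rmean)
    return best[1:]

def boost(X, y, rounds=100, lr=0.1):
    """Gradient boosting for squared error: each round fits a stump to residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        f, t, lmean, rmean = fit_stump(X, residuals)
        stumps.append((f, t, lmean, rmean))
        preds = [p + lr * (lmean if x[f] <= t else rmean)
                 for x, p in zip(X, preds)]
    return base, stumps

def predict(model, x, lr=0.1):
    base, stumps = model
    return base + sum(lr * (lm if x[f] <= t else rm)
                      for f, t, lm, rm in stumps)

rng = random.Random(0)
data = [make_sample(rng) for _ in range(400)]
X, y = [d[0] for d in data], [d[1] for d in data]
model = boost(X, y)

# R² on training data as a sanity check of the fit.
mean_y = sum(y) / len(y)
ss_res = sum((yi - predict(model, xi)) ** 2 for xi, yi in zip(X, y))
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot
```

The stump-plus-residual loop is the same additive-model idea underlying the Gradient Boosting, XGBoost, and CatBoost techniques compared in the paper; the libraries add regularization, deeper trees, and careful split finding.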

Supplementary Material

TOMPECS-2022-0022-SUPP (tompecs-2022-0022-supp.zip)


Cited By

  • (2025) A Comprehensive Analysis of Process Energy Consumption on Multi-socket Systems with GPUs. High Performance Computing, 52–67. DOI: 10.1007/978-3-031-80084-9_4. Online publication date: 14-Feb-2025.
  • (2024) Estimating Power Consumption of GPU Application Using Machine Learning Tool. 2024 IEEE 36th International Conference on Tools with Artificial Intelligence (ICTAI), 734–739. DOI: 10.1109/ICTAI62512.2024.00109. Online publication date: 28-Oct-2024.
  • (2024) Analyzing GPU Energy Consumption in Data Movement and Storage. 2024 IEEE 35th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 143–151. DOI: 10.1109/ASAP61560.2024.00038. Online publication date: 24-Jul-2024.


Published In

ACM Transactions on Modeling and Performance Evaluation of Computing Systems, Volume 8, Issue 4
December 2023
119 pages
ISSN: 2376-3639
EISSN: 2376-3647
DOI: 10.1145/3609794

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 July 2023
Online AM: 05 June 2023
Accepted: 24 May 2023
Revised: 02 April 2023
Received: 17 July 2022
Published in TOMPECS Volume 8, Issue 4


Author Tags

  1. GPGPU computing
  2. XGBoost
  3. CatBoost
  4. CUDA
  5. static analysis
  6. sustainable computing

Qualifiers

  • Research-article
