ABSTRACT
Latency-critical (LC) applications are widely deployed in modern datacenters. Effective power management for LC applications can yield significant cost savings, but it is challenging to maintain the desired Service Level Agreement (SLA) levels at the same time. Prior research has mainly focused on predicting the service time of each request and applying heuristic algorithms for CPU frequency adjustment. Unfortunately, the control granularity is limited to the request level, and manual feature selection is required.
This paper proposes DeepPower, a deep reinforcement learning (DRL) based power management solution for LC applications. DeepPower comprises two key components: a DRL agent that monitors system load changes and a thread controller that adjusts the CPU frequency. Given the high inference overhead of neural networks and the short service time of requests, it is infeasible to employ DRL to adjust the CPU frequency directly at the request level. Instead, DeepPower adopts a hierarchical control mechanism: the DRL agent tunes the parameters of the thread controller at longer intervals, while the thread controller adjusts the CPU frequency at shorter intervals. This mechanism enables DeepPower to adapt to dynamic workloads and achieve fine-grained frequency adjustment. We evaluate DeepPower with several common LC applications under dynamic workloads. The experimental results show that DeepPower saves up to 28.4% power compared with state-of-the-art methods and reduces the percentage of request timeouts.
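The two-timescale control mechanism described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the class names, frequency levels, intervals, and the trivial load-to-parameter policy standing in for the trained DRL agent are all assumptions made for illustration.

```python
# Illustrative sketch of a hierarchical (two-timescale) DVFS control loop:
# a slow outer agent retunes the fast inner controller's parameter, and the
# inner controller picks a CPU frequency at every fine-grained step.
# All concrete values below are hypothetical, not taken from DeepPower.

FREQ_LEVELS = [1.2, 1.6, 2.0, 2.4, 2.8]  # available P-states in GHz (illustrative)

class ThreadController:
    """Fast inner loop: maps instantaneous queue length to a frequency,
    scaled by a sensitivity parameter chosen by the outer agent."""
    def __init__(self, sensitivity=1.0):
        self.sensitivity = sensitivity

    def select_frequency(self, queue_len):
        # Higher load or higher sensitivity -> higher frequency level.
        idx = min(int(queue_len * self.sensitivity), len(FREQ_LEVELS) - 1)
        return FREQ_LEVELS[idx]

class DRLAgentStub:
    """Slow outer loop: stands in for the trained DRL policy; here it simply
    maps the recent average load to a controller sensitivity."""
    def act(self, avg_load):
        return 0.5 + avg_load  # placeholder policy, not a neural network

def run(loads, agent_interval=4):
    """Simulate the hierarchy over a trace of queue lengths: the agent acts
    once every `agent_interval` steps, the controller acts every step."""
    agent, ctrl = DRLAgentStub(), ThreadController()
    freqs = []
    for step, q in enumerate(loads):
        if step % agent_interval == 0:  # coarse-grained DRL decision
            window = loads[max(0, step - agent_interval):step] or [q]
            ctrl.sensitivity = agent.act(sum(window) / len(window))
        freqs.append(ctrl.select_frequency(q))  # fine-grained DVFS decision
    return freqs
```

The design point this sketch captures is that the expensive decision (the agent) runs rarely, while the cheap decision (the controller) runs at per-request granularity, so neural-network inference latency never sits on the request path.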
DeepPower: Deep Reinforcement Learning based Power Management for Latency Critical Applications in Multi-core Systems