GreenDRL: managing green datacenters using deep reinforcement learning

Research article. DOI: 10.1145/3542929.3563501

Published: 07 November 2022

Abstract

Managing datacenters to maximize efficiency and sustainability is a complex and challenging problem. In this work, we explore the use of deep reinforcement learning (RL) to manage "green" datacenters, bringing a robust approach for designing efficient management systems that account for specific workload, datacenter, and environmental characteristics. We design and evaluate GreenDRL, a system that combines a deep RL agent with simple heuristics to manage workload, energy consumption, and cooling in the presence of onsite generation of renewable energy to minimize brown energy consumption and cost. Our design addresses several important challenges, including adaptability, robustness, and effective learning in an environment comprising an enormous state/action space and multiple stochastic processes. Evaluation results (using simulation) show that GreenDRL is able to learn important principles such as delaying deferrable jobs to leverage variable generation of renewable (solar) energy, and avoiding the use of power-intensive cooling settings even at the expense of leaving some renewable energy unused. In an environment where a fraction of the workload is deferrable by up to 12 hours, GreenDRL can reduce grid electricity consumption for days with different solar energy generation and temperature characteristics by 32-54% compared to a FIFO baseline approach. GreenDRL also matches or outperforms a management approach that uses linear programming together with oracular future knowledge to manage workload and server energy consumption, but leaves the management of the cooling system to a separate (and independent) controller. Overall, our work shows that deep RL is a promising technique for building efficient management systems for green datacenters.
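To make the idea of delaying deferrable jobs concrete, the following is a minimal sketch that compares a FIFO schedule with a simple greedy deferral heuristic on a toy day. It is not GreenDRL's learned policy or the paper's simulator: the `Job` type, the one-hour job duration, the `slack` field, the greedy rule, and the solar and job values are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    arrival: int       # hour of day the job arrives (0-23)
    energy_kwh: float  # energy drawn during its run (assumed to last one hour)
    slack: int         # hours the start may be delayed (0 = not deferrable)


def grid_energy(load, solar):
    """Brown (grid) energy: hourly demand not covered by solar, summed."""
    return sum(max(0.0, l - s) for l, s in zip(load, solar))


def schedule_fifo(jobs, hours=24):
    """FIFO baseline: run every job in the hour it arrives."""
    load = [0.0] * hours
    for job in jobs:
        load[job.arrival] += job.energy_kwh
    return load


def schedule_deferred(jobs, solar):
    """Greedy stand-in for a deferral policy (an assumption, not an RL agent):
    start each job in the allowed hour with the most spare solar energy."""
    hours = len(solar)
    load = [0.0] * hours
    # Place the least flexible jobs first so they cannot be crowded out.
    for job in sorted(jobs, key=lambda j: j.slack):
        last = min(job.arrival + job.slack, hours - 1)
        window = range(job.arrival, last + 1)
        best = max(window, key=lambda h: solar[h] - load[h])
        load[best] += job.energy_kwh
    return load


# Hypothetical day: solar output (kWh per hour) peaks around noon.
solar = [0, 0, 0, 0, 0, 0, 1, 2, 4, 6, 8, 9, 9, 8, 6, 4, 2, 1, 0, 0, 0, 0, 0, 0]
jobs = [
    Job(arrival=2, energy_kwh=3.0, slack=12),  # deferrable by up to 12 hours
    Job(arrival=2, energy_kwh=3.0, slack=0),
    Job(arrival=20, energy_kwh=2.0, slack=0),
    Job(arrival=5, energy_kwh=4.0, slack=12),  # deferrable by up to 12 hours
]

fifo_grid = grid_energy(schedule_fifo(jobs), solar)
deferred_grid = grid_energy(schedule_deferred(jobs, solar), solar)
print(f"FIFO grid energy:     {fifo_grid} kWh")
print(f"Deferred grid energy: {deferred_grid} kWh")
```

In this toy setting the two deferrable night-time jobs are moved into the midday solar peak, so only the non-deferrable jobs draw from the grid. The sketch ignores cooling, server power states, and forecasting uncertainty, all of which the abstract identifies as part of the actual management problem.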

Published In

SoCC '22: Proceedings of the 13th Symposium on Cloud Computing
November 2022
574 pages
ISBN:9781450394147
DOI:10.1145/3542929
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. datacenter
  2. deep reinforcement learning
  3. green datacenter
  4. power management
  5. scheduling

Qualifiers

  • Research-article

Conference

SoCC '22: ACM Symposium on Cloud Computing
November 7-11, 2022
San Francisco, California

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%

Article Metrics

  • Downloads (Last 12 months): 267
  • Downloads (Last 6 weeks): 24
Reflects downloads up to 20 Feb 2025

Cited By

  • (2024) Exploring the Efficiency of Renewable Energy-based Modular Data Centers at Scale. Proceedings of the 2024 ACM Symposium on Cloud Computing, 552-569. DOI: 10.1145/3698038.3698544. Online publication date: 20-Nov-2024
  • (2024) Rethinking Low-Carbon Edge Computing System Design with Renewable Energy Sharing. Proceedings of the 53rd International Conference on Parallel Processing, 950-960. DOI: 10.1145/3673038.3673080. Online publication date: 12-Aug-2024
  • (2024) A Deep Reinforcement Learning Based Computation Resource Allocation Strategy for Multi-Scenario Advertising Systems. 2024 IEEE 6th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), 926-930. DOI: 10.1109/IMCEC59810.2024.10575305. Online publication date: 24-May-2024
  • (2024) A Multi-agent Reinforcement Learning Based CR Allocation Approach For Multi-Scenario Advertising Systems. 2024 IEEE 6th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), 1971-1976. DOI: 10.1109/IMCEC59810.2024.10575234. Online publication date: 24-May-2024
  • (2024) Latency-Guaranteed Co-Location of Inference and Training for Reducing Data Center Expenses. 2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS), 473-484. DOI: 10.1109/ICDCS60910.2024.00051. Online publication date: 23-Jul-2024
  • (2024) When Green Computing Meets Performance and Resilience SLOs. 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume (DSN-S), 17-22. DOI: 10.1109/DSN-S60304.2024.00015. Online publication date: 24-Jun-2024
  • (2023) Concurrent Carbon Footprint Reduction (C2FR) Reinforcement Learning Approach for Sustainable Data Center Digital Twin. 2023 IEEE 19th International Conference on Automation Science and Engineering (CASE), 1-8. DOI: 10.1109/CASE56687.2023.10260633. Online publication date: 26-Aug-2023
