ABSTRACT
With the ever-growing popularity of cloud computing and web services, Internet companies need more computing capacity to serve the demand. However, power has become a major limiting factor for growth in the industry: often, no more servers can be added to a datacenter without exceeding the capacity of the existing power infrastructure. In this work, we first investigate power utilization in Facebook datacenters. We observe that the combination of provisioning for peak power usage, highly fluctuating traffic, and a multi-level power delivery infrastructure leads to a significant power budget fragmentation problem and inefficiently low power utilization. To address this issue, our insight is that the heterogeneity of power consumption patterns among different services provides opportunities to reshape the power profile of each power node by redistributing services. By grouping services with asynchronous peak times under the same power node, we can reduce the peak power of each node and thus create more power headroom, allowing more servers to be hosted and achieving higher throughput. Based on this insight, we develop a workload-aware service placement framework that systematically spreads service instances with synchronous power patterns evenly across the power supply tree, greatly reducing the peak power draw at power nodes. We then leverage dynamic power profile reshaping to maximally utilize the headroom unlocked by our placement framework. Our experiments based on real production workload and power traces show that we are able to host up to 13% more machines in production without changing the underlying power infrastructure. By utilizing the unlocked power headroom with dynamic reshaping, we achieve up to an estimated 15% and 11% throughput improvement for latency-critical and batch services, respectively, at the same time, with up to a 44% reduction in energy slack.
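The effect of co-locating services with asynchronous peaks can be illustrated with a toy calculation. The sketch below is not the placement framework described in the abstract: the traces, the node names, the 600 W budget, and the helper `node_peak` are all hypothetical, chosen only to show why the peak of a mixed node can be lower than the peak of a node hosting a single service.

```python
# Illustrative sketch only: synthetic traces and hypothetical names,
# not the paper's placement framework or its production data.

def node_peak(traces):
    """Peak of the aggregate power drawn by all instances under one power node."""
    return max(sum(samples) for samples in zip(*traces))

# Per-instance power in watts at four sample times (assumed values).
web = [100, 200, 300, 200]    # peaks at the third sample
batch = [300, 200, 100, 200]  # peaks at the first sample

# Placement A: instances of the same service share a power node.
grouped = {"node0": [web, web], "node1": [batch, batch]}
# Placement B: each power node hosts one instance of each service.
mixed = {"node0": [web, batch], "node1": [web, batch]}

grouped_peaks = {name: node_peak(t) for name, t in grouped.items()}
mixed_peaks = {name: node_peak(t) for name, t in mixed.items()}

# Headroom under an assumed 600 W budget per node.
budget = 600
mixed_headroom = {name: budget - peak for name, peak in mixed_peaks.items()}

print(grouped_peaks)   # {'node0': 600, 'node1': 600}
print(mixed_peaks)     # {'node0': 400, 'node1': 400}
print(mixed_headroom)  # {'node0': 200, 'node1': 200}
```

In this toy setting both placements host the same four instances, but mixing the two services lowers each node's peak from 600 W to 400 W, which is the kind of headroom that could then be used to host additional servers.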
SmoothOperator: Reducing Power Fragmentation and Improving Power Utilization in Large-scale Datacenters. ASPLOS '18.