DOI: 10.1145/1555271.1555273
research-article

Automatic exploration of datacenter performance regimes

Published: 19 June 2009

Abstract

Horizontally scalable Internet services present an opportunity to use automatic resource allocation strategies for system management in the datacenter. In most of the previous work, a controller employs a performance model of the system to make decisions about the optimal allocation of resources. However, these models are usually trained offline or on a small-scale deployment and will not accurately capture the performance of the controlled application. To achieve accurate control of the web application, the models need to be trained directly on the production system and adapted to changes in workload and performance of the application. In this paper we propose to train the performance model using an exploration policy that quickly collects data from different performance regimes of the application. The goal of our approach for managing the exploration process is to strike a balance between not violating the performance SLAs and the need to collect sufficient data to train an accurate performance model, which requires pushing the system close to its capacity. We show that by using our exploration policy, we can train a performance model of a Web 2.0 application in less than an hour and then immediately use the model in a resource allocation controller.
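The trade-off described in the abstract, collecting data near capacity without violating the SLA, can be illustrated with a toy sketch. This is not the authors' exploration policy: the `Explorer` class, the `safety_margin` parameter, and the `synthetic_latency` curve are all hypothetical names and assumptions introduced here for illustration only. The sketch removes one server at a time to raise per-server load, records a (load, latency) sample at each step, and stops once latency reaches a chosen fraction of the SLA.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Explorer:
    """Toy exploration policy (illustrative assumption, not the paper's algorithm)."""
    measure_latency: Callable[[float], float]  # hypothetical probe: load per server -> latency in ms
    sla_ms: float                              # latency threshold from the SLA
    safety_margin: float                       # stop exploring at this fraction of the SLA
    samples: List[Tuple[float, float]] = field(default_factory=list)

    def explore(self, total_load: float, max_servers: int, min_servers: int = 1) -> int:
        """Remove one server per step and record (load per server, latency).

        Returns the smallest server count whose latency stayed below
        safety_margin * sla_ms.
        """
        safe = max_servers
        for servers in range(max_servers, min_servers - 1, -1):
            load = total_load / servers
            latency = self.measure_latency(load)
            self.samples.append((load, latency))
            if latency >= self.safety_margin * self.sla_ms:
                break  # close enough to capacity: stop pushing the system
            safe = servers
        return safe


def synthetic_latency(load: float, capacity: float = 100.0, base_ms: float = 20.0) -> float:
    """Stand-in for the production system: latency grows sharply near capacity."""
    if load >= capacity:
        return float("inf")
    return base_ms / (1.0 - load / capacity)


explorer = Explorer(measure_latency=synthetic_latency, sla_ms=200.0, safety_margin=0.5)
safe_servers = explorer.explore(total_load=420.0, max_servers=10)
```

With these made-up numbers the loop stops before any sample exceeds the SLA, while the last samples still come from the steep part of the latency curve. A lower `safety_margin` is more conservative but yields less data near capacity, which is the balance the abstract refers to. The collected `samples` could then be used to fit a performance model.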

Published In

ACDC '09: Proceedings of the 1st workshop on Automated control for datacenters and clouds
June 2009
64 pages
ISBN:9781605585857
DOI:10.1145/1555271
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tag

  1. automatic control

Qualifiers

  • Research-article

Conference

ICAC '09
Cited By

  • (2024) TraceUpscaler: Upscaling Traces to Evaluate Systems at High Load. Proceedings of the Nineteenth European Conference on Computer Systems, pp. 942-961. DOI: 10.1145/3627703.3629581. Online publication date: 22-Apr-2024
  • (2023) A Stochastic Approach to Determine the Optimal Number of Servers for Reliable and Energy Efficient Operation of Data Centers. IEEE Transactions on Sustainable Computing 8:2, pp. 153-164. DOI: 10.1109/TSUSC.2022.3216350. Online publication date: 1-Apr-2023
  • (2022) Predictive Auto-Scaling of Multi-Tier Applications Using Performance Varying Cloud Resources. IEEE Transactions on Cloud Computing 10:1, pp. 595-607. DOI: 10.1109/TCC.2019.2944364. Online publication date: 1-Jan-2022
  • (2022) Applied Machine Learning for Cloud Resource Management. Machine Learning for Computer Scientists and Data Analysts, pp. 405-427. DOI: 10.1007/978-3-030-96756-7_13. Online publication date: 22-Feb-2022
  • (2021) Estimation of Sharing Dependencies in Personal Storage Clouds Using Ensemble Learning Approaches. Operationalizing Multi-Cloud Environments, pp. 65-85. DOI: 10.1007/978-3-030-74402-1_4. Online publication date: 18-Sep-2021
  • (2020) Minimizing Financial Cost of DDoS Attack Defense in Clouds With Fine-Grained Resource Management. IEEE Transactions on Network Science and Engineering 7:4, pp. 2541-2554. DOI: 10.1109/TNSE.2020.2981449. Online publication date: 1-Oct-2020
  • (2020) RHAS: robust hybrid auto-scaling for web applications in cloud computing. Cluster Computing. DOI: 10.1007/s10586-020-03148-5. Online publication date: 20-Jul-2020
  • (2019) Green Computing with Geo-Distributed Heterogeneous Data Centers. 2019 Tenth International Green and Sustainable Computing Conference (IGSC), pp. 1-6. DOI: 10.1109/IGSC48788.2019.8957179. Online publication date: Oct-2019
  • (2018) Profiling and Predicting Application Performance on the Cloud. 2018 IEEE/ACM 11th International Conference on Utility and Cloud Computing (UCC), pp. 21-30. DOI: 10.1109/UCC.2018.00011. Online publication date: Dec-2018
  • (2018) Minimizing Energy Costs for Geographically Distributed Heterogeneous Data Centers. IEEE Transactions on Sustainable Computing 3:4, pp. 318-331. DOI: 10.1109/TSUSC.2018.2822674. Online publication date: 1-Oct-2018
