CPI²: CPU performance isolation for shared compute clusters

Authors:
Xiao Zhang (Google, Inc.)
Eric Tune (Google, Inc.)
Robert Hagmann (Google, Inc.)
Rohit Jnagal (Google, Inc.)
Vrigo Gokhale (Google, Inc.)
John Wilkes (Google, Inc.)

EuroSys '13: Proceedings of the 8th ACM European Conference on Computer Systems, April 2013, Pages 379–391
https://doi.org/10.1145/2465351.2465388

Published: 15 April 2013

ABSTRACT

Performance isolation is a key challenge in cloud computing. Unfortunately, Linux has few defenses against performance interference in shared resources such as processor caches and memory buses, so applications in a cloud can experience unpredictable performance caused by other programs' behavior.

Our solution, CPI², uses cycles-per-instruction (CPI) data obtained by hardware performance counters to identify problems, select the likely perpetrators, and then optionally throttle them so that the victims can return to their expected behavior. It automatically learns normal and anomalous behaviors by aggregating data from multiple tasks in the same job.
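
The detection pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give thresholds or a perpetrator-selection rule, so the `mean + 2 * stddev` cutoff, the use of correlation between a victim's CPI and a co-located task's CPU usage, and all function names here are assumptions made for the example.

```python
from statistics import mean, stdev


def cpi(cycles, instructions):
    """Cycles per instruction for one sampling window of one task."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def job_threshold(job_cpi_samples, k=2.0):
    """Learn an 'anomalous CPI' cutoff from samples pooled across a job's tasks.

    ASSUMPTION: mean + k * stddev with k = 2.0 is an illustrative rule only.
    """
    return mean(job_cpi_samples) + k * stdev(job_cpi_samples)


def find_victims(current_cpi_by_task, threshold):
    """Return the tasks whose current CPI exceeds the job-level cutoff."""
    return sorted(t for t, v in current_cpi_by_task.items() if v > threshold)


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    if len(xs) != len(ys):
        raise ValueError("series must have the same length")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def rank_suspects(victim_cpi_series, cpu_usage_by_suspect):
    """Order co-located tasks by how closely their CPU usage tracks the victim's CPI.

    ASSUMPTION: correlation is a stand-in for 'select the likely perpetrators'.
    """
    scored = [
        (name, pearson(victim_cpi_series, usage))
        for name, usage in cpu_usage_by_suspect.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In this sketch, a task flagged by `find_victims` would have its recent CPI series passed to `rank_suspects` together with the CPU usage of the other tasks on the same machine; the top-ranked task would be the candidate for optional throttling.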

We have rolled out CPI² to all of Google's shared compute clusters. The paper presents the analysis that led us to that outcome, including both case studies and a large-scale evaluation of its ability to solve real production issues.


Published in
EuroSys '13: Proceedings of the 8th ACM European Conference on Computer Systems
April 2013
401 pages
ISBN: 9781450319942
DOI: 10.1145/2465351
General Chairs: Zdenek Hanzálek (Czech Technical University Prague), Hermann Härtig (Technische Universität Dresden)
Program Chairs: Miguel Castro (Microsoft Research Cambridge), M. Frans Kaashoek (MIT)
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
EuroSys '13 Paper Acceptance Rate: 28 of 143 submissions, 20%
Overall Acceptance Rate: 241 of 1,308 submissions, 18%