DOI: 10.1145/2465351.2465388
research-article

CPI²: CPU performance isolation for shared compute clusters

Published: 15 April 2013

ABSTRACT

Performance isolation is a key challenge in cloud computing. Unfortunately, Linux has few defenses against performance interference in shared resources such as processor caches and memory buses, so applications in a cloud can experience unpredictable performance caused by other programs' behavior.

Our solution, CPI², uses cycles-per-instruction (CPI) data obtained by hardware performance counters to identify problems, select the likely perpetrators, and then optionally throttle them so that the victims can return to their expected behavior. It automatically learns normal and anomalous behaviors by aggregating data from multiple tasks in the same job.

We have rolled out CPI² to all of Google's shared compute clusters. The paper presents the analysis that led us to that outcome, including both case studies and a large-scale evaluation of its ability to solve real production issues.
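
As a rough illustration of the detection idea in the abstract, the sketch below computes CPI from cycle and instruction counts and flags tasks whose CPI is far above that of their sibling tasks in the same job. The function names, the sample format, and the two-standard-deviation cutoff are assumptions made for this example; the abstract does not specify them, and reading real hardware counters is out of scope here.

```python
from statistics import mean, pstdev


def cpi(cycles, instructions):
    """Cycles per instruction for one sampling window."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def find_outliers(samples, num_sigmas=2.0):
    """Return the ids of tasks whose CPI is unusually high for their job.

    `samples` maps a task id to a (cycles, instructions) pair, with all
    tasks belonging to the same job. The cutoff of mean + num_sigmas *
    standard deviation is an assumption of this sketch.
    """
    cpis = {task: cpi(c, i) for task, (c, i) in samples.items()}
    mu = mean(cpis.values())
    sigma = pstdev(cpis.values())
    threshold = mu + num_sigmas * sigma
    return sorted(task for task, value in cpis.items() if value > threshold)


# Nine tasks behave alike (CPI 1.0); one task is much slower (CPI 3.0).
samples = {f"task-{n}": (1000, 1000) for n in range(9)}
samples["task-9"] = (3000, 1000)
print(find_outliers(samples))  # ['task-9']
```

A flagged task is only a candidate victim. Choosing which co-located task to throttle is a separate step that this sketch does not attempt.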


Published in

EuroSys '13: Proceedings of the 8th ACM European Conference on Computer Systems
April 2013
401 pages
ISBN: 9781450319942
DOI: 10.1145/2465351

Copyright © 2013 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

EuroSys '13 paper acceptance rate: 28 of 143 submissions, 20%. Overall acceptance rate: 241 of 1,308 submissions, 18%.
