ABSTRACT
Performance isolation is a key challenge in cloud computing. Unfortunately, Linux has few defenses against performance interference in shared resources such as processor caches and memory buses, so applications in a cloud can experience unpredictable performance caused by other programs' behavior.
Our solution, CPI2, uses cycles-per-instruction (CPI) data obtained by hardware performance counters to identify problems, select the likely perpetrators, and then optionally throttle them so that the victims can return to their expected behavior. It automatically learns normal and anomalous behaviors by aggregating data from multiple tasks in the same job.
We have rolled out CPI2 to all of Google's shared compute clusters. The paper presents the analysis that led us to that outcome, including both case studies and a large-scale evaluation of its ability to solve real production issues.
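The detection idea described in the abstract (compute CPI from hardware counters, learn what is normal by pooling tasks of the same job, then flag tasks that deviate) can be sketched roughly as follows. This is an illustrative sketch only, not the paper's implementation: the function names, the two-standard-deviation threshold, and the sample counter values are all assumptions made for the example.

```python
from statistics import mean, pstdev


def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction for one sampling window."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def find_outliers(samples: dict[str, float], num_sigmas: float = 2.0) -> list[str]:
    """Return the tasks whose CPI is anomalously high for their job.

    `samples` maps a task name to its CPI. All tasks are assumed to belong
    to the same job, so their pooled CPI values define "normal" behavior.
    The threshold (mean + num_sigmas * stddev) is an assumed choice.
    """
    values = list(samples.values())
    threshold = mean(values) + num_sigmas * pstdev(values)
    return sorted(task for task, value in samples.items() if value > threshold)


if __name__ == "__main__":
    # Hypothetical counter readings: nine well-behaved tasks and one victim.
    job = {f"task-{i}": cpi(1000, 1000) for i in range(9)}
    job["task-9"] = cpi(3000, 1000)
    print(find_outliers(job))
```

A flagged task is only a candidate victim. Selecting the likely perpetrator and throttling it, as the abstract describes, would need further information about which tasks share the machine, which this sketch does not model.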
Index Terms
- CPI2: CPU performance isolation for shared compute clusters