ABSTRACT
As virtualization becomes ubiquitous in datacenters, there is growing interest in characterizing application performance in multi-tenant environments to improve datacenter resource management. The performance of parallel programs is notoriously difficult to reason about in virtualized environments. Although performance degradation caused by virtualization and interference has been extensively studied, there is still no comprehensive understanding of why parallel programs suffer unpredictable slowdowns when co-located with different types of workloads.
This paper presents a systematic and quantitative study of multithreaded performance under interference. We design synthetic workloads to emulate different types of interference and study the behavior of parallel programs under each type. We find that unpredictable performance results from a complex interplay among the design of the program, the memory hierarchy of the host system, and CPU scheduling at the hypervisor. To understand the relationships among these factors, we decompose parallel runtime into compute, synchronization, and steal time, and use the runtime breakdown to measure program progress and identify execution inefficiency under interference. Based on these findings, we develop an online approach that predicts performance slowdown without requiring parallel programs to run to completion, and devise two scheduling optimizations at the hypervisor to reduce slowdowns. Experimental results with Xen and representative parallel workloads show that the online performance prediction achieves an average error below 4.5% and that the optimizations reduce runtime slowdown by as much as 38% compared to stock Xen.
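As an illustration of the runtime breakdown described above, the following minimal Python sketch splits a run into compute, synchronization, and steal time and attributes the observed slowdown to each component. It is not the paper's prediction method, which the abstract does not specify. The names `RuntimeBreakdown` and `slowdown_attribution` and the sample numbers are hypothetical, and the sketch assumes both breakdowns cover the same amount of program work.

```python
from dataclasses import dataclass


@dataclass
class RuntimeBreakdown:
    """Hypothetical container for one run's time components, in seconds."""
    compute: float  # time spent executing program work
    sync: float     # time spent waiting on locks, barriers, and similar
    steal: float    # time the vCPU was runnable but not scheduled by the hypervisor

    @property
    def total(self) -> float:
        return self.compute + self.sync + self.steal


def slowdown_attribution(solo: RuntimeBreakdown, shared: RuntimeBreakdown):
    """Return (slowdown, contributions) for the same work run alone vs. co-located.

    slowdown is shared.total / solo.total. Each contribution is the growth of
    one component, normalized by the solo total, so the contributions sum to
    slowdown - 1. This is an illustrative accounting, not the paper's model.
    """
    if solo.total <= 0:
        raise ValueError("solo run must have positive total time")
    slowdown = shared.total / solo.total
    contributions = {
        "compute": (shared.compute - solo.compute) / solo.total,
        "sync": (shared.sync - solo.sync) / solo.total,
        "steal": (shared.steal - solo.steal) / solo.total,
    }
    return slowdown, contributions


# Made-up numbers for illustration only.
solo = RuntimeBreakdown(compute=8.0, sync=2.0, steal=0.0)
shared = RuntimeBreakdown(compute=9.0, sync=3.0, steal=3.0)
slowdown, parts = slowdown_attribution(solo, shared)
print(f"slowdown: {slowdown:.2f}x")
for name, share in parts.items():
    print(f"  {name}: +{share:.0%} of solo runtime")
```

Normalizing each component's growth by the solo total keeps the three contributions additive, which makes it easy to see whether extra time comes from slower computation (for example, memory-hierarchy contention), longer synchronization waits, or time taken by the hypervisor scheduler.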
Index Terms
- Characterizing and Optimizing the Performance of Multithreaded Programs Under Interference