Characterizing and Optimizing the Performance of Multithreaded Programs Under Interference

Authors:
Yong Zhao (The University of Texas at Arlington, Arlington, TX, USA)
Jia Rao (The University of Texas at Arlington, Arlington, TX, USA)
Qing Yi (The University of Colorado Colorado Springs, Colorado Springs, CO, USA)

PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, September 2016, Pages 287–297. https://doi.org/10.1145/2967938.2967939

Published: 11 September 2016


ABSTRACT

As virtualization becomes ubiquitous in datacenters, there is growing interest in characterizing application performance in multi-tenant environments to improve datacenter resource management. The performance of parallel programs is notoriously difficult to reason about in virtualized environments. Although performance degradation caused by virtualization and interference has been extensively studied, a comprehensive understanding of why parallel programs suffer unpredictable slowdowns when co-located with different types of workloads is still lacking.

This paper presents a systematic and quantitative study of multithreaded performance under interference. We design synthetic workloads to emulate different types of interference and study the behavior of parallel programs under such interference. We find that unpredictable performance results from a complex interplay between the design of the program, the memory hierarchy of the host system, and the CPU scheduling at the hypervisor. To understand the intricate relationships among these factors, we decompose parallel runtime into compute, synchronization, and steal time, and use the runtime breakdown to measure program progress and identify execution inefficiency under interference. Based on these findings, we develop an online approach that predicts performance slowdown without requiring parallel programs to run to completion, and devise two scheduling optimizations at the hypervisor to reduce slowdowns. Experimental results with Xen and representative parallel workloads show that the online performance prediction achieves less than 4.5% error on average and that the optimizations reduce runtime slowdown by as much as 38% compared to stock Xen.
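The runtime decomposition described above can be illustrated with a small sketch. The abstract does not specify the paper's prediction model, so everything below is an assumption made for illustration: the `RuntimeBreakdown` type, the `estimate_slowdown` function, and the simplifying rule that program progress is proportional to the share of a sampling window spent in compute. In particular, this rule ignores cases where interference makes the compute phase itself slower (for example, through cache contention).

```python
from dataclasses import dataclass


@dataclass
class RuntimeBreakdown:
    """Time (in seconds) observed over one sampling window.

    Hypothetical container; the three categories follow the abstract,
    but how they are measured is not specified there.
    """
    compute: float  # executing useful work
    sync: float     # waiting on locks, barriers, and similar primitives
    steal: float    # runnable, but the hypervisor ran something else

    @property
    def total(self) -> float:
        return self.compute + self.sync + self.steal

    def compute_fraction(self) -> float:
        """Share of the window spent making progress (assumed proxy)."""
        if self.total <= 0:
            raise ValueError("window must have positive duration")
        return self.compute / self.total


def estimate_slowdown(solo: RuntimeBreakdown,
                      interfered: RuntimeBreakdown) -> float:
    """Naive slowdown estimate from two windows.

    Assumption (not from the paper): progress rate is proportional to
    the compute fraction, so slowdown is the ratio of the two fractions.
    """
    interfered_fraction = interfered.compute_fraction()
    if interfered_fraction == 0:
        raise ValueError("no compute time observed under interference")
    return solo.compute_fraction() / interfered_fraction


# Made-up numbers for illustration only.
solo = RuntimeBreakdown(compute=8.0, sync=2.0, steal=0.0)
interfered = RuntimeBreakdown(compute=4.0, sync=3.0, steal=3.0)
slowdown = estimate_slowdown(solo, interfered)
```

With these made-up windows, the compute share drops from 80% to 40%, so the sketch reports a slowdown factor of 2. Because the estimate uses only a window of the run, it can be computed before the program finishes, which is the property the abstract highlights for its online approach.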


Index Terms

1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Process synchronization
        Scheduling
      2. Software infrastructure
        Virtual machines


Published in
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation
September 2016, 474 pages
ISBN: 9781450341219
DOI: 10.1145/2967938
General Chairs: Ayal Zaks (Intel, Israel), Bilha Mendelson (Optitura, Israel)
Program Chairs: Lawrence Rauchwerger (Texas A&M University, USA), Wen-mei W. Hwu (University of Illinois at Urbana-Champaign, USA)
Copyright © 2016 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
multicore systems
parallel performance modeling and optimization
virtual machine scheduling
Qualifiers
- research-article
Acceptance Rates
PACT '16 Paper Acceptance Rate: 31 of 119 submissions, 26%
Overall Acceptance Rate: 121 of 471 submissions, 26%