research-article

Towards Exploiting CPU Elasticity via Efficient Thread Oversubscription

Authors:

Hang Huang,

Jia Rao,

Song Wu,

Hai Jin,

Hong Jiang,

Hao Che,

Xiaofeng WuAuthors Info & Claims

HPDC '21: Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing

Pages 215 - 226

https://doi.org/10.1145/3431379.3460641

Published: 21 June 2021 Publication History

Get Access

Abstract

Elasticity is an essential feature of cloud computing, which allows users to dynamically add or remove resources in response to workload changes. However, building applications that truly exploit elasticity is non-trivial. Traditional applications need to be modified to efficiently utilize variable resources. This paper explores thread oversubscription, i.e., provisioning more threads than the available cores, to exploit CPU elasticity in the cloud. While maintaining sufficient concurrency allows applications to utilize additional CPUs when more are made available, it is widely believed that thread oversubscription introduces prohibitive overheads due to excessive context switches, loss of locality, and contention on shared resources.

In this paper, we conduct a comprehensive study of the overhead of thread oversubscription. We find that 1) the direct cost of context switching (i.e., 1-2 μs on modern processors) does not cause noticeable performance slow down to most applications; 2) oversubscription can be both constructive and destructive to the performance of CPU caches and TLB. We identify two previously under-studied issues that are responsible for drastic slowdowns in many applications under oversubscription. First, the existing thread sleep and wakeup process in the OS kernel is inefficient in handling oversubscribed threads. Second, pervasive busy-waiting operations in program code can waste CPU and starve critical threads. To this end, we devise two OS mechanisms, virtual blocking and busy-waiting detection, to enable efficient thread oversubscription without requiring program code changes. Experimental results show that our approaches can achieve an efficiency close to that in under-subscribed scenarios while preserving the capability to expand to many more CPUs. The performance gain is up to 77% for blocking- and 19x for busy-waiting-based applications compared to the vanilla Linux.

References

[1]

Jelena Antić, Georgios Chatzopoulos, Rachid Guerraoui, and Vasileios Trigonakis. 2016. Locking made easy. In Proceedings of the International Middleware Conference (Middleware). 1--14.

Abstract

References

Cited By

Index Terms

Recommendations

Transparently bridging semantic gap in CPU management for virtualized environments

AWS EC2 vs. Joyent's Triton: A Comparison of Docker Container-hosting Platforms

A Virtual CPU Scheduling Model for I/O Performance in Paravirtualized Environments

Comments

Information

Published In

Sponsors

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Funding Sources

Conference

Acceptance Rates

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

Login options

Full Access

View options

PDF

eReader

Share

Share this Publication link

Share on social media

Affiliations