WiseThrottling: a new asynchronous task scheduler for mitigating I/O bottleneck in large-scale datacenter servers

Lv, Fang; Liu, Lei; Cui, Hui-min; Wang, Lei; Liu, Ying; Feng, Xiao-bing; Yew, Pen-Chung

doi:10.1007/s11227-015-1427-7

Published: 26 April 2015

Volume 71, pages 3054–3093 (2015)

The Journal of Supercomputing

Fang Lv¹ (ORCID: orcid.org/0000-0001-9723-9410),
Lei Liu¹,
Hui-min Cui¹,
Lei Wang¹,
Ying Liu¹,
Xiao-bing Feng¹ &
Pen-Chung Yew²

298 Accesses
6 Citations

Abstract

Datacenter servers are entering an era of powerful multi-/many-core processors. On these large-scale platforms, severe problems such as I/O contention pose an unprecedented challenge. Prior studies primarily treated I/O bandwidth as the major performance bottleneck. However, our work reveals that in many cases the fundamental cause of I/O contention is the inefficiency of OS schedulers. Modern systems are not aware of this and therefore suffer from poor I/O performance, especially on datacenter servers. Based on our findings, we propose a new software-based scheduling approach, WiseThrottling, to reduce I/O contention. WiseThrottling performs asynchronous, self-adjusting scheduling of concurrent tasks. We evaluate our approach across a wide range of C/OpenMP/MapReduce workloads on a 64-core server in the Dawning Cluster datacenter. The experimental results show that WiseThrottling effectively reduces the I/O bottleneck and can improve overall system performance by up to 207%.
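
The abstract does not spell out the scheduling algorithm, so the following is only a generic sketch of what a self-adjusting throttle on concurrent I/O tasks might look like, not the WiseThrottling method itself. It assumes a halve-on-drop, probe-by-one policy, and every name and parameter in it (`ThrottleState`, `adjust_limit`, `tolerance`, the 64-task cap) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThrottleState:
    """Hypothetical controller state; none of these names come from the paper."""
    limit: int              # tasks currently allowed to issue I/O concurrently
    best_throughput: float  # highest throughput observed so far (e.g. MB/s)


def adjust_limit(state: ThrottleState, throughput: float,
                 min_limit: int = 1, max_limit: int = 64,
                 tolerance: float = 0.9) -> ThrottleState:
    """Return the next state after one measurement interval.

    Assumed policy (illustrative only): if throughput falls below
    `tolerance` times the best value seen, treat it as contention and
    halve the limit; otherwise probe upward by one task.
    """
    if throughput < tolerance * state.best_throughput:
        new_limit = max(min_limit, state.limit // 2)
    else:
        new_limit = min(max_limit, state.limit + 1)
    return ThrottleState(new_limit, max(state.best_throughput, throughput))


def run(samples, start_limit=4):
    """Feed a sequence of throughput samples and record the limit after each."""
    state = ThrottleState(limit=start_limit, best_throughput=0.0)
    history = []
    for sample in samples:
        state = adjust_limit(state, sample)
        history.append(state.limit)
    return history
```

With the made-up samples `[100, 120, 60, 115]` and a starting limit of 4, the limit rises while throughput holds, is halved when throughput drops to 60, and then probes upward again. A real scheduler would also need a way to measure throughput and to suspend or resume tasks, which this sketch leaves out.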

Author information

Authors and Affiliations

State Key Laboratory of Computer Architecture, ICT, CAS, Beijing, China
Fang Lv, Lei Liu, Hui-min Cui, Lei Wang, Ying Liu & Xiao-bing Feng
Department of Computer Science and Engineering, University of Minnesota at Twin-Cities, Minneapolis, MN, USA
Pen-Chung Yew

Corresponding author

Correspondence to Fang Lv.

Additional information

This research is supported by the National High Technology Research and Development Program of China under Grants No. 2012AA010902 and 2015AA011505; the NSFC under Grants No. 61202055, 61221062, 61303053, 61432016 and 61402445; and the National Basic Research Program of China under Grant No. 2011CB302504.

About this article

Cite this article

Lv, F., Liu, L., Cui, Hm. et al. WiseThrottling: a new asynchronous task scheduler for mitigating I/O bottleneck in large-scale datacenter servers. J Supercomput 71, 3054–3093 (2015). https://doi.org/10.1007/s11227-015-1427-7

Issue Date: August 2015
