Dynamic I/O-Aware Scheduling for Batch-Mode Applications on Chip Multiprocessor Systems of Cluster Platforms

Lv, Fang; Cui, Hui-Min; Wang, Lei; Liu, Lei; Wu, Cheng-Gang; Feng, Xiao-Bing; Yew, Pen-Chung

doi:10.1007/s11390-013-1409-2

Original Paper
Published: 10 January 2014

Journal of Computer Science and Technology
Volume 29, pages 21–37 (2014)

Fang Lv^1,2,
Hui-Min Cui^1,
Lei Wang^1,
Lei Liu^1,2,
Cheng-Gang Wu^1,
Xiao-Bing Feng^1 &
Pen-Chung Yew^3,4

Abstract

Efficiency of batch processing is becoming increasingly important for many modern commercial service centers, e.g., clusters and cloud computing datacenters. However, periodic resource contention has become a major performance obstacle for applications running concurrently on mainstream CMP servers. I/O contention is one such obstacle, and it can seriously impede both the co-running performance of batch jobs and the system throughput. In this paper, a dynamic I/O-aware scheduling algorithm is proposed to lower the impact of I/O contention and to enhance co-running performance in batch processing. We set up our environment on an 8-socket, 64-core server in a Dawning Linux cluster. Fifteen workloads ranging from 8 jobs to 256 jobs are evaluated. Our experimental results show significant improvements in workload throughput, ranging from 7% to 431%. Meanwhile, noticeable improvements in workload slowdown and in the average runtime of each job are achieved. These results show that a well-tuned dynamic I/O-aware scheduler is beneficial for batch-mode services. It can also enhance resource utilization on modern service platforms through improved throughput.

Author information

Authors and Affiliations

State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
Fang Lv, Hui-Min Cui, Lei Wang, Lei Liu, Cheng-Gang Wu & Xiao-Bing Feng
University of Chinese Academy of Sciences, Beijing, 100049, China
Fang Lv & Lei Liu
Department of Computer Science and Engineering, University of Minnesota, Twin Cities, Minneapolis, MN, 55455, U.S.A.
Pen-Chung Yew
Institute of Information Science, Academia Sinica, Taibei, 115, China
Pen-Chung Yew

Corresponding author

Correspondence to Fang Lv.

Electronic supplementary material

Electronic supplementary material is available with the online version of this article (DOC, 28 kb).

About this article

Cite this article

Lv, F., Cui, HM., Wang, L. et al. Dynamic I/O-Aware Scheduling for Batch-Mode Applications on Chip Multiprocessor Systems of Cluster Platforms. J. Comput. Sci. Technol. 29, 21–37 (2014). https://doi.org/10.1007/s11390-013-1409-2

Issue Date: January 2014
