Abstract
The efficiency of batch processing is increasingly important for modern commercial service centers such as clusters and cloud computing datacenters. However, periodic resource contention has become a major performance obstacle for applications running concurrently on mainstream chip multiprocessor (CMP) servers. I/O contention is one such obstacle: it can seriously impede both the co-running performance of batch jobs and overall system throughput. This paper proposes a dynamic I/O-aware scheduling algorithm that reduces the impact of I/O contention and improves co-running performance in batch processing. We set up our environment on an 8-socket, 64-core server in the Dawning Linux Cluster and evaluate fifteen workloads ranging from 8 to 256 jobs. Our experimental results show throughput improvements for these workloads ranging from 7% to 431%. Workload slowdown and the average runtime of each job also improve noticeably. These results show that a well-tuned dynamic I/O-aware scheduler benefits batch-mode services, and that the resulting throughput gains can also improve resource utilization on modern service platforms.
Cite this article
Lv, F., Cui, HM., Wang, L. et al. Dynamic I/O-Aware Scheduling for Batch-Mode Applications on Chip Multiprocessor Systems of Cluster Platforms. J. Comput. Sci. Technol. 29, 21–37 (2014). https://doi.org/10.1007/s11390-013-1409-2
DOI: https://doi.org/10.1007/s11390-013-1409-2