Dynamic I/O-Aware Scheduling for Batch-Mode Applications on Chip Multiprocessor Systems of Cluster Platforms

  • Original Paper
  • Journal of Computer Science and Technology

Abstract

Batch-processing efficiency is increasingly important for modern commercial service centers such as clusters and cloud computing datacenters. However, periodic resource contention has become a major performance obstacle for applications running concurrently on mainstream chip multiprocessor (CMP) servers. I/O contention is one such obstacle: it can seriously impede both the co-running performance of batch jobs and overall system throughput. In this paper, we propose a dynamic I/O-aware scheduling algorithm that reduces the impact of I/O contention and improves co-running performance in batch processing. We set up our environment on an 8-socket, 64-core server in the Dawning Linux Cluster and evaluate fifteen workloads ranging from 8 to 256 jobs. The experimental results show workload throughput improvements ranging from 7% to 431%, along with noticeable improvements in workload slowdown and in the average runtime of each job. These results show that a well-tuned dynamic I/O-aware scheduler benefits batch-mode services and, by improving throughput, can also raise resource utilization on modern service platforms.

Author information

Corresponding author

Correspondence to Fang Lv.

About this article

Cite this article

Lv, F., Cui, HM., Wang, L. et al. Dynamic I/O-Aware Scheduling for Batch-Mode Applications on Chip Multiprocessor Systems of Cluster Platforms. J. Comput. Sci. Technol. 29, 21–37 (2014). https://doi.org/10.1007/s11390-013-1409-2

  • DOI: https://doi.org/10.1007/s11390-013-1409-2
