Abstract
Previously we have proposed a Layered Self-Scheduling (LSS) approach that is a hybrid MPI and OpenMP based loop self-scheduling approach for dealing with the heterogeneity problem on a cluster system consisting of multi-core compute nodes, where the allocation functions of several well-known schemes have been modified for better performance. Though LSS provides better performance than the conventional self-scheduling schemes, we found the performance can be improved further after our comprehensive experiments and analyses. The newly proposed task scheduling strategy, called Enhanced Layered Self-Scheduling (ELSS), aims at how to utilize the compute powers of multiple processor cores more efficiently in the master compute node and how to schedule tasks to have more stable performance improvements. We have evaluated the new task scheduling strategy by three benchmark applications: Matrix Multiplication, Monte Carlo Integration, and Mandelbrot Set Computation. It is recommended that the global scheduler adopts Guided Self-Scheduling (GSS) for all, and the local scheduler adopts the static scheme for applications with regular workload distribution but any scheme for applications with irregular workload distribution. Experimental results show the best speedups obtained by ELSS for the three benchmark programs are 1.373, 13.34 and 2.4, respectively, compared with that scheduled by LSS.
Similar content being viewed by others
References
Banicescu I, Carino RL, Pabico JP, Balasubramaniam M (2005) Overhead analysis of a dynamic load balancing library for cluster computing. In: Proceedings of the 19th IEEE international parallel and distributed processing symposium, p 122.2
Caflisch RE (1998) Monte Carlo and quasi-Monte Carlo methods. Acta Numer 7:1–49
Chronopoulos AT, Penmatsa S, Xu J, Ali S (2006) Distributed loop-self-scheduling schemes for heterogeneous computer systems. Concurr Comput 18(7):771–785
Chronopoulos AT, Penmatsa S, Yu N (2002) Scalable loop self-scheduling schemes for heterogeneous clusters. In: Proceedings of the 2002 IEEE international conference on cluster computing, pp 353–359
Herrera J, Huedo E, Montero RS, Llorente IM (2006) Loosely-coupled loop scheduling in computational grids. In: Proceedings of the 20th IEEE international parallel and distributed processing symposium, p 6
HINT performance analyzer. http://hint.byu.edu/
Hummel SF, Schonberg E, LE Flynn (1992) Factoring: a method scheme for scheduling parallel loops. Commun ACM 35(8):90–101
Li H, Tandri S, Stumm M, Sevcik KC (1993) Locality and loop scheduling on NUMA multiprocessors. In: Proceedings of the 1993 international conference on parallel processing, vol II, pp 140–147
Mandelbrot BB (1988) Fractal geometry of nature. Freeman, New York
Polychronopoulos CD, Kuck D (1987) Guided self-scheduling: a practical scheduling scheme for parallel supercomputers. IEEE Trans Comput 36(12):1425–1439
Shih W-C, Yang C-T, Tseng S-S (2007) A performance-based parallel loop scheduling on grid environments. J Supercomput 41(3):247–267
Smith L, Bull M (2001) Development of mixed mode MPI/OpenMP applications. Sci Program 9(2–3):83–98
Spooner DP, Jarvis SA, Cao J, Saini S, Nudd GR (2003) Local grid scheduling techniques using performance prediction. IEE Proc, Comput Digit Tech 150(2):87–96
Tsuji M, Sato M (2009) Performance evaluation of OpenMP and MPI hybrid programs on a large scale multi-core multi-socket cluster, T2K Open Supercomputer. In: Proceedings of international conference on parallel processing workshops, pp 206–213
Tzen TH, Ni LM (1993) Trapezoid self-scheduling: a practical scheduling scheme for parallel compilers. IEEE Trans Parallel Distrib Syst 4:87–98
Wu C-C, Lai L-F, Chiu P-H (2008) Parallel loop self-scheduling for heterogeneous cluster systems with multi-core computers. In: Proceedings of Asia-pacific services computing conference, vol 1, pp 251–256
Wu C-C, Lai L-F, Yang C-T, Chiu P-H (2009) Using hybrid MPI and OpenMP programming to optimize communications in parallel loop self-scheduling schemes for multicore PC clusters. J Supercomput. doi:10.1007/s11227-009-0271-z
Wu C-C, Yang C-T, Lai K-C, Chiu P-H (2010) Designing parallel loop self-scheduling schemes using the hybrid MPI and OpenMP programming model for multi-core Grid systems. J Supercomput. doi:10.1007/s11227-010-0418-y
Yang C-T, Chang S-C (2004) A parallel loop self-scheduling on extremely heterogeneous PC clusters. J Inf Sci Eng 20(2):263–273
Yang C-T, Cheng K-W, Li K-C (2005) An enhanced parallel loop self-scheduling scheme for cluster environments. J Supercomput 34(3):315–335
Yang C-T, Cheng K-W, Shih W-C (2007) On development of an efficient parallel loop self-scheduling for grid computing environments. Parallel Comput 33(7–8):467–487
Yang C-T, Shih W-C, Tseng S-S (2008) Dynamic partitioning of loop iterations on heterogeneous PC clusters. J Supercomput 44(1):1–23
Yang C-T, Wu C-C, Chang J-H (2011) Performance-based parallel loop self-scheduling using hybrid OpenMP and MPI Programming on multicore SMP clusters. Concurr Comput 23(8):721–744
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Wu, CC., Lai, LF., Huang, LT. et al. Performance evaluation of enhancement of the layered self-scheduling approach for heterogeneous multicore cluster systems. J Supercomput 62, 399–430 (2012). https://doi.org/10.1007/s11227-011-0726-x
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-011-0726-x