Performance evaluation of enhancement of the layered self-scheduling approach for heterogeneous multicore cluster systems

The Journal of Supercomputing

Abstract

We previously proposed the Layered Self-Scheduling (LSS) approach, a hybrid MPI and OpenMP loop self-scheduling approach that addresses heterogeneity on a cluster system consisting of multi-core compute nodes, in which the allocation functions of several well-known schemes were modified for better performance. Although LSS outperforms the conventional self-scheduling schemes, comprehensive experiments and analyses showed that its performance can be improved further. The newly proposed task scheduling strategy, called Enhanced Layered Self-Scheduling (ELSS), aims to use the computing power of the multiple processor cores in the master compute node more efficiently and to schedule tasks so that performance improvements are more stable. We evaluated the new strategy with three benchmark applications: Matrix Multiplication, Monte Carlo Integration, and Mandelbrot Set Computation. We recommend that the global scheduler adopt Guided Self-Scheduling (GSS) for all three applications, and that the local scheduler adopt the static scheme for applications with regular workload distribution; for applications with irregular workload distribution, the local scheduler may adopt any scheme. Experimental results show that the best speedups obtained by ELSS over LSS for the three benchmark programs are 1.373, 13.34, and 2.4, respectively.
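
To make the recommended pairing concrete, the following is a minimal sketch of the textbook GSS rule at the global level (each request receives the ceiling of the remaining iterations divided by the number of workers) combined with an even static split at the local level. It is not the modified allocation function used by LSS or ELSS, which the abstract does not specify. The function names `gss_chunks` and `static_split`, and the worker and core counts, are hypothetical and chosen only for illustration.

```python
import math


def gss_chunks(total_iterations, num_workers):
    """Chunk sizes handed out by textbook Guided Self-Scheduling.

    Assumption: this is the classic rule chunk = ceil(remaining / P),
    not the modified allocation function of LSS/ELSS.
    """
    chunks = []
    remaining = total_iterations
    while remaining > 0:
        size = math.ceil(remaining / num_workers)
        chunks.append(size)
        remaining -= size
    return chunks


def static_split(chunk_size, num_cores):
    """Evenly divide one chunk among the cores of a node (static scheme)."""
    base, extra = divmod(chunk_size, num_cores)
    # The first `extra` cores take one additional iteration each.
    return [base + (1 if core < extra else 0) for core in range(num_cores)]


if __name__ == "__main__":
    # Hypothetical setup: 100 loop iterations, 4 compute nodes, 4 cores each.
    for chunk in gss_chunks(100, 4):
        print(chunk, static_split(chunk, 4))
```

Under GSS the chunks shrink as the loop nears completion, which limits the imbalance caused by the last chunks. The static local split adds no further scheduling overhead inside a node, which suits regular workloads such as matrix multiplication.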

Corresponding author

Correspondence to Chao-Chin Wu.

About this article

Cite this article

Wu, CC., Lai, LF., Huang, LT. et al. Performance evaluation of enhancement of the layered self-scheduling approach for heterogeneous multicore cluster systems. J Supercomput 62, 399–430 (2012). https://doi.org/10.1007/s11227-011-0726-x

  • DOI: https://doi.org/10.1007/s11227-011-0726-x
