Performance evaluation of enhancement of the layered self-scheduling approach for heterogeneous multicore cluster systems

Wu, Chao-Chin; Lai, Lien-Fu; Huang, Liang-Tsung; Chen, MingLung

doi:10.1007/s11227-011-0726-x

Published: 21 December 2011

The Journal of Supercomputing, volume 62, pages 399–430 (2012)

Abstract

We previously proposed Layered Self-Scheduling (LSS), a hybrid MPI and OpenMP loop self-scheduling approach that addresses heterogeneity on clusters of multi-core compute nodes by modifying the allocation functions of several well-known schemes for better performance. Although LSS outperforms the conventional self-scheduling schemes, comprehensive experiments and analyses showed that its performance can be improved further. The newly proposed task scheduling strategy, Enhanced Layered Self-Scheduling (ELSS), aims to use the computing power of the multiple processor cores in the master compute node more efficiently and to schedule tasks so that performance improvements are more stable. We evaluated ELSS with three benchmark applications: Matrix Multiplication, Monte Carlo Integration, and Mandelbrot Set Computation. We recommend that the global scheduler adopt Guided Self-Scheduling (GSS) in all cases, and that the local scheduler adopt the static scheme for applications with a regular workload distribution and any scheme for applications with an irregular workload distribution. Experimental results show that the best speedups of ELSS over LSS for the three benchmark programs are 1.373, 13.34, and 2.4, respectively.
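
For readers unfamiliar with Guided Self-Scheduling, the following minimal sketch illustrates the textbook GSS rule, in which each request receives the ceiling of the remaining iterations divided by the number of workers. It is not the modified allocation function used in LSS or ELSS, which the abstract does not specify; the function name `gss_chunks` and the example values (100 iterations, 4 workers) are illustrative assumptions.

```python
def gss_chunks(total_iterations, num_workers):
    """Yield successive chunk sizes under the textbook GSS rule.

    Assumption: this is the classic allocation, chunk = ceil(R / P),
    not the modified allocation functions evaluated in the article.
    """
    remaining = total_iterations
    while remaining > 0:
        # Integer ceiling of remaining / num_workers.
        chunk = (remaining + num_workers - 1) // num_workers
        yield chunk
        remaining -= chunk


# Hypothetical example: 100 loop iterations shared by 4 workers.
chunks = list(gss_chunks(100, 4))
print(chunks)
```

Chunk sizes start large and shrink as the loop drains, which keeps scheduling overhead low early on while leaving small chunks to balance the load at the end.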

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Changhua University of Education, Changhua, 500, Taiwan
Chao-Chin Wu, Lien-Fu Lai & MingLung Chen
Department of Biotechnology, MingDao University, Changhua, Taiwan
Liang-Tsung Huang

Corresponding author

Correspondence to Chao-Chin Wu.

About this article

Cite this article

Wu, CC., Lai, LF., Huang, LT. et al. Performance evaluation of enhancement of the layered self-scheduling approach for heterogeneous multicore cluster systems. J Supercomput 62, 399–430 (2012). https://doi.org/10.1007/s11227-011-0726-x

Published: 21 December 2011
Issue Date: October 2012
DOI: https://doi.org/10.1007/s11227-011-0726-x
