Using hybrid MPI and OpenMP programming to optimize communications in parallel loop self-scheduling schemes for multicore PC clusters

Abstract

Recently, a series of parallel loop self-scheduling schemes have been proposed, especially for heterogeneous cluster systems. However, these schemes employ the MPI programming model to construct applications without considering whether a computing node has a multicore architecture. As a result, every processor core must communicate directly with the master node to request new tasks, even though the cores on the same node could exchange data through the underlying shared memory. To reduce this communication overhead, in this paper we adopt a hybrid MPI and OpenMP programming model to design two-level parallel loop self-scheduling schemes. At the first level, each computing node runs a single MPI process for inter-node communication. At the second level, each processor core runs an OpenMP thread that executes the iterations assigned to its node. Experimental results show that our method outperforms previous schemes.
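
To make the two-level idea concrete, below is a minimal sketch, not the authors' implementation, of a master-worker self-scheduling loop written with hybrid MPI and OpenMP. The names TOTAL_ITERS, CHUNK_SIZE, and compute_iteration() are hypothetical, and the fixed chunk size is an assumption made for brevity; actual self-scheduling schemes shrink the chunk size over time according to the chosen formula. Rank 0 dispatches chunks of the iteration space to one MPI process per node, and each process spreads its chunk across the node's cores with an OpenMP parallel for, so only one process per node ever contacts the master.

/*
 * Minimal hybrid MPI+OpenMP self-scheduling sketch (illustrative only).
 * Rank 0 = master, hands out [start, end) chunks on request.
 * Every other rank = one process per node; OpenMP threads run the chunk.
 */
#include <mpi.h>
#include <omp.h>

#define TOTAL_ITERS 100000L   /* hypothetical iteration count  */
#define CHUNK_SIZE  1000L     /* hypothetical fixed chunk size */
#define TAG_REQUEST 1
#define TAG_WORK    2

static void compute_iteration(long i) {
    /* placeholder for the real loop body */
    volatile double x = (double)i * 0.5;
    (void)x;
}

int main(int argc, char *argv[]) {
    int provided, rank, size;
    /* only the main thread of each process calls MPI */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                       /* first level: master dispatches chunks */
        long next = 0;
        int active = size - 1;
        while (active > 0) {
            MPI_Status st;
            long dummy;
            MPI_Recv(&dummy, 1, MPI_LONG, MPI_ANY_SOURCE, TAG_REQUEST,
                     MPI_COMM_WORLD, &st);
            long chunk[2];                 /* [start, end); end < 0 means stop */
            if (next < TOTAL_ITERS) {
                chunk[0] = next;
                chunk[1] = (next + CHUNK_SIZE < TOTAL_ITERS)
                               ? next + CHUNK_SIZE : TOTAL_ITERS;
                next = chunk[1];
            } else {
                chunk[0] = chunk[1] = -1;  /* tell this worker to finish */
                active--;
            }
            MPI_Send(chunk, 2, MPI_LONG, st.MPI_SOURCE, TAG_WORK,
                     MPI_COMM_WORLD);
        }
    } else {                               /* one worker process per node */
        for (;;) {
            long dummy = 0, chunk[2];
            MPI_Send(&dummy, 1, MPI_LONG, 0, TAG_REQUEST, MPI_COMM_WORLD);
            MPI_Recv(chunk, 2, MPI_LONG, 0, TAG_WORK, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            if (chunk[1] < 0) break;       /* no more work */

            /* second level: spread the chunk over the node's cores */
            #pragma omp parallel for schedule(dynamic)
            for (long i = chunk[0]; i < chunk[1]; i++)
                compute_iteration(i);
        }
    }

    MPI_Finalize();
    return 0;
}

Built with a command such as mpicc -fopenmp and launched with one MPI process per node, intra-node work distribution is handled entirely by OpenMP over shared memory, which is the communication saving the paper targets.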


Author information

Corresponding author

Correspondence to Chao-Chin Wu.

About this article

Cite this article

Wu, C.-C., Lai, L.-F., Yang, C.-T. et al. Using hybrid MPI and OpenMP programming to optimize communications in parallel loop self-scheduling schemes for multicore PC clusters. J Supercomput 60, 31–61 (2012). https://doi.org/10.1007/s11227-009-0271-z
