Abstract
In recent years, Intel has promoted the Xeon Phi, a coprocessor whose architecture is similar to x86. With about 60 cores, it can be treated as a single computing node and offers substantial computing power. This work aims to improve workload balance by applying a parallel loop self-scheduling scheme on a Xeon Phi-based computer cluster. The proposed scheme is implemented in C using hybrid MPI and OpenMP parallel programming. Parallel loop self-scheduling consists of a static allocation part and a dynamic allocation part: a weighting algorithm is adopted in the static part, while well-known loop self-scheduling is adopted in the dynamic part. The loop block is partitioned according to the weights of the MIC and HOST nodes, so that the many-core Xeon Phi takes part in parallel loop self-scheduling. Finally, we evaluate performance experimentally on four applications: matrix multiplication, sparse matrix multiplication, Mandelbrot set and circuit meet. The experimental results indicate how to allocate the weights and which scheduling method achieves the best performance.
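The two-part allocation described in the abstract can be illustrated with a minimal sketch in C. It is not the authors' implementation: the function names `static_partition` and `gss_next_chunk`, the `alpha_percent` parameter (the share of iterations assigned statically), and the use of guided self-scheduling (GSS) as the dynamic scheme are all assumptions made for illustration. The abstract does not say which self-scheduling scheme or weight formula is used. MPI and OpenMP calls are left out so that the arithmetic stands alone.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the paper): split the static share of a
 * loop among n nodes in proportion to their weights.
 *   total         - total number of loop iterations
 *   alpha_percent - assumed percentage of iterations assigned statically
 *   weight[i]     - relative computing power of node i (e.g. HOST or MIC)
 *   chunk[i]      - output: iterations assigned statically to node i
 * Returns the number of iterations left for the dynamic part.
 */
long static_partition(long total, int alpha_percent,
                      const double *weight, size_t n, long *chunk)
{
    assert(total >= 0 && alpha_percent >= 0 && alpha_percent <= 100);
    assert(n > 0);

    long static_total = total * alpha_percent / 100;

    double weight_sum = 0.0;
    for (size_t i = 0; i < n; i++)
        weight_sum += weight[i];
    assert(weight_sum > 0.0);

    long assigned = 0;
    for (size_t i = 0; i < n; i++) {
        /* Truncate toward zero; any rounding leftovers stay in the
           dynamic pool instead of being lost. */
        chunk[i] = (long)((double)static_total * weight[i] / weight_sum);
        assigned += chunk[i];
    }
    return total - assigned;
}

/*
 * Guided self-scheduling (assumed dynamic scheme): each request from an
 * idle node receives ceil(remaining / n) iterations, so chunks shrink
 * as the loop nears completion.
 */
long gss_next_chunk(long remaining, size_t n)
{
    if (remaining <= 0 || n == 0)
        return 0;
    return (remaining + (long)n - 1) / (long)n;
}
```

For example, with 1000 iterations, 80% assigned statically, and two nodes weighted 3:1, the static part hands out 600 and 200 iterations and leaves 200 for the dynamic part, whose first GSS chunk is 100. In a hybrid MPI and OpenMP setting, a master process would typically serve the `gss_next_chunk` requests while each node runs its chunk across its own cores.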
Acknowledgements
This work was sponsored by the Ministry of Science and Technology, Taiwan ROC, under Grant Numbers MOST 104-2221-E-029-010-MY3, MOST 105-2634-E-029-001 and MOST 105-2622-E-029-003-CC3.
Cite this article
Yang, CT., Huang, CW. & Chen, ST. Improvement of workload balancing using parallel loop self-scheduling on Intel Xeon Phi. J Supercomput 73, 4981–5005 (2017). https://doi.org/10.1007/s11227-017-2068-9
DOI: https://doi.org/10.1007/s11227-017-2068-9