Improvement of workload balancing using parallel loop self-scheduling on Intel Xeon Phi

Yang, Chao-Tung; Huang, Chao-Wei; Chen, Shuo-Tsung

doi:10.1007/s11227-017-2068-9

Published: 15 May 2017

Volume 73, pages 4981–5005 (2017)
The Journal of Supercomputing

Abstract

In recent years, Intel has promoted its Xeon Phi coprocessor, which is based on the x86 architecture. With about 60 cores, a Xeon Phi can be regarded as a computing node in its own right, and its computing power cannot be ignored. This work aims to improve workload balance by applying a parallel loop self-scheduling scheme on a Xeon Phi-based computer cluster. The approach is implemented in C using hybrid MPI and OpenMP parallel programming. Because parallel loop self-scheduling consists of a static and a dynamic allocation phase, a weighting algorithm is adopted in the static phase, while well-known loop self-scheduling schemes are adopted in the dynamic phase. Loop iterations are partitioned according to the weights of the MIC (Xeon Phi) and HOST nodes, so that the many-core Xeon Phi takes part in parallel loop self-scheduling. Finally, we evaluate performance on four applications: matrix multiplication, sparse matrix multiplication, the Mandelbrot set, and circuit meet. The experimental results indicate how weights should be allocated and which scheduling method achieves the best performance.



Acknowledgements

This work was sponsored by the Ministry of Science and Technology, Taiwan ROC, under Grant Numbers MOST 104-2221-E-029-010-MY3, MOST 105-2634-E-029-001 and MOST 105-2622-E-029-003-CC3.

Author information

Authors and Affiliations

Department of Computer Science, Tunghai University, Taichung City, 40704, Taiwan, ROC
Chao-Tung Yang, Chao-Wei Huang & Shuo-Tsung Chen

Corresponding author

Correspondence to Chao-Tung Yang.

About this article

Cite this article

Yang, CT., Huang, CW. & Chen, ST. Improvement of workload balancing using parallel loop self-scheduling on Intel Xeon Phi. J Supercomput 73, 4981–5005 (2017). https://doi.org/10.1007/s11227-017-2068-9

Issue Date: November 2017