Improvement of workload balancing using parallel loop self-scheduling on Intel Xeon Phi

The Journal of Supercomputing

Abstract

In recent years, Intel has promoted its Xeon Phi coprocessor, whose architecture is similar to x86. It has about 60 cores and can be regarded as a single computing node with considerable computing power. This work aims to improve workload balance with a parallel loop self-scheduling scheme running on a Xeon Phi-based computer cluster. The proposed scheme is implemented with hybrid MPI and OpenMP parallel programming in C. Since parallel loop self-scheduling consists of a static and a dynamic allocation phase, a weighting algorithm is adopted in the static part, while well-known loop self-scheduling is adopted in the dynamic part. The loop block is partitioned according to the weights of the MIC and HOST nodes. Accordingly, the many-core Xeon Phi is used to implement parallel loop self-scheduling. Finally, we test the performance experimentally on four application problems: matrix multiplication, sparse matrix multiplication, Mandelbrot set and circuit meet. The experimental results indicate how the weights should be allocated and which scheduling method achieves the best performance.
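As a rough illustration of the two-phase idea described in the abstract, the following C sketch splits a loop into a weighted static part and a self-scheduled dynamic part. It is only a sketch under stated assumptions, not the authors' implementation: the function names `static_partition` and `gss_next_chunk`, the static fraction `alpha`, and the node weights are hypothetical. Guided self-scheduling (chunk = ceil(remaining / nodes)) is used here as one example of a well-known self-scheduling rule, since the abstract does not say which schemes were evaluated. MPI and OpenMP calls are left out so that the partitioning arithmetic stands on its own.

```c
#include <assert.h>

/*
 * Static phase (assumed form): a fraction `alpha` of the `total` loop
 * iterations is divided among the nodes in proportion to their weights.
 * share[i] receives the iteration count for node i.
 * Returns the number of iterations left over for the dynamic phase.
 */
long static_partition(long total, double alpha,
                      const double *weight, int nodes, long *share)
{
    assert(total >= 0 && nodes > 0);
    assert(alpha >= 0.0 && alpha <= 1.0);

    double sum = 0.0;
    for (int i = 0; i < nodes; i++)
        sum += weight[i];
    assert(sum > 0.0);

    long static_total = (long)(alpha * (double)total);
    long assigned = 0;
    for (int i = 0; i < nodes; i++) {
        /* Truncate toward zero; rounding leftovers fall to the dynamic phase. */
        share[i] = (long)((double)static_total * weight[i] / sum);
        assigned += share[i];
    }
    return total - assigned;
}

/*
 * Dynamic phase (assumed scheme: guided self-scheduling). Each request
 * from an idle node gets ceil(remaining / nodes) iterations, so chunks
 * shrink as the loop nears completion.
 */
long gss_next_chunk(long remaining, int nodes)
{
    assert(nodes > 0);
    if (remaining <= 0)
        return 0;
    return (remaining + nodes - 1) / nodes;
}
```

With two nodes weighted 3.0 (say, a MIC node) and 1.0 (a HOST node), 1000 iterations and `alpha = 0.5`, the static phase assigns 375 and 125 iterations and leaves 500 for the dynamic phase, where successive requests receive 250, 125, 63 and so on. In a hybrid MPI and OpenMP program, a master process would typically hand out these chunks on request while each node processes its chunk with OpenMP threads.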

Acknowledgements

This work was sponsored by the Ministry of Science and Technology, Taiwan ROC, under Grant Numbers MOST 104-2221-E-029-010-MY3, MOST 105-2634-E-029-001 and MOST 105-2622-E-029-003-CC3.

Author information

Corresponding author

Correspondence to Chao-Tung Yang.

About this article

Cite this article

Yang, CT., Huang, CW. & Chen, ST. Improvement of workload balancing using parallel loop self-scheduling on Intel Xeon Phi. J Supercomput 73, 4981–5005 (2017). https://doi.org/10.1007/s11227-017-2068-9

  • DOI: https://doi.org/10.1007/s11227-017-2068-9
