E-OSched: a load balancing scheduler for heterogeneous multicores

Khalid, Yasir Noman; Aleem, Muhammad; Prodan, Radu; Iqbal, Muhammad Azhar; Islam, Muhammad Arshad

doi:10.1007/s11227-018-2435-1

Published: 23 May 2018

Volume 74, pages 5399–5431 (2018)

The Journal of Supercomputing

Yasir Noman Khalid¹,
Muhammad Aleem ORCID: orcid.org/0000-0001-8342-5757¹,
Radu Prodan²,
Muhammad Azhar Iqbal¹ &
Muhammad Arshad Islam¹

Abstract

Heterogeneous computing devices have become one of the preferred platforms of the multicore era for executing compute-intensive applications. These heterogeneous machines combine CPUs and GPUs, and OpenCL is one of the industry standards for programming them. Conventional application scheduling mechanisms allocate most applications to GPUs and leave the CPU underutilized. This underutilization of slower devices (such as the CPU) often leads to sub-optimal performance of data-parallel applications in terms of load balance, execution time, and throughput. Moreover, scheduling multiple applications on a heterogeneous system further aggravates the performance inefficiency. This paper addresses these deficiencies with a novel scheduling strategy named OSched, together with an enhancement named E-OSched. OSched performs resource-aware assignment of jobs to both CPUs and GPUs while ensuring a balanced load. Load balancing is achieved by considering the computational requirements of jobs and the computing potential of each device. Load-balanced execution yields lower execution time, higher throughput, and improved utilization. E-OSched reduces main memory contention during the concurrent job execution phase. The mathematical model of the proposed algorithms is evaluated by comparing simulation results against different state-of-the-art scheduling heuristics. The results show that E-OSched outperforms the state-of-the-art scheduling heuristics, achieving up to 8.09% better execution time and up to 7.07% better throughput.
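The abstract does not give the scheduling algorithm itself, so the following is only a minimal sketch of the general idea it describes: weighing each job's computational requirement against each device's computing potential so that neither the CPU nor the GPU sits idle. It uses a generic greedy earliest-finish-time heuristic, not the OSched or E-OSched algorithm. The names `Device` and `assign_jobs`, the GFLOP/s ratings, and the per-job work estimates are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    gflops: float              # assumed computing potential in GFLOP/s (hypothetical)
    busy_until: float = 0.0    # seconds of work already queued on this device
    jobs: list = field(default_factory=list)


def assign_jobs(jobs, devices):
    """Greedy earliest-finish-time assignment (illustrative, not OSched).

    jobs: dict mapping job name -> estimated work in GFLOP (hypothetical values).
    devices: list of Device objects; they are updated in place and returned.
    """
    # Place the largest jobs first so that smaller jobs can even out the load.
    for name, work in sorted(jobs.items(), key=lambda kv: kv[1], reverse=True):
        # Pick the device on which this job would finish soonest,
        # counting the work that is already queued there.
        best = min(devices, key=lambda d: d.busy_until + work / d.gflops)
        best.busy_until += work / best.gflops
        best.jobs.append(name)
    return devices


if __name__ == "__main__":
    devices = [Device("cpu", gflops=100.0), Device("gpu", gflops=400.0)]
    jobs = {"j1": 800.0, "j2": 400.0, "j3": 200.0, "j4": 100.0}
    for d in assign_jobs(jobs, devices):
        print(d.name, d.jobs, d.busy_until)
```

With these made-up numbers the two large jobs go to the GPU and the two small ones to the CPU, and both devices finish after 3.0 seconds. A GPU-only policy would need 3.75 seconds for the same jobs. A real scheduler would also have to account for data transfer costs and memory contention, which this sketch ignores.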

Notes

In this research, the term "job" denotes an OpenCL application that consists of a host program and kernel functions.
FLOPS = Floating Point Operations Per Second.
https://www.top500.org/lists/2017/11/.

Acknowledgements

The Austrian Promotion Agency (FFG) partially funded this work as part of the project 848448 “Tiroler Cloud”.

Author information

Authors and Affiliations

Capital University of Science and Technology, Islamabad, 44000, Pakistan
Yasir Noman Khalid, Muhammad Aleem, Muhammad Azhar Iqbal & Muhammad Arshad Islam
Alpen-Adria-Universität, 9020, Klagenfurt, Austria
Radu Prodan

Corresponding author

Correspondence to Muhammad Aleem.

About this article

Cite this article

Khalid, Y.N., Aleem, M., Prodan, R. et al. E-OSched: a load balancing scheduler for heterogeneous multicores. J Supercomput 74, 5399–5431 (2018). https://doi.org/10.1007/s11227-018-2435-1

Issue Date: October 2018
DOI: https://doi.org/10.1007/s11227-018-2435-1
