Scheduling for heterogeneous systems in accelerator-rich environments

The Journal of Supercomputing
Abstract

The world is generating ever more data, and applications must cope with ever-increasing datasets. To process such datasets, heterogeneous and manycore accelerators are being deployed in various computing systems to improve energy efficiency. In this work, we present a runtime management system designed for such heterogeneous systems with manycore accelerators. More specifically, we design a resource-based runtime management system that considers application characteristics and the corresponding execution properties on the nodes and accelerators. We propose scheduling heuristics and runtime environment solutions that achieve better throughput and reduced energy consumption in computing systems with different accelerators. We give implementation details of our framework, describe different scheduling algorithms, and present an experimental evaluation of our system. We also compare our approaches with an optimal scheme in which an integer linear programming (ILP) formulation maps applications onto the heterogeneous system. While the proposed framework can be extended to a wide variety of accelerators, our initial focus is on Graphics Processing Units (GPUs). Our experimental evaluations show that including accelerator support in the management framework significantly improves energy consumption and execution time. We believe this approach has the potential to provide an effective solution for next-generation accelerator-based computing systems.
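The full text is not included on this page, so the authors' actual heuristics are not shown here. As a rough, hypothetical illustration of the kind of resource-based scheduling the abstract describes, the sketch below greedily maps each application to the CPU or GPU resource that minimizes a weighted cost over profiled execution time and energy. All names, the cost model, and the `alpha` weight are assumptions for illustration, not the paper's algorithm.

```python
from dataclasses import dataclass

@dataclass
class App:
    name: str
    cpu_time: float    # profiled execution time on a CPU node (s)
    gpu_time: float    # profiled execution time on a GPU node (s)
    cpu_energy: float  # profiled energy on a CPU node (J)
    gpu_energy: float  # profiled energy on a GPU node (J)

def schedule(apps, num_cpus, num_gpus, alpha=0.5):
    """Map each application to the resource type minimizing a weighted
    combination of time and energy, balancing load within each type.
    alpha = 1.0 optimizes time only; alpha = 0.0 optimizes energy only."""
    cpu_load = [0.0] * num_cpus
    gpu_load = [0.0] * num_gpus
    mapping = {}
    # Place the longest-running applications first (a common greedy order).
    for app in sorted(apps, key=lambda a: min(a.cpu_time, a.gpu_time),
                      reverse=True):
        cpu_cost = alpha * app.cpu_time + (1 - alpha) * app.cpu_energy
        gpu_cost = alpha * app.gpu_time + (1 - alpha) * app.gpu_energy
        if num_gpus > 0 and gpu_cost < cpu_cost:
            i = gpu_load.index(min(gpu_load))  # least-loaded GPU node
            gpu_load[i] += app.gpu_time
            mapping[app.name] = ("gpu", i)
        else:
            i = cpu_load.index(min(cpu_load))  # least-loaded CPU node
            cpu_load[i] += app.cpu_time
            mapping[app.name] = ("cpu", i)
    return mapping
```

A greedy heuristic like this runs in near-linear time, whereas the ILP comparison the abstract mentions would search over all application-to-node assignments to establish an optimality bound.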


Figs. 1–9 (figures appear in the full-text article)



Acknowledgement

This work has been supported in part by a grant from the Turkish Academy of Sciences and a grant from Türk Telekom (Project Number: 3015-04).

Author information

Correspondence to Ozcan Ozturk.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Yesil, S., Ozturk, O. Scheduling for heterogeneous systems in accelerator-rich environments. J Supercomput 78, 200–221 (2022). https://doi.org/10.1007/s11227-021-03883-5

