Abstract
Computing systems should be designed to exploit parallelism in order to improve performance. In general, a GPU (Graphics Processing Unit) can provide more parallelism than a CPU (Central Processing Unit), which has led to the wide use of heterogeneous computing systems that utilize the CPU and the GPU together. In heterogeneous computing systems, the efficiency of the scheduling scheme, which selects whether the CPU or the GPU executes an application, is one of the most critical factors in determining performance. This paper proposes a dynamic scheduling scheme that selects between the CPU and the GPU for each application based on estimated-execution-time information. The proposed scheme chooses the device that minimizes the completion time, resulting in better system performance, although it requires a training period to collect the execution history. According to our simulations, the proposed estimated-execution-time scheduling can improve the utilization of the CPU and the GPU compared to existing scheduling schemes, resulting in reduced execution time and enhanced energy efficiency of heterogeneous computing systems.
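The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `EstimatedTimeScheduler`, the use of a simple average over the execution history, the rule that an application is first sent to any device it has not yet run on (standing in for the training period), and the `busy_until` bookkeeping are all assumptions made for illustration.

```python
from collections import defaultdict

DEVICES = ("cpu", "gpu")


class EstimatedTimeScheduler:
    """Hypothetical sketch: pick the device with the earliest estimated completion."""

    def __init__(self):
        # (application, device) -> list of observed execution times
        self.history = defaultdict(list)
        # device -> time at which the device is expected to become free
        self.busy_until = {d: 0.0 for d in DEVICES}

    def estimate(self, app, device):
        """Average of past execution times, or None if there is no history yet."""
        samples = self.history.get((app, device))
        if not samples:
            return None
        return sum(samples) / len(samples)

    def select(self, app, now):
        """Return the device on which `app` should run at time `now`."""
        # Training period (assumed rule): run once on every device first,
        # so that an estimate exists for each of them.
        for device in DEVICES:
            if self.estimate(app, device) is None:
                return device

        # Estimated completion time = waiting time for the device
        # plus the estimated execution time on that device.
        def completion(device):
            start = max(now, self.busy_until[device])
            return start + self.estimate(app, device)

        return min(DEVICES, key=completion)

    def record(self, app, device, elapsed):
        """Store a measured execution time in the history."""
        self.history[(app, device)].append(elapsed)
```

In this sketch, an application that runs faster on the GPU is still sent to the CPU when the GPU is busy long enough that its estimated completion time becomes the later of the two. That is the sense in which the choice minimizes completion time and not just execution time.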
Acknowledgements
This work was supported by the National Research Foundation of Korea Grant funded by the Korean Government (NRF-2011-013-D00105, 2012R1A1B4003492) and the ITRC (Information Technology Research Center) support program supervised by the NIPA (NIPA-2012-H0301-12-3005).
Cite this article
Choi, H.J., Son, D.O., Kang, S.G. et al. An efficient scheduling scheme using estimated execution time for heterogeneous computing systems. J Supercomput 65, 886–902 (2013). https://doi.org/10.1007/s11227-013-0870-6
DOI: https://doi.org/10.1007/s11227-013-0870-6