Abstract
Owing to the rising awareness of environment protection, high performance is not the only aim in system design, energy efficiency has increasingly become an important goal. In accordance with this goal, heterogeneous systems which are more efficient than CPU-based homogeneous systems, and occupying a growing proportion in the Top500 and the Green500 lists. Nevertheless, heterogeneous system design being more complex presents greater challenges in achieving a good tradeoff between performance and energy efficiency for applications running on such systems. To address the performance energy tradeoff issue in CPU-GPU heterogeneous systems, we propose a novel feedback control optimization (FCO) method that alternates between frequency scaling of device and division of kernel workload between CPU and GPU. Given a kernel and a workload division, frequency scaling involves finding near-optimal core frequency of the CPU and of the GPU. Further, an iterative algorithm is proposed for finding a near-optimal workload division that balance workload between CPU and GPU at a frequency that was optimal for the previous workload division. The frequency scaling phase and workload division phase are alternatively performed until the proposed FCO method converges and finds a configuration including core frequency for CPU, core frequency for GPU, and the workload division. Experiments show that compared with the state-of-the-art GreenGPU method, performance can be improved by 7.9%, while energy consumption can be reduced by 4.16%.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Top. 500 Supercomputer Sites (2013). http://www.top500.org
The Green500 (2013). http://www.green500.org
Che, S., Boyer, M., Meng, J., Tarjan, D., Sheaffer, J.W., Lee, S., Skadron, K.: Rodinia: a benchmark suite for heterogeneous computing. In: Proceedings of the IEEE International Symposium on Workload Characterization (IISWC), pp. 44–54. IEEE Press, October 2009
NVIDIA CUDA Toolkit 5.5. https://developer.nvidia.com/cuda-toolkit-55-archive
Matthew, S., Henry, D., Karthikeyan, S.: Porting CMP benchmarks to GPUs. Department of Computer Sciences, The University of Wisconsin-Madison, Technical report (2011)
Li, J., Martinez, J.F., Huang, M.C.: The thrifty barrier: energy-aware synchronization in shared-memory multiprocessors. In: Proceedings of the 10th International Symposium on High Performance Computer Architecture (HPCA), p. 14. IEEE Computer Society, February 2004
Lim, M., Freeh, V.W., Lowenthal, D.K.: Adaptive, transparent frequency and voltage scaling of communication phases in MPI programs. In: Proceedings of the ACM/IEEE Conference on Supercomputing (SC). IEEE Press, November 2006
Hong, S., Kim, H.: An integrated GPU power and performance model. In: Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA), pp. 280–289. ACM Press, June 2010
Song, S., Su, C., Rountree, B., Cameron, K.W.: A simplified and accurate model of power-performance efficiency on emergent GPU architectures. In: Proceedings of the IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS), pp. 673–686. IEEE Computer Society, May 2013
Luk, C., Hong, S., Kim, H.: Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping. In: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 45–55. IEEE Press, December 2009
Diamos, G.F., Yalamanchili, S.: Harmony: an execution model and runtime for heterogeneous many core systems. In: Proceedings of the 17th International Symposium on High Performance Distributed Computing (HPDC), pp. 197–200. ACM Press, June 2008
Ravi, V.T., Ma, W., Chiu, D., Agrawal, G.: Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations. In: Proceedings of the 24th ACM International Conference on Supercomputing (ICS), pp. 137–146. ACM Press, June 2010
Grewe, D., ÓBoyle, M.F.P.: A static task partitioning approach for heterogeneous systems using OpenCL. In: Knoop, J. (ed.) CC 2011. LNCS, vol. 6601, pp. 286–305. Springer, Heidelberg (2011). doi:10.1007/978-3-642-19861-8_16
Weaver, V.M., Johnson, M., Kasichayanula, K., Ralph, J., Luszczek, P., Terpstra, D., Moore, S.: Measuring energy and power with PAPI. In: Proceedings of the 41st International Conference on Parallel Processing Workshops (ICPPW), pp. 262–268. IEEE Press, September 2012
Rafique, M.M., Butt, A.R., Nikolopoulos, D.S.: A capabilities-aware framework for using computational accelerators in data-intensive computing. J. Parallel Distrib. Comput. 71(2), 185–197 (2011)
Ma, K., Li, X., Chen, W., Zhang, X., Wang, X.: GreenGPU: a holistic approach to energy efficiency in GPU-CPU heterogeneous architectures. In: Proceedings of the 41st International Conference on Parallel Processing (ICPP), pp. 48–57. IEEE Press, September 2012
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Lin, FS., Liu, PT., Li, MH., Hsiung, PA. (2016). Feedback Control Optimization for Performance and Energy Efficiency on CPU-GPU Heterogeneous Systems. In: Carretero, J., Garcia-Blas, J., Ko, R., Mueller, P., Nakano, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2016. Lecture Notes in Computer Science(), vol 10048. Springer, Cham. https://doi.org/10.1007/978-3-319-49583-5_29
Download citation
DOI: https://doi.org/10.1007/978-3-319-49583-5_29
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49582-8
Online ISBN: 978-3-319-49583-5
eBook Packages: Computer ScienceComputer Science (R0)