Abstract
Energy efficiency is currently of paramount importance in computational systems. This work proposes an approach for improving the energy efficiency of iterative computation on integrated GPU-CPU systems. The proposal, referred to as E-ADITHE, combines iterative procedures with (1) a heuristic scheme that selects processing units according to an estimate of energy efficiency and (2) load balancing across heterogeneous processors. A wide variety of iterative algorithms in science and engineering can take advantage of E-ADITHE. The Beltrami filter has been selected as a representative example of such procedures, and its OpenCL version has been used to validate E-ADITHE. The analysis of the results shows that E-ADITHE automatically improves the energy efficiency of parallel iterative algorithms on modern heterogeneous processors.
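The two ingredients named in the abstract, selecting processing units by estimated energy efficiency and balancing the load among them, can be illustrated with a minimal sketch. It is not the E-ADITHE implementation: the abstract does not give the heuristic or the efficiency model, so the exhaustive search over device subsets, the "items per joule" estimate, and all names (`devices`, `speed`, `power`, `idle_power`, `select_units`, `balance_load`) are assumptions made purely for illustration.

```python
from itertools import combinations


def balance_load(total_items, speeds):
    """Split total_items among devices in proportion to their measured speed."""
    total_speed = sum(speeds.values())
    names = sorted(speeds)
    shares = {n: int(total_items * speeds[n] / total_speed) for n in names}
    # Assign the rounding remainder to the fastest device.
    fastest = max(names, key=lambda n: speeds[n])
    shares[fastest] += total_items - sum(shares.values())
    return shares


def estimate_efficiency(devices, subset, idle_power):
    """Estimated items per joule when only the devices in `subset` are active.

    Assumption: with a speed-proportional split all devices finish together,
    so the combined throughput is the sum of the individual speeds, and the
    power draw is a fixed idle term plus the active power of each device.
    """
    throughput = sum(devices[n]["speed"] for n in subset)
    power = idle_power + sum(devices[n]["power"] for n in subset)
    return throughput / power


def select_units(devices, idle_power):
    """Return the subset of devices with the highest estimated efficiency."""
    names = sorted(devices)
    candidates = [
        c for r in range(1, len(names) + 1) for c in combinations(names, r)
    ]
    return max(
        candidates, key=lambda c: estimate_efficiency(devices, c, idle_power)
    )


# Hypothetical measurements: speed in items/s, power in watts.
devices = {
    "cpu": {"speed": 40.0, "power": 30.0},
    "gpu": {"speed": 100.0, "power": 20.0},
}
best = select_units(devices, idle_power=10.0)
shares = balance_load(1000, {n: d["speed"] for n, d in devices.items()})
```

With these made-up numbers the GPU-only configuration has the best estimated efficiency, while `balance_load` shows how the work would be divided if both units were used. In an iterative setting, the speeds and powers would presumably be re-measured every few iterations and the selection repeated, which is the kind of adaptation the abstract alludes to.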


Additional information
This work has been funded by grants from the Spanish Ministry of Science and Innovation (TIN2012-37483-C03-03, CAPAP-H5 network TIN2014-53522) and Junta de Andalucia (P12-TIC-301), in part financed by the European Regional Development Fund (ERDF).
Cite this article
Garzón, E.M., Moreno, J.J. & Martínez, J.A. An approach to optimise the energy efficiency of iterative computation on integrated GPU–CPU systems. J Supercomput 73, 114–125 (2017). https://doi.org/10.1007/s11227-016-1643-9
DOI: https://doi.org/10.1007/s11227-016-1643-9