An approach to optimise the energy efficiency of iterative computation on integrated GPU–CPU systems

The Journal of Supercomputing

Abstract

The energy efficiency of computational systems is now of paramount importance. This work proposes an approach for improving the energy efficiency of iterative computation on integrated GPU-CPU systems. The proposal, referred to as E-ADITHE, combines iterative procedures with: (1) a heuristic scheme that selects processing units according to their estimated energy efficiency and (2) load balancing across heterogeneous processors. A wide variety of iterative algorithms in science and engineering can take advantage of E-ADITHE. The Beltrami filter has been selected as a representative example of such procedures, and its OpenCL version has been used to validate E-ADITHE. The analysis of the results shows that E-ADITHE automatically improves the energy efficiency of parallel iterative algorithms on modern heterogeneous processors.
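The two ingredients named in the abstract, device selection by estimated energy efficiency and load balancing, can be illustrated with a small sketch. This is not the E-ADITHE algorithm, whose details are not given on this page. It is a minimal illustration under stated assumptions: each device has been profiled for a few iterations, device power adds up when several devices run together, idle power is ignored, and the load is split in proportion to measured throughput. The names `Probe`, `balance`, `energy_per_iteration` and `select_devices`, and all numbers, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Probe:
    """Hypothetical profiling sample for one device (not taken from the paper)."""
    name: str
    items_per_second: float  # throughput measured while the device works alone
    watts: float             # average power drawn while the device is busy


def balance(probes: List[Probe], total_items: int) -> Dict[str, int]:
    """Split the workload in proportion to each device's measured throughput."""
    total_rate = sum(p.items_per_second for p in probes)
    shares = {
        p.name: int(total_items * p.items_per_second / total_rate) for p in probes
    }
    # Hand any rounding remainder to the fastest device so the shares sum up.
    fastest = max(probes, key=lambda p: p.items_per_second).name
    shares[fastest] += total_items - sum(shares.values())
    return shares


def energy_per_iteration(probes: List[Probe], total_items: int) -> float:
    """Estimate joules per iteration, assuming perfect balancing and additive power."""
    seconds = total_items / sum(p.items_per_second for p in probes)
    return seconds * sum(p.watts for p in probes)


def select_devices(candidates: Dict[str, List[Probe]], total_items: int) -> str:
    """Return the candidate device set with the lowest estimated energy."""
    return min(
        candidates,
        key=lambda key: energy_per_iteration(candidates[key], total_items),
    )


# Made-up measurements for an integrated CPU and GPU.
cpu = Probe("cpu", items_per_second=100.0, watts=40.0)
gpu = Probe("gpu", items_per_second=300.0, watts=20.0)
candidates = {"cpu": [cpu], "gpu": [gpu], "cpu+gpu": [cpu, gpu]}

best = select_devices(candidates, total_items=1000)
shares = balance(candidates[best], total_items=1000)
print(best, shares)
```

With these made-up numbers the combined configuration finishes fastest but is not the most energy-efficient choice, which is the kind of trade-off an energy-aware selection step has to resolve.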

Author information

Corresponding author

Correspondence to E. M. Garzón.

Additional information

This work was funded by grants from the Spanish Ministry of Science and Innovation (TIN2012-37483-C03-03, CAPAP-H5 network TIN2014-53522) and the Junta de Andalucía (P12-TIC-301), financed in part by the European Regional Development Fund (ERDF).

About this article

Cite this article

Garzón, E.M., Moreno, J.J. & Martínez, J.A. An approach to optimise the energy efficiency of iterative computation on integrated GPU–CPU systems. J Supercomput 73, 114–125 (2017). https://doi.org/10.1007/s11227-016-1643-9

  • DOI: https://doi.org/10.1007/s11227-016-1643-9
