Abstract
Heterogeneous computers require a well-distributed workload to operate efficiently. Where possible, the load balancing procedure should redistribute the workload with minimal knowledge of the system architecture, to reduce overhead. We propose a generic dynamic load balancing technique for iterative problems that is independent of the resource being optimized. Proof of this generality is given through a formalization of the technique. A heuristic algorithm is then defined on top of this formalization, structured so that different objective functions can be plugged in; as a result, swapping the objective function takes relatively little effort. We implement this heuristic to minimize energy consumption in an application that solves three different dynamic programming problems on multiple GPUs. The implementation is described and then compared against two alternative workload distributions: the homogeneous distribution and the one produced by another dynamic load balancing technique. Our experiments show good results in minimizing overall energy consumption with low overhead.
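The idea of a balancer whose objective function can be swapped can be illustrated with a minimal sketch. It is not the algorithm from the paper: the greedy pairwise move, the names `rebalance` and `balance`, and the per-device cost constants are all assumptions made for illustration, and a real system would measure time or energy at each iteration rather than evaluate a closed-form model.

```python
from typing import Callable, List, Sequence

# Hypothetical objective type: maps a distribution (work units per device)
# to one cost to minimize, e.g. elapsed time or energy.
Objective = Callable[[Sequence[int]], float]


def rebalance(dist: List[int], objective: Objective, step: int = 1) -> List[int]:
    """Try moving `step` units between every ordered device pair and
    return the candidate with the lowest cost (or `dist` if none improves)."""
    best, best_cost = list(dist), objective(dist)
    for src in range(len(dist)):
        if dist[src] < step:
            continue  # nothing left to take from this device
        for dst in range(len(dist)):
            if src == dst:
                continue
            cand = list(dist)
            cand[src] -= step
            cand[dst] += step
            cost = objective(cand)
            if cost < best_cost:
                best, best_cost = cand, cost
    return best


def balance(dist: List[int], objective: Objective,
            max_iters: int = 100, step: int = 1) -> List[int]:
    """Repeat the balancing step until no move improves the objective."""
    for _ in range(max_iters):
        new = rebalance(dist, objective, step)
        if new == dist:
            break
        dist = new
    return dist


# Assumed toy cost models for two devices (illustrative numbers only).
SEC_PER_UNIT = [1.0, 2.0]     # device 0 is twice as fast
JOULES_PER_UNIT = [5.0, 3.0]  # device 1 uses less energy per unit


def time_objective(dist: Sequence[int]) -> float:
    # Devices run in parallel, so the slowest one sets the iteration time.
    return max(w * s for w, s in zip(dist, SEC_PER_UNIT))


def energy_objective(dist: Sequence[int]) -> float:
    # Total energy is the sum over devices.
    return sum(w * j for w, j in zip(dist, JOULES_PER_UNIT))


by_time = balance([6, 6], time_objective)
by_energy = balance([6, 6], energy_objective)
```

The balancing loop itself never changes; only the callable passed as `objective` does. Under these toy models the time objective shifts work toward the faster device, while the energy objective shifts it toward the device with the lower per-unit energy cost.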

Acknowledgements
This work was supported by the Spanish Ministry of Science, Innovation and Universities through the TIN2016-78919-R project; the Government of the Canary Islands, through the project ProID2017010130 and the grant TESIS2017010134, which is co-financed by the Ministry of Economy, Industry, Commerce and Knowledge of the Canary Islands and the European Social Funds (ESF), integrated operational program of the Canary Islands 2014-2020, Strategy Aim 3, Priority Topic 74 (85%); the Spanish network CAPAP-H; and the European COST Action CHIPSET.
Cite this article
Cabrera, A., Acosta, A., Almeida, F. et al. A heuristic technique to improve energy efficiency with dynamic load balancing. J Supercomput 75, 1610–1624 (2019). https://doi.org/10.1007/s11227-018-2718-6
DOI: https://doi.org/10.1007/s11227-018-2718-6