Optimization of Data-Parallel Applications on Heterogeneous HPC Platforms for Dynamic Energy Through Workload Distribution

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11997)

Abstract

Energy is one of the most important objectives for optimization on modern heterogeneous high-performance computing (HPC) platforms. The tight integration of multicore CPUs with accelerators in these platforms presents several challenges to the optimization of multithreaded data-parallel applications for dynamic energy.

In this work, we formulate the problem of optimizing data-parallel applications on heterogeneous HPC platforms for dynamic energy through workload distribution, and we propose a method to solve it. The method consists of a data-partitioning algorithm that employs a load-imbalancing technique to determine the workload distribution that minimizes the dynamic energy consumption of the parallel execution of an application. The inputs to the algorithm are the discrete dynamic energy profiles of the individual computing devices.
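To make the problem statement concrete, the following is a minimal sketch, not the authors' algorithm. It assumes each device's discrete dynamic energy profile is given as a list indexed by workload size (in arbitrary units) and finds the distribution with the smallest total dynamic energy by plain dynamic programming over all splits. The function name `min_energy_distribution` and the profile values are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def min_energy_distribution(
    profiles: Sequence[Sequence[float]], n: int
) -> Tuple[float, List[int]]:
    """Return (total_energy, distribution) minimizing the summed dynamic energy.

    profiles[i][x] is the (assumed, illustrative) dynamic energy consumed by
    device i when it executes a workload of x units, for x = 0 .. len - 1.
    The distribution assigns every one of the n units to exactly one device.
    """
    inf = float("inf")
    p = len(profiles)
    # best[k][w]: minimum energy to distribute w units over the first k devices.
    best = [[inf] * (n + 1) for _ in range(p + 1)]
    # choice[k][w]: units given to device k-1 in that optimal partial solution.
    choice = [[0] * (n + 1) for _ in range(p + 1)]
    best[0][0] = 0.0

    for k in range(1, p + 1):
        profile = profiles[k - 1]
        for w in range(n + 1):
            # Try every workload x the profile covers, up to w units.
            for x in range(min(w, len(profile) - 1) + 1):
                candidate = best[k - 1][w - x] + profile[x]
                if candidate < best[k][w]:
                    best[k][w] = candidate
                    choice[k][w] = x

    if best[p][n] == inf:
        raise ValueError("profiles do not cover a workload of size n")

    # Walk the recorded choices backwards to recover the distribution.
    distribution = [0] * p
    remaining = n
    for k in range(p, 0, -1):
        distribution[k - 1] = choice[k][remaining]
        remaining -= distribution[k - 1]
    return best[p][n], distribution


if __name__ == "__main__":
    # Two hypothetical devices with non-smooth energy profiles (made-up values).
    device_a = [0, 10, 30, 32, 60]
    device_b = [0, 12, 20, 45, 70]
    energy, dist = min_energy_distribution([device_a, device_b], 4)
    balanced = device_a[2] + device_b[2]
    print("load-imbalanced:", dist, energy)    # [3, 1] 44.0
    print("load-balanced:  ", [2, 2], balanced)  # [2, 2] 50
```

In this toy instance the equal split costs 50 units of energy while the uneven split (3, 1) costs 44, which shows why a load-imbalanced distribution can consume less dynamic energy when profiles are not smooth functions of workload size. The exhaustive dynamic program is chosen for clarity only; its cost grows with the product of the device count and the square of the workload size.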

We experimentally analyse the proposed algorithm using two multithreaded data-parallel applications: matrix multiplication and 2D fast Fourier transform. Compared with the load-balanced solutions, the load-imbalanced solutions provided by the algorithm achieve significant dynamic energy reductions for these applications (on average 130% and 44%, respectively).

Supported by Science Foundation Ireland (SFI) under Grant Number 14/IA/2474.

Corresponding author

Correspondence to Hamidreza Khaleghzadeh.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Khaleghzadeh, H., Fahad, M., Manumachu, R.R., Lastovetsky, A. (2020). Optimization of Data-Parallel Applications on Heterogeneous HPC Platforms for Dynamic Energy Through Workload Distribution. In: Schwardmann, U., et al. Euro-Par 2019: Parallel Processing Workshops. Euro-Par 2019. Lecture Notes in Computer Science, vol 11997. Springer, Cham. https://doi.org/10.1007/978-3-030-48340-1_25

  • DOI: https://doi.org/10.1007/978-3-030-48340-1_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-48339-5

  • Online ISBN: 978-3-030-48340-1

  • eBook Packages: Computer Science (R0)
