Multi Objective Optimization of HPC Kernels for Performance, Power, and Energy

  • Conference paper
High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation (PMBS 2013)

Abstract

Code optimization in the high-performance computing realm has traditionally focused on reducing execution time. The problem, in mathematical terms, has been expressed as a single-objective optimization problem. The expected concerns of next-generation systems, however, demand a more detailed analysis of the interplay between execution time and other metrics. Metrics such as power, performance, energy, and resiliency may all be targeted together and traded against one another. We present a multi-objective formulation of the code optimization problem. Our proposed framework helps one explore potential tradeoffs among multiple objectives and provides a significantly richer analysis than can be achieved by treating additional metrics as hard constraints. We empirically examine a variety of metrics, architectures, and code optimization decisions and provide evidence that such tradeoffs exist in practice.
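The contrast between a multi-objective formulation and a hard-constraint formulation can be illustrated with a small sketch. The code below is not taken from the paper: the variant labels, the measurements, and the helper names (`Measurement`, `dominates`, `pareto_front`, `fastest_under_power_cap`) are all hypothetical, and energy is assumed to be execution time multiplied by average power. It keeps every code variant that no other variant beats in all objectives at once (the Pareto front), and compares that with the single answer returned when power is treated as a hard cap.

```python
from typing import Callable, NamedTuple, Optional, Sequence, Tuple


class Measurement(NamedTuple):
    variant: str    # hypothetical label for one code variant
    time_s: float   # execution time in seconds (made-up value)
    power_w: float  # average power draw in watts (made-up value)

    @property
    def energy_j(self) -> float:
        # Assumption: energy = time * average power.
        return self.time_s * self.power_w


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(
    points: Sequence[Measurement],
    objectives: Callable[[Measurement], Tuple[float, ...]],
) -> list:
    """Return the points whose objective vectors are not dominated."""
    values = [objectives(p) for p in points]
    return [
        p
        for i, p in enumerate(points)
        if not any(dominates(values[j], values[i]) for j in range(len(points)) if j != i)
    ]


def fastest_under_power_cap(
    points: Sequence[Measurement], cap_w: float
) -> Optional[Measurement]:
    """Single-objective alternative: minimize time subject to power <= cap_w."""
    feasible = [p for p in points if p.power_w <= cap_w]
    return min(feasible, key=lambda p: p.time_s) if feasible else None


# Illustrative, fabricated measurements for four hypothetical variants.
MEASUREMENTS = [
    Measurement("A", 10.0, 100.0),
    Measurement("B", 12.0, 80.0),
    Measurement("C", 15.0, 90.0),
    Measurement("D", 20.0, 60.0),
]

if __name__ == "__main__":
    time_power = pareto_front(MEASUREMENTS, lambda m: (m.time_s, m.power_w))
    time_energy = pareto_front(MEASUREMENTS, lambda m: (m.time_s, m.energy_j))
    capped = fastest_under_power_cap(MEASUREMENTS, cap_w=90.0)
    print("time/power front: ", [m.variant for m in time_power])
    print("time/energy front:", [m.variant for m in time_energy])
    print("fastest with power <= 90 W:", capped.variant if capped else None)
```

With these made-up numbers, the time/power front contains three variants while the time/energy front contains two, and the capped search returns only one variant. The front exposes the whole set of tradeoffs, whereas the hard constraint hides every alternative except the single feasible optimum.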

Author information

Corresponding author

Correspondence to Stefan M. Wild.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Balaprakash, P., Tiwari, A., Wild, S.M. (2014). Multi Objective Optimization of HPC Kernels for Performance, Power, and Energy. In: Jarvis, S., Wright, S., Hammond, S. (eds) High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation. PMBS 2013. Lecture Notes in Computer Science, vol 8551. Springer, Cham. https://doi.org/10.1007/978-3-319-10214-6_12

  • DOI: https://doi.org/10.1007/978-3-319-10214-6_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10213-9

  • Online ISBN: 978-3-319-10214-6

  • eBook Packages: Computer Science (R0)
