
Automatic runtime frequency-scaling system for energy savings in parallel applications

The Journal of Supercomputing

Abstract

Although high-performance computing has always been about efficient application execution, energy and power consumption have become critical concerns because they affect the operating costs and failure rates of large-scale computing platforms. Modern processors provide techniques such as dynamic voltage and frequency scaling (DVFS) and CPU clock modulation (throttling) to improve energy efficiency on the fly. Applied carelessly, however, DVFS and throttling may cause a significant performance loss due to system overhead. This paper proposes a novel runtime system that maximizes energy savings by selecting appropriate DVFS and throttling values in parallel applications. Specifically, the system automatically predicts communication phases in parallel applications and applies frequency scaling that accounts for both the CPU offload provided by the network interface card (NIC) and the architectural stalls during computation. Experiments on the NAS parallel benchmarks and on real-world applications in molecular dynamics and linear system solution demonstrate that the proposed runtime system obtains energy savings of as much as 14% with a low performance loss of about 2%.
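To make the two ingredients of the abstract concrete (predicting communication phases and choosing a frequency/throttling setting per phase), the following is a minimal Python sketch under stated assumptions. It is not the authors' runtime system. The predictor uses plain period detection over a trace of MPI call signatures, which is a swapped-in technique. Following Note 4, each signature is `(operation, rank)`. All names (`FREQS_GHZ`, `MIN_DUTY_CYCLE`, `PhaseInfo`, `detect_period`, `predict_next`, `choose_setting`) and the stall-ratio thresholds 0.3 and 0.5 are hypothetical. Real frequency levels and duty cycles depend on the processor.

```python
from dataclasses import dataclass
from typing import Hashable, List, Optional, Sequence, Tuple

# Hypothetical operating points (highest to lowest); real values depend on the CPU.
FREQS_GHZ = [2.8, 2.4, 2.0, 1.6]
MIN_DUTY_CYCLE = 0.5  # hypothetical lowest clock-modulation duty cycle

# MPI call signature (operation, rank): rank is the destination for
# send-type calls and the source for receive-type calls.
Call = Tuple[str, int]


def detect_period(trace: Sequence[Hashable]) -> Optional[int]:
    """Smallest period p (at most half the trace length) with
    trace[i] == trace[i - p] for all i >= p, or None if none exists."""
    n = len(trace)
    for p in range(1, n // 2 + 1):
        if all(trace[i] == trace[i - p] for i in range(p, n)):
            return p
    return None


def predict_next(trace: Sequence[Hashable]) -> Optional[Hashable]:
    """Predict the next call by continuing the detected repeating pattern."""
    p = detect_period(trace)
    if p is None:
        return None
    return trace[len(trace) - p]


@dataclass
class PhaseInfo:
    is_communication: bool  # phase predicted to be communication
    nic_offload: bool       # NIC moves the data without CPU involvement
    stall_ratio: float      # fraction of cycles stalled (0..1), e.g. from counters


def choose_setting(p: PhaseInfo) -> Tuple[float, float]:
    """Return (frequency in GHz, duty cycle) using hypothetical thresholds."""
    if p.is_communication:
        if p.nic_offload:
            # The CPU mostly waits on the NIC: lowest frequency plus throttling.
            return FREQS_GHZ[-1], MIN_DUTY_CYCLE
        # The CPU still takes part in the transfer: scale down, do not throttle.
        return FREQS_GHZ[-1], 1.0
    # Computation: stall-dominated phases lose little speed at lower frequency.
    if p.stall_ratio > 0.5:
        return FREQS_GHZ[2], 1.0
    if p.stall_ratio > 0.3:
        return FREQS_GHZ[1], 1.0
    return FREQS_GHZ[0], 1.0


if __name__ == "__main__":
    trace: List[Call] = [("MPI_Send", 1), ("MPI_Recv", 3), ("MPI_Send", 2)] * 3
    print(predict_next(trace))                        # ('MPI_Send', 1)
    print(choose_setting(PhaseInfo(True, True, 0.0)))  # (1.6, 0.5)
```

The sketch combines throttling with the lowest frequency only when the NIC offloads the transfer. Its premise is that, in that case, the CPU has little work to slow down during communication. A real system would also have to weigh the overhead of switching operating points against the predicted phase length.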


Notes

  1. TOP500 list: http://top500.org/.

  2. InfiniBand: http://www.infinibandta.org.

  3. MPI Forum: http://www.mpi-forum.org.

  4. In this paper, rank denotes the destination rank for send-type operations, such as MPI_Send, and the source rank for receive-type operations, such as MPI_Recv.

  5. CPMD Consortium: http://www.cpmd.org.

  6. Dynamo is funded and operated jointly by Iowa State University and Ames Laboratory.

  7. Wattsup meter: https://www.wattsupmeters.com.

  8. MVAPICH Project: http://mvapich.cse.ohio-state.edu/.


Acknowledgments

This work was supported in part by Ames Laboratory and Iowa State University under contract DE-AC02-07CH11358 with the US Department of Energy, by the Air Force Office of Scientific Research under AFOSR award FA9550-12-1-0476, and by National Science Foundation grants NSF/OCI-0941434, 0904782, and 1047772. The authors would like to thank Dr. Rong Ge for her valuable feedback and for providing the CPU Miser software, and the anonymous referees for their comments and suggestions, all of which helped to improve the paper.

Author information

Correspondence to Vaibhav Sundriyal.

About this article

Cite this article

Sundriyal, V., Sosonkina, M. & Zhang, Z. Automatic runtime frequency-scaling system for energy savings in parallel applications. J Supercomput 68, 777–797 (2014). https://doi.org/10.1007/s11227-013-1062-0

  • DOI: https://doi.org/10.1007/s11227-013-1062-0
