Automatic runtime frequency-scaling system for energy savings in parallel applications

Sundriyal, Vaibhav; Sosonkina, Masha; Zhang, Zhao

doi:10.1007/s11227-013-1062-0

Published: 13 December 2013

Volume 68, pages 777–797 (2014)

The Journal of Supercomputing

Vaibhav Sundriyal¹,
Masha Sosonkina² &
Zhao Zhang¹

Abstract

Although high-performance computing has always been about efficient application execution, both energy and power consumption have become critical concerns owing to their effect on the operating costs and failure rates of large-scale computing platforms. Modern processors provide techniques, such as dynamic voltage and frequency scaling (DVFS) and CPU clock modulation (called throttling), to improve energy efficiency on the fly. Without careful application, however, DVFS and throttling may cause a significant performance loss due to system overhead. This paper proposes a novel runtime system that maximizes energy savings by selecting appropriate values for DVFS and throttling in parallel applications. Specifically, the system automatically predicts communication phases in parallel applications and applies frequency scaling considering both the CPU offload provided by the network-interface card and the architectural stalls during computation. Experiments performed on the NAS parallel benchmarks, as well as on real-world applications in molecular dynamics and linear system solution, demonstrate that the proposed runtime system obtains energy savings of as much as 14% with a low performance loss of about 2%.
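To put the two headline figures in relation, the following minimal sketch uses only the identity energy = average power × execution time. It assumes, purely for illustration, that the 14% energy saving and the 2% slowdown apply to the same run; the abstract does not state this. The function name `implied_power_reduction` is hypothetical and not part of the paper's system.

```python
def implied_power_reduction(energy_saving: float, time_increase: float) -> float:
    """Average power reduction implied by E = P_avg * T.

    energy_saving: fractional energy reduction (0.14 means 14% less energy).
    time_increase: fractional runtime increase (0.02 means 2% slower).
    Assumption: both fractions describe the same run against the same baseline.
    """
    energy_ratio = 1.0 - energy_saving   # E_new / E_old
    time_ratio = 1.0 + time_increase     # T_new / T_old
    # P_new / P_old = (E_new / E_old) / (T_new / T_old)
    return 1.0 - energy_ratio / time_ratio


if __name__ == "__main__":
    reduction = implied_power_reduction(0.14, 0.02)
    print(f"Implied average power reduction: {reduction:.1%}")
```

Under that assumption, the average power draw would have to fall slightly more than the energy does, because the run takes longer.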

Notes

TOP500 list: http://top500.org/.
Infiniband: http://www.infinibandta.org.
MPI Forum: http://www.mpi-forum.org.
In this paper, rank denotes the destination rank for send-type operations, such as MPI_Send, or the source rank for receive-type operations, such as MPI_Recv.
CPMD Consortium: http://www.cpmd.org.
Dynamo is funded and operated jointly by Iowa State University and Ames Laboratory.
Wattsup meter: https://www.wattsupmeters.com.
MVAPICH Project: http://mvapich.cse.ohio-state.edu/.

Acknowledgments

This work was supported in part by Ames Laboratory and Iowa State University under the contract DE-AC02-07CH11358 with the US Department of Energy, by the Air Force Office of Scientific Research under the AFOSR award FA9550-12-1-0476, and by the National Science Foundation grants NSF/OCI-0941434, 0904782, and 1047772. The authors would like to thank Dr. Rong Ge for her valuable feedback and for providing the CPU Miser software, and the anonymous referees for their comments and suggestions, all of which helped to improve the paper.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Iowa State University, Ames, USA
Vaibhav Sundriyal & Zhao Zhang
Old Dominion University, Norfolk, USA
Masha Sosonkina

Corresponding author

Correspondence to Vaibhav Sundriyal.

About this article

Cite this article

Sundriyal, V., Sosonkina, M. & Zhang, Z. Automatic runtime frequency-scaling system for energy savings in parallel applications. J Supercomput 68, 777–797 (2014). https://doi.org/10.1007/s11227-013-1062-0

Published: 13 December 2013
Issue Date: May 2014
DOI: https://doi.org/10.1007/s11227-013-1062-0
