Joint frequency scaling of processor and DRAM

Sundriyal, Vaibhav; Sosonkina, Masha

doi:10.1007/s11227-016-1680-4


Published: 02 March 2016

Volume 72, pages 1549–1569, (2016)
The Journal of Supercomputing

Abstract

Energy efficiency and energy-proportional computing have become a central focus in modern supercomputers. Many previous energy-saving strategies have focused solely on the CPU, while the DRAM subsystem has not been addressed sufficiently, even though memory consumes about 20% of the total power in a typical server platform. This paper describes a novel runtime system that scales the frequency of both the processor and DRAM based on performance and power models, which are also proposed here. Specifically, a performance-loss constraint is first chosen for an application; then an optimal processor–DRAM frequency pair is modeled such that the pair minimizes the energy consumption in a given timeslice. Experiments performed on SPEC CPU™ 2006, NAS NPB, and pARMS benchmarks demonstrate that the proposed runtime system may obtain total energy savings for both memory- and compute-intensive applications. In particular, as much as 22% of energy was saved with a low performance loss of about 4.8%.
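The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's performance and power models, so `predicted_time` and `predicted_power` below are hypothetical placeholders (a simple additive time model and a cubic-in-frequency CPU power term), and all coefficients and frequency values are assumptions chosen only for illustration. The sketch shows the general idea: among candidate processor–DRAM frequency pairs, keep those whose predicted slowdown stays within the performance-loss constraint and pick the one with the lowest predicted energy for the timeslice.

```python
from itertools import product


def predicted_time(f_cpu, f_mem, cpu_cycles, mem_cycles):
    """Hypothetical time model: a compute part that scales with 1/f_cpu
    plus a memory part that scales with 1/f_mem (not the paper's model)."""
    return cpu_cycles / f_cpu + mem_cycles / f_mem


def predicted_power(f_cpu, f_mem, p_static=40.0, k_cpu=2.0, k_mem=5.0):
    """Hypothetical power model: static power, a cubic CPU term, and a
    linear DRAM term. All coefficients are illustrative assumptions."""
    return p_static + k_cpu * f_cpu ** 3 + k_mem * f_mem


def select_pair(cpu_freqs, mem_freqs, cpu_cycles, mem_cycles, max_loss):
    """Return the (f_cpu, f_mem) pair with the lowest predicted energy whose
    predicted time is within (1 + max_loss) of the time at the highest pair."""
    baseline = predicted_time(max(cpu_freqs), max(mem_freqs),
                              cpu_cycles, mem_cycles)
    limit = baseline * (1.0 + max_loss)
    best = None
    for f_cpu, f_mem in product(cpu_freqs, mem_freqs):
        t = predicted_time(f_cpu, f_mem, cpu_cycles, mem_cycles)
        if t > limit:
            continue  # violates the performance-loss constraint
        energy = predicted_power(f_cpu, f_mem) * t
        if best is None or energy < best[0]:
            best = (energy, f_cpu, f_mem)
    # The highest-frequency pair always satisfies the limit, so best is set.
    return best[1], best[2]


if __name__ == "__main__":
    cpu_freqs = [1.2, 1.6, 2.0, 2.4]   # GHz, assumed values
    mem_freqs = [0.8, 1.066, 1.333]    # GHz, assumed values
    # A memory-heavy timeslice with a 5% performance-loss tolerance.
    print(select_pair(cpu_freqs, mem_freqs, 1.0, 5.0, 0.05))
```

An exhaustive search is used here because the number of available frequency pairs on real hardware is small; a runtime system would repeat this selection each timeslice with freshly measured workload parameters.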


Notes

TOP500 list: http://top500.org/.
Authors’ previous work [31] outlines the pitfalls of the models relying on the user-defined performance-loss tolerance and introduces a model based on instantaneous power consumption.
LMBench web-site: http://www.bitmover.com/lmbench/.
See, e.g., http://www.anandtech.com/show/6355/intels-haswell-architecture/8.
Wattsup meter: https://www.wattsupmeters.com.
SPEC CPU™ 2006 benchmarks web-site: https://www.spec.org/cpu2006/.

References

Begum R, Werner D, Hempstead M, Prasad G, Challen G (2015) Energy-performance trade-offs on energy-constrained devices with multi-component DVFS. In: Workload Characterization (IISWC), 2015 IEEE International Symposium on, pp 34–43, Oct 2015
Borkar S (2001) The exascale challenge, 2011. Keynote speech. In: the 12th International Conference on Parallel Architectures and Compilation Techniques
Chen YJ, Yang CL, Lin PS, Lu YC (2015) Thermal/performance characterization of CMPs with 3D-stacked DRAMs under synergistic voltage-frequency control of cores and DRAMs. In: Proceedings of the 2015 Conference on Research in Adaptive and Convergent Systems, RACS, pp 430–436, New York, NY, USA, 2015. ACM
David H, Fallin C, Gorbatov E, Hanebutte UR, Mutlu O (2011) Memory power management via dynamic voltage/frequency scaling. In: Proceedings of the 8th ACM International Conference on Autonomic Computing, pp 31–40
Deng Q, Meisner D, Bhattacharjee A, Wenisch TF, Bianchini R (2012) Coscale: coordinating cpu and memory system DVFS in server systems. In: Microarchitecture (MICRO), 2012 45th Annual IEEE/ACM International Symposium on, pp 143–154, Dec 2012
Etinski M, Corbalan J, Labarta J, Valero M, Veidenbaum A (2009) Power-aware load balancing of large scale MPI applications. In: Parallel Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on, pp 1–8, May 2009
Freeh VW, Lowenthal DK (2005) Using multiple energy gears in MPI programs on a power-scalable cluster. In: Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, pp 164–173
Ge R, Feng X, Feng W, Cameron KW (2007) CPU MISER: A performance-directed, run-time system for power-aware clusters. In: Parallel Processing, 2007. ICPP 2007. International Conference on, pp 18, Sep. 2007
Ge R, Feng X, Song S, Chang HC, Li D, Cameron KW (2010) PowerPack: energy profiling and analysis of high-performance systems and applications. Parallel Distrib Syst IEEE Trans 21:658–671
Gonzales R, Horowitz M (1995) Energy dissipation in general purpose processors. IEEE J Solid State Circuits 31:1277–1284
Hackenberg D, Schone R, Ilsche T, Molka D, Schuchart J, Geyer R (2015) An energy efficiency feature survey of the intel haswell processor. In: Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015 IEEE International, pp 896–904, May 2015
Hennessy JL, Patterson DA (2011) Computer architecture: a quantitative approach (appendix B), 5th edn. Morgan Kaufmann Publishers Inc., San Francisco
Henning JL (2006) SPEC CPU2006 benchmark descriptions. SIGARCH Comput Archit News 34(4):1–17
Hsu CH, Feng W (2005) A power-aware run-time system for high-performance computing. In: Proceedings of the ACM/IEEE SC 2005 Conference, pp 1, Nov. 2005
Huang S, Feng W (2009) Energy-efficient cluster computing via accurate workload characterization. In: Cluster Computing and the Grid, 2009. CCGRID’09. 9th IEEE/ACM International Symposium on, pp 68–75, May 2009
Iancu C, Hofmeyr S, Blagojevic F, Zheng Y (2010) Oversubscription on multicore processors. In: Parallel Distributed Processing (IPDPS), 2010 IEEE International Symposium on, pp 1–11
Intel 64 and IA-32 architectures software developer’s manual combined volumes 3A, 3B, and 3C: System programming guide. http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
Ioannou N, Kauschke M, Gries M, Cintra M (2011) Phase-based application-driven hierarchical power management on the single-chip cloud computer. In: Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on, pp 131–142, Oct. 2011
Kandalla K, Mancini EP, Sur S, Panda DK (2010) Designing power-aware collective communication algorithms for InfiniBand clusters. In: Parallel Processing (ICPP), 2010 39th International Conference on, pp 218–227
Lefurgy C, Rajamani K, Rawson F, Felter W, Kistler M, Keller TW (2003) Energy management for commercial servers. Computer 36(12):39–48
Li Z, Saad Y, Sosonkina M (2003) pARMS: a parallel version of the algebraic recursive multilevel solver. Numer Linear Algebra Appl 10:485–509
Lim MY, Freeh VW, Lowenthal DK (2006) Adaptive, transparent frequency and voltage scaling of communication phases in MPI programs. In: Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Mills N, Mills E (2015) Taming the energy use of gaming computers. Energy Efficiency 1–18. doi:10.1007/s12053-015-9371-1
Mittal S (2014) A survey of techniques for improving energy efficiency in embedded computing systems. Int J Comput Aided Eng Technol (IJACET) 6:440–459
Moscibroda T, Mutlu O (2007) Memory performance attacks: Denial of memory service in multi-core systems. In: Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium, SS’07, pp 18:1–18:18, Berkeley, CA, USA, 2007. USENIX Association
Park J, Shin D, Chang N, Pedram M (2010) Accurate modeling and calculation of delay and energy overheads of dynamic voltage scaling in modern high-performance microprocessors. In: 2010 International Symposium on Low-Power Electronics and Design (ISLPED), pp 419–424
Rountree B, Lowenthal DK, de Supinski BR, Schulz M, Freeh VW, Bletsch T (2009) Adagio: making DVS practical for complex HPC applications. In: Proceedings of the 23rd international conference on Supercomputing, ICS’09, pp 460–469, New York, NY, USA, 2009. ACM
Saad Y (2003) Iterative methods for sparse linear systems, 2nd edn. SIAM, Philadelphia
Sosonkina M, Saad Y, Cai X (2004) Using the parallel algebraic recursive multilevel solver in modern physical applications. Future Gener Comput Syst 20:489–500
Sundriyal V, Sosonkina M (2011) Per-call energy saving strategies in all-to-all communications. In: Proceedings of the 18th European MPI Users’ Group conference on Recent advances in the message passing interface, EuroMPI’11, pp 188–197, Berlin, Heidelberg, 2011. Springer-Verlag
Sundriyal V, Sosonkina M (2013) Initial investigation of a scheme to use instantaneous CPU power consumption for energy savings format. In: Proceedings of the 1st International Workshop on Energy Efficient Supercomputing, E2SC ’13, pp 1:1–1:6, New York, NY, USA, 2013. ACM
Sundriyal V, Sosonkina M, Gaenko A (2012) Runtime procedure for energy savings in applications with point-to-point communications. In: Computer Architecture and High Performance Computing (SBAC-PAD), 2012 IEEE 24th International Symposium on, pp 155–162
Sundriyal V, Sosonkina M, Zhang Z (2012) Achieving energy efficiency during collective communications. Pract Exp Concurr Comput 25:2140–2156
Tiwari A, Schulz M, Arrington L (2015) Predicting optimal power allocation for CPU and DRAM domains. In: Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015 IEEE International, pp 951–959, May 2015
Vishnu A, Song S, Marquez A, Barker K, Kerbyson D, Cameron K, Balaji P (2010) Designing energy efficient communication runtime systems for data centric programming models. In: Proceedings of the 2010 IEEE/ACM Int’l Conference on Green Computing and Communications & Int’l Conference on Cyber, Physical and Social Computing, GREENCOM-CPSCOM ’10, pp 229–236, Washington, DC, USA, 2010. IEEE Computer Society
Zhang Z, Chang JM (2014) A cool scheduler for multi-core systems exploiting program phases. Comput IEEE Trans 63(5):1061–1073

Acknowledgments

This work was supported in part by the Air Force Office of Scientific Research under the AFOSR award FA9550-12-1-0476, by the National Science Foundation grants 0904782, 1047772, 1516096, by the US Department of Energy, Office of Advanced Scientific Computing Research, through the Ames Laboratory, operated by Iowa State University under contract No. DE-AC02-07CH11358, and by the US Department of Defense High Performance Computing Modernization Program, through a HASI grant.

Author information

Authors and Affiliations

ODU Research Foundation, Norfolk, USA
Vaibhav Sundriyal
Old Dominion University, Norfolk, USA
Masha Sosonkina


Corresponding author

Correspondence to Vaibhav Sundriyal.


About this article

Cite this article

Sundriyal, V., Sosonkina, M. Joint frequency scaling of processor and DRAM. J Supercomput 72, 1549–1569 (2016). https://doi.org/10.1007/s11227-016-1680-4


Published: 02 March 2016
Issue Date: April 2016
DOI: https://doi.org/10.1007/s11227-016-1680-4
