Abstract
Energy efficiency and energy-proportional computing have become a central focus in modern supercomputers. Many previous energy-saving strategies have focused solely on the CPU, while the DRAM subsystem has not been addressed sufficiently, even though memory consumes about 20 % of the total power in a typical server platform. This paper describes a novel runtime system that scales the frequencies of both the processor and the DRAM based on performance and power models that are also proposed here. Specifically, a performance-loss constraint is first chosen for an application; then, using the models, a processor–DRAM frequency pair is selected that minimizes the energy consumption in a given timeslice. Experiments performed on the SPEC CPU™ 2006, NAS NPB, and pARMS benchmarks demonstrate that the proposed runtime system can obtain total energy savings for both memory- and compute-intensive applications. In particular, as much as 22 % of energy was saved with a low performance loss of about 4.8 %.
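The selection step summarized in the abstract can be illustrated with a minimal sketch. It assumes placeholder models rather than the ones proposed in the paper: `predicted_time` and `predicted_power`, their coefficients, and the frequency lists are all hypothetical and serve only to show the shape of the constrained search (keep the pairs whose predicted slowdown stays within the chosen performance-loss bound, then pick the one with the lowest predicted energy for the timeslice).

```python
from itertools import product


def predicted_time(f_cpu, f_mem, f_cpu_max, f_mem_max, cpu_share, mem_share):
    """Hypothetical model: timeslice duration relative to running at the
    maximum frequencies (1.0 means no slowdown). cpu_share and mem_share are
    the assumed fractions of the timeslice bound by CPU and DRAM speed."""
    other = 1.0 - cpu_share - mem_share
    return cpu_share * f_cpu_max / f_cpu + mem_share * f_mem_max / f_mem + other


def predicted_power(f_cpu, f_mem, p_static=40.0, k_cpu=4.0, k_mem=6.0):
    """Hypothetical model: platform power in watts, cubic in CPU frequency
    and linear in DRAM frequency (both in GHz). Coefficients are made up."""
    return p_static + k_cpu * f_cpu ** 3 + k_mem * f_mem


def select_pair(cpu_freqs, mem_freqs, cpu_share, mem_share, max_loss):
    """Return the (f_cpu, f_mem) pair with the lowest predicted energy among
    the pairs whose predicted performance loss does not exceed max_loss."""
    f_cpu_max, f_mem_max = max(cpu_freqs), max(mem_freqs)
    best = None
    for f_cpu, f_mem in product(cpu_freqs, mem_freqs):
        t = predicted_time(f_cpu, f_mem, f_cpu_max, f_mem_max,
                           cpu_share, mem_share)
        if t - 1.0 > max_loss:
            continue  # violates the performance-loss constraint
        energy = predicted_power(f_cpu, f_mem) * t  # relative energy
        if best is None or energy < best[0]:
            best = (energy, f_cpu, f_mem)
    return best[1], best[2]


if __name__ == "__main__":
    cpu_freqs = [1.2, 1.6, 2.0, 2.4]   # GHz, illustrative values
    mem_freqs = [0.8, 1.066, 1.333]    # GHz, illustrative values
    # A compute-bound and a memory-bound timeslice with a 4 % loss bound.
    print(select_pair(cpu_freqs, mem_freqs, 0.9, 0.05, 0.04))
    print(select_pair(cpu_freqs, mem_freqs, 0.1, 0.8, 0.04))
```

Under these toy models, a compute-bound timeslice keeps the processor at its highest frequency and lowers the DRAM frequency, while a memory-bound timeslice does the opposite. An exhaustive search is used here only because the number of frequency pairs is small.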






Notes
TOP500 list: http://top500.org/.
Authors’ previous work [31] outlines the pitfalls of the models relying on the user-defined performance-loss tolerance and introduces a model based on instantaneous power consumption.
LMBench web-site: http://www.bitmover.com/lmbench/.
Wattsup meter: https://www.wattsupmeters.com.
SPEC CPU™ 2006 benchmarks web-site: https://www.spec.org/cpu2006/.
References
Begum R, Werner D, Hempstead M, Prasad G, Challen G (2015) Energy-performance trade-offs on energy-constrained devices with multi-component DVFS. In: Workload Characterization (IISWC), 2015 IEEE International Symposium on, pp 34–43, Oct 2015
Borkar S (2011) The exascale challenge. Keynote speech. In: The 12th International Conference on Parallel Architectures and Compilation Techniques
Chen YJ, Yang CL, Lin PS, Lu YC (2015) Thermal/performance characterization of CMPs with 3D-stacked DRAMs under synergistic voltage-frequency control of cores and DRAMs. In: Proceedings of the 2015 Conference on Research in Adaptive and Convergent Systems, RACS, pp 430–436, New York, NY, USA, 2015. ACM
David H, Fallin C, Gorbatov E, Hanebutte UR, Mutlu O (2011) Memory power management via dynamic voltage/frequency scaling. In: Proceedings of the 8th ACM International Conference on Autonomic Computing, pp 31–40
Deng Q, Meisner D, Bhattacharjee A, Wenisch TF, Bianchini R (2012) CoScale: coordinating CPU and memory system DVFS in server systems. In: Microarchitecture (MICRO), 2012 45th Annual IEEE/ACM International Symposium on, pp 143–154, Dec 2012
Etinski M, Corbalan J, Labarta J, Valero M, Veidenbaum A (2009) Power-aware load balancing of large scale MPI applications. In: Parallel Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on, pp 1–8, May 2009
Freeh VW, Lowenthal DK (2005) Using multiple energy gears in MPI programs on a power-scalable cluster. In: Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, pp 164–173
Ge R, Feng X, Feng W, Cameron KW (2007) CPU MISER: a performance-directed, run-time system for power-aware clusters. In: Parallel Processing, 2007. ICPP 2007. International Conference on, pp 18, Sep 2007
Ge R, Feng X, Song S, Chang HC, Li D, Cameron KW (2010) PowerPack: energy profiling and analysis of high-performance systems and applications. IEEE Trans Parallel Distrib Syst 21:658–671
Gonzalez R, Horowitz M (1996) Energy dissipation in general purpose microprocessors. IEEE J Solid State Circuits 31:1277–1284
Hackenberg D, Schone R, Ilsche T, Molka D, Schuchart J, Geyer R (2015) An energy efficiency feature survey of the Intel Haswell processor. In: Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015 IEEE International, pp 896–904, May 2015
Hennessy JL, Patterson DA (2011) Computer architecture: a quantitative approach (appendix B), 5th edn. Morgan Kaufmann Publishers Inc., San Francisco
Henning JL (2006) SPEC CPU2006 benchmark descriptions. SIGARCH Comput Archit News 34(4):1–17
Hsu CH, Feng W (2005) A power-aware run-time system for high-performance computing. In: Supercomputing, Proceedings of the ACM/IEEE SC 2005 Conference, pp 1, Nov 2005
Huang S, Feng W (2009) Energy-efficient cluster computing via accurate workload characterization. In: Cluster Computing and the Grid, 2009. CCGRID’09. 9th IEEE/ACM International Symposium on, pp 68–75, May 2009
Iancu C, Hofmeyr S, Blagojevic F, Zheng Y (2010) Oversubscription on multicore processors. In: Parallel Distributed Processing (IPDPS), 2010 IEEE International Symposium on, pp 1–11
Intel 64 and IA-32 architectures software developer’s manual combined volumes 3A, 3B, and 3C: System programming guide. http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
Ioannou N, Kauschke M, Gries M, Cintra M (2011) Phase-based application-driven hierarchical power management on the single-chip cloud computer. In: Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on, pp 131–142, Oct 2011
Kandalla K, Mancini EP, Sur S, Panda DK (2010) Designing power-aware collective communication algorithms for InfiniBand clusters. In: Parallel Processing (ICPP), 2010 39th International Conference on, pp 218–227
Lefurgy C, Rajamani K, Rawson F, Felter W, Kistler M, Keller TW (2003) Energy management for commercial servers. Computer 36(12):39–48
Li Z, Saad Y, Sosonkina M (2003) pARMS: a parallel version of the algebraic recursive multilevel solver. Numer Linear Algebra Appl 10:485–509
Lim MY, Freeh VW, Lowenthal DK (2006) Adaptive, transparent frequency and voltage scaling of communication phases in MPI programs. In: Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Mills N, Mills E (2015) Taming the energy use of gaming computers. Energy Efficiency 1–18. doi:10.1007/s12053-015-9371-1
Mittal S (2014) A survey of techniques for improving energy efficiency in embedded computing systems. Int J Comput Aided Eng Technol (IJCAET) 6:440–459
Moscibroda T, Mutlu O (2007) Memory performance attacks: denial of memory service in multi-core systems. In: Proceedings of the 16th USENIX Security Symposium, SS’07, pp 18:1–18:18, Berkeley, CA, USA, 2007. USENIX Association
Park J, Shin D, Chang N, Pedram M (2010) Accurate modeling and calculation of delay and energy overheads of dynamic voltage scaling in modern high-performance microprocessors. In: 2010 International Symposium on Low-Power Electronics and Design (ISLPED), pp 419–424
Rountree B, Lowenthal DK, de Supinski BR, Schulz M, Freeh VW, Bletsch T (2009) Adagio: making DVS practical for complex HPC applications. In: Proceedings of the 23rd international conference on Supercomputing, ICS’09, pp 460–469, New York, NY, USA, 2009. ACM
Saad Y (2003) Iterative methods for sparse linear systems, 2nd edn. SIAM, Philadelphia
Sosonkina M, Saad Y, Cai X (2004) Using the parallel algebraic recursive multilevel solver in modern physical applications. Future Gener Comput Syst 20:489–500
Sundriyal V, Sosonkina M (2011) Per-call energy saving strategies in all-to-all communications. In: Proceedings of the 18th European MPI Users’ Group conference on Recent advances in the message passing interface, EuroMPI’11, pp 188–197, Berlin, Heidelberg, 2011. Springer-Verlag
Sundriyal V, Sosonkina M (2013) Initial investigation of a scheme to use instantaneous CPU power consumption for energy savings format. In: Proceedings of the 1st International Workshop on Energy Efficient Supercomputing, E2SC ’13, pp 1:1–1:6, New York, NY, USA, 2013. ACM
Sundriyal V, Sosonkina M, Gaenko A (2012) Runtime procedure for energy savings in applications with point-to-point communications. In: Computer Architecture and High Performance Computing (SBAC-PAD), 2012 IEEE 24th International Symposium on, pp 155–162
Sundriyal V, Sosonkina M, Zhang Z (2012) Achieving energy efficiency during collective communications. Concurr Comput Pract Exp 25:2140–2156
Tiwari A, Schulz M, Carrington L (2015) Predicting optimal power allocation for CPU and DRAM domains. In: Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015 IEEE International, pp 951–959, May 2015
Vishnu A, Song S, Marquez A, Barker K, Kerbyson D, Cameron K, Balaji P (2010) Designing energy efficient communication runtime systems for data centric programming models. In: Proceedings of the 2010 IEEE/ACM Int’l Conference on Green Computing and Communications & Int’l Conference on Cyber, Physical and Social Computing, GREENCOM-CPSCOM ’10, pp 229–236, Washington, DC, USA, 2010. IEEE Computer Society
Zhang Z, Chang JM (2014) A cool scheduler for multi-core systems exploiting program phases. IEEE Trans Comput 63(5):1061–1073
Acknowledgments
This work was supported in part by the Air Force Office of Scientific Research under the AFOSR award FA9550-12-1-0476, by the National Science Foundation grants 0904782, 1047772, 1516096, by the US Department of Energy, Office of Advanced Scientific Computing Research, through the Ames Laboratory, operated by Iowa State University under contract No. DE-AC02-07CH11358, and by the US Department of Defense High Performance Computing Modernization Program, through a HASI grant.
Cite this article
Sundriyal, V., Sosonkina, M. Joint frequency scaling of processor and DRAM. J Supercomput 72, 1549–1569 (2016). https://doi.org/10.1007/s11227-016-1680-4
DOI: https://doi.org/10.1007/s11227-016-1680-4