ABSTRACT
One of the key challenges for multi-core processors in the nano-CMOS era is dealing with increased temperatures. Peak temperatures must be reduced, and heat must be spread as evenly across the chip as possible, to avoid mutual heating and high thermal gradients between processor cores. Approaches have emerged that share a global power budget among multiple cores to meet these objectives. However, while these approaches distribute power across the chip proactively, before thermal problems arise, they change their strategies only reactively, once a temperature threshold is reached. Our approach uses reinforcement learning to dynamically change what we call power trading strategies before thermal thresholds are hit, based on past recorded observations. Through learning, our hierarchical approach is also able to distribute multiple power budgets at once, which makes power trading more effective. It decreases peak temperatures by around 4 °C compared to a fully distributed approach, which can be critical at near-threshold temperatures in terms of transient errors, while also decreasing the number of deadline misses by a factor of 7. We verified our technique using a thermal camera.
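As a rough illustration of how reinforcement learning can switch strategies from recorded observations, the following is a minimal tabular Q-learning sketch. It is not the method from the paper: the strategy names, temperature bins, thresholds, learning parameters, and reward signal are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical strategy names; the abstract does not list the actual strategies.
STRATEGIES = ["conservative", "balanced", "aggressive"]


def temp_bin(temp_c, low=60.0, high=80.0):
    """Discretize a temperature reading into 0 (cool), 1 (warm), 2 (hot).

    The thresholds are assumptions made for this sketch.
    """
    if temp_c < low:
        return 0
    return 1 if temp_c < high else 2


class StrategyLearner:
    """Tabular Q-learning over (temperature bin, trading strategy) pairs."""

    def __init__(self, n_states=3, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        # One row of Q-values per temperature bin, one column per strategy.
        self.q = [[0.0] * len(STRATEGIES) for _ in range(n_states)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def choose(self, state):
        """Epsilon-greedy choice of a strategy index for the given state."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(STRATEGIES))
        row = self.q[state]
        return row.index(max(row))

    def update(self, state, action, reward, next_state):
        """Standard Q-learning update from one recorded observation."""
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


# Usage: the reward is a placeholder; a real system might penalize
# peak temperature and deadline misses.
learner = StrategyLearner(epsilon=0.0)
state = temp_bin(85.0)                       # hot
action = learner.choose(state)               # greedy pick, all Q-values tie at 0
learner.update(state, action, reward=1.0, next_state=temp_bin(70.0))
```

Because the learner updates its table from past observations, it can prefer a different strategy for a given temperature bin before a threshold is actually crossed, which is the proactive behavior the abstract describes in general terms.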
Index Terms
- Economic learning for thermal-aware power budgeting in many-core architectures