DOI: 10.1145/2463209.2488735
research-article

Workload and user experience-aware dynamic reliability management in multicore processors

Published: 29 May 2013

Abstract

Reliability is a major concern for nanoscale CMOS circuits. Degradation phenomena such as Electromigration, Negative Bias Temperature Instability, and Time-Dependent Dielectric Breakdown worsen with transistor scaling. Dynamic Reliability Management (DRM) techniques reduce reliability loss at runtime by constraining operating points, but they must limit the resulting degradation of user experience while still meeting a lifetime target. In this work we propose a sensor-based hierarchical controller for multicore processor DRM that exploits the large gap between the time scales of workload variations and of reliability loss. We improve performance and user experience by locally relaxing reliability-induced operating point constraints, while still meeting them over the large time windows relevant for reliability. Compared with the state of the art, our solution guarantees timely execution of 100% of latency-critical applications and achieves a 4% performance improvement over the whole lifetime.
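
The abstract does not specify the controller itself, so the following Python sketch only illustrates the general idea of relaxing a reliability constraint locally while enforcing it over a long window. It is not the authors' algorithm. The operating points, their damage values, and the names `WindowBudget`, `choose_point`, and `run_window` are all hypothetical, and the sensor-based slow loop that would set the window budget is replaced by a fixed number.

```python
from dataclasses import dataclass

# Hypothetical operating points: (name, relative speed, damage per interval).
# The damage figures are illustrative placeholders, not measured values.
OPERATING_POINTS = [
    ("low", 0.6, 0.5),
    ("nominal", 0.8, 1.0),
    ("high", 1.0, 2.0),
]


@dataclass
class WindowBudget:
    """Damage allowed over one long reliability window (slow time scale)."""
    allowed: float    # total damage permitted in the window (assumed given)
    intervals: int    # number of fast-loop intervals in the window
    spent: float = 0.0
    elapsed: int = 0

    def remaining_rate(self) -> float:
        """Average damage per interval still affordable in this window."""
        left = self.intervals - self.elapsed
        return (self.allowed - self.spent) / left if left > 0 else 0.0


def choose_point(budget: WindowBudget, latency_critical: bool):
    """Fast loop: pick an operating point for the next interval."""
    min_damage = OPERATING_POINTS[0][2]
    left_after = budget.intervals - budget.elapsed - 1
    if latency_critical:
        # Local relaxation: take the fastest point, provided the rest of the
        # window could still finish within budget at the lowest point.
        for point in reversed(OPERATING_POINTS):
            reserve = left_after * min_damage
            if budget.spent + point[2] + reserve <= budget.allowed:
                return point
        return OPERATING_POINTS[0]
    # Non-critical work: stay at or below the average affordable rate, which
    # repays any damage borrowed earlier in the window.
    rate = budget.remaining_rate()
    for point in reversed(OPERATING_POINTS):
        if point[2] <= rate:
            return point
    return OPERATING_POINTS[0]


def run_window(budget: WindowBudget, workload):
    """Run one window; `workload` lists, per interval, whether it is critical."""
    trace = []
    for latency_critical in workload:
        name, _speed, damage = choose_point(budget, latency_critical)
        budget.spent += damage
        budget.elapsed += 1
        trace.append(name)
    return trace


if __name__ == "__main__":
    budget = WindowBudget(allowed=10.0, intervals=10)
    workload = [False] * 4 + [True] * 2 + [False] * 4
    print(run_window(budget, workload), budget.spent)
```

In this toy run the two latency-critical intervals execute at the fastest point, and the following non-critical intervals drop to the lowest point so that the total damage stays within the window budget.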

Published In

DAC '13: Proceedings of the 50th Annual Design Automation Conference
May 2013
1285 pages
ISBN:9781450320719
DOI:10.1145/2463209
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Cited By

  • (2022) DarkGates: A Hybrid Power-Gating Architecture to Mitigate the Performance Impact of Dark-Silicon in High Performance Processors. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 1170-1183. DOI: 10.1109/HPCA53966.2022.00089. Online publication date: Apr-2022
  • (2021) Improving Mean Time to Failure of IoT Networks with Reliability-Aware Routing. 2021 10th Mediterranean Conference on Embedded Computing (MECO), pp. 1-4. DOI: 10.1109/MECO52532.2021.9460211. Online publication date: 7-Jun-2021
  • (2020) An Adaptive Thermal Management Framework for Heterogeneous Multi-Core Processors. IEEE Transactions on Computers, 69:6, pp. 894-906. DOI: 10.1109/TC.2020.2970062. Online publication date: 1-Jun-2020
  • (2019) A Survey of Prediction and Classification Techniques in Multicore Processor Systems. IEEE Transactions on Parallel and Distributed Systems, 30:5, pp. 1184-1200. DOI: 10.1109/TPDS.2018.2878699. Online publication date: 1-May-2019
  • (2019) A Lifetime Reliability-Constrained Runtime Mapping for Throughput Optimization in Many-Core Systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 38:9, pp. 1771-1784. DOI: 10.1109/TCAD.2018.2855168. Online publication date: Sep-2019
  • (2019) Simulating Wear-out Effects of Asymmetric Multicores at the Architecture Level. 2019 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), pp. 1-6. DOI: 10.1109/DFT.2019.8875468. Online publication date: Oct-2019
  • (2018) Quantifying the Impact of Variability and Heterogeneity on the Energy Efficiency for a Next-Generation Ultra-Green Supercomputer. IEEE Transactions on Parallel and Distributed Systems, 29:7, pp. 1575-1588. DOI: 10.1109/TPDS.2017.2766151. Online publication date: 1-Jul-2018
  • (2018) Dynamic Lifetime Reliability Management for Chip Multiprocessors. IEEE Transactions on Multi-Scale Computing Systems, 4:4, pp. 952-958. DOI: 10.1109/TMSCS.2018.2870187. Online publication date: 1-Oct-2018
  • (2018) TheSPoT: Thermal Stress-Aware Power and Temperature Management for Multiprocessor Systems-on-Chip. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 37:8, pp. 1532-1545. DOI: 10.1109/TCAD.2017.2768417. Online publication date: Aug-2018
  • (2018) LifeSim: A lifetime reliability simulator for manycore systems. 2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC), pp. 375-381. DOI: 10.1109/CCWC.2018.8301711. Online publication date: Jan-2018
