research-article

RASTER: runtime adaptive spatial/temporal error resiliency for embedded processors

Authors:

Muhammad Shafique,

Jude Angelo Ambrose,

Sri Parameswaran

DAC '13: Proceedings of the 50th Annual Design Automation Conference

Article No.: 62, Pages 1 - 7

https://doi.org/10.1145/2463209.2488809

Published: 29 May 2013

Abstract

Applying error recovery uniformly can either compromise real-time constraints or worsen the power/energy envelope. Neither violation is realistically acceptable in embedded system design, which demands an ultra-efficient realization of a given application. In this paper, we propose a HW/SW methodology that exploits both application-specific characteristics and spatial/temporal redundancy. Our methodology combines design-time and runtime optimizations to enable the resulting embedded processor to perform runtime-adaptive error recovery operations that precisely target reliability-critical instruction executions. The proposed error recovery functionality can dynamically 1) evaluate the reliability cost economy (in terms of execution time and dynamic power), 2) determine the most profitable scheme, and 3) adapt to the corresponding error recovery scheme, which is composed of spatial- and temporal-redundancy-based error recovery operations. Experimental results show that, at best, our methodology achieves fifty times greater reliability than the state of the art while meeting execution-time and power constraints.
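The three runtime steps named in the abstract (evaluate the cost, pick the most profitable scheme, adapt) can be illustrated with a minimal Python sketch. It assumes each instruction carries a vulnerability score in [0, 1], that execution-time slack is a cycle budget consumed as overhead accumulates, and that dynamic power is an instantaneous cap. The scheme table, its overhead and coverage figures, the `critical_threshold` cut-off, and the functions `select_scheme` and `plan` are all hypothetical illustrations, not RASTER's actual cost model or selection algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """One candidate error-recovery scheme; all figures are hypothetical."""
    name: str
    extra_cycles: int      # execution-time overhead per protected instruction
    extra_power_mw: float  # dynamic-power overhead while the scheme is active
    coverage: float        # fraction of soft errors the scheme recovers from


# Hypothetical design-time characterisation of the available schemes.
NO_PROTECTION = Scheme("none", 0, 0.0, 0.0)
SCHEMES = (
    NO_PROTECTION,
    Scheme("temporal", 3, 1.0, 0.90),          # re-execution: costs time
    Scheme("spatial", 0, 4.0, 0.95),           # redundant unit: costs power
    Scheme("spatial+temporal", 3, 4.5, 0.99),  # both forms of redundancy
)


def select_scheme(vulnerability, slack_cycles, power_headroom_mw,
                  critical_threshold=0.3, schemes=SCHEMES):
    """Steps 1-2: cost out each scheme and return the most profitable one."""
    if vulnerability < critical_threshold:
        return NO_PROTECTION  # not reliability-critical: spend nothing
    # Step 1: keep only schemes whose time and power overheads fit the budgets.
    feasible = [s for s in schemes
                if s.extra_cycles <= slack_cycles
                and s.extra_power_mw <= power_headroom_mw]
    # Step 2: "profit" = expected share of errors removed; ties go to fewer
    # extra cycles, then lower extra power. Fall back to no protection.
    return max(feasible,
               key=lambda s: (vulnerability * s.coverage,
                              -s.extra_cycles, -s.extra_power_mw),
               default=NO_PROTECTION)


def plan(instructions, slack_cycles, power_headroom_mw):
    """Step 3: adapt per instruction; time slack is consumed, power is a cap."""
    assignment = []
    for name, vulnerability in instructions:
        scheme = select_scheme(vulnerability, slack_cycles, power_headroom_mw)
        slack_cycles -= scheme.extra_cycles  # time overhead accumulates
        assignment.append((name, scheme.name))
    return assignment


if __name__ == "__main__":
    print(plan([("i0", 0.8), ("i1", 0.5), ("i2", 0.1)],
               slack_cycles=4, power_headroom_mw=5.0))
```

With 4 cycles of slack and a 5 mW headroom, the example prints `[('i0', 'spatial+temporal'), ('i1', 'spatial'), ('i2', 'none')]`: the most vulnerable instruction receives both forms of redundancy, the next one falls back to the zero-time spatial option once the slack is nearly spent, and the low-vulnerability instruction is left unprotected.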

Index Terms

RASTER: runtime adaptive spatial/temporal error resiliency for embedded processors
1. Computer systems organization

Information

Published In

DAC '13: Proceedings of the 50th Annual Design Automation Conference

May 2013

1285 pages

ISBN:9781450320719

DOI:10.1145/2463209

Copyright © 2013 ACM.

Sponsors

EDAC: Electronic Design Automation Consortium
SIGDA: ACM Special Interest Group on Design Automation
IEEE-CEDA

In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 May 2013

Qualifiers

Research-article

Conference

DAC '13

Sponsor:

EDAC
SIGDA

DAC '13: The 50th Annual Design Automation Conference 2013

May 29 - June 7, 2013

Austin, Texas

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 18
Total Downloads: 266
Downloads (Last 12 months): 4
Downloads (Last 6 weeks): 1

Reflects downloads up to 08 Mar 2025

Citations

Cited By

Safari S, Ansari M, Khdr H, Gohari-Nazari P, Yari-Karin S, Yeganeh-Khaksar A, Hessabi S, Ejlali A, Henkel J (2022). A Survey of Fault-Tolerance Techniques for Embedded Systems From the Perspective of Power, Energy, and Thermal Issues. IEEE Access 10:12229-12251. Online publication date: 2022.
https://doi.org/10.1109/ACCESS.2022.3144217
Kriebel F, Khalid F, Prabakaran B, Rehman S, Shafique M (2020). Fault-Tolerant Computing with Heterogeneous Hardening Modes. Dependable Embedded Systems, pp. 161-180. Online publication date: 10-Dec-2020.
https://doi.org/10.1007/978-3-030-52017-5_7
Ibrahim W, Ibrahim H (2019). Multithreaded and Reconvergent Aware Algorithms for Accurate Digital Circuits Reliability Estimation. IEEE Transactions on Reliability 68(2):514-525. Online publication date: Jun-2019.
https://doi.org/10.1109/TR.2018.2876475
Ratasich D, Khalid F, Geissler F, Grosu R, Shafique M, Bartocci E (2019). A Roadmap Toward the Resilient Internet of Things for Cyber-Physical Systems. IEEE Access 7:13260-13283. Online publication date: 2019.
https://doi.org/10.1109/ACCESS.2019.2891969
Omar H, Shi Q, Ahmad M, Dogan H, Khan O (2018). Declarative Resilience. ACM Transactions on Embedded Computing Systems 17(4):1-27. Online publication date: 24-Jul-2018.
https://dl.acm.org/doi/10.1145/3210559
Bokhari H, Shafique M, Henkel J, Parameswaran S (2017). Adroit Use of Dark Silicon for Power, Performance and Reliability Optimisation of NoCs. The Dark Side of Silicon, pp. 291-325. Online publication date: 1-Jan-2017.
https://doi.org/10.1007/978-3-319-31596-6_11
Ibrahim W, Amer H, de Rango F, Risco Martín J (2016). Critical nodes count algorithm for accurate input vectors reliability ranking. Proceedings of the Summer Computer Simulation Conference, pp. 1-7. Online publication date: 24-Jul-2016.
https://dl.acm.org/doi/10.5555/3015574.3015593
Li T, Ambrose J, Ragel R, Parameswaran S (2016). Processor Design for Soft Errors. ACM Computing Surveys 49(3):1-44. Online publication date: 8-Nov-2016.
https://dl.acm.org/doi/10.1145/2996357
Dutt N, Jantsch A, Sarma S (2016). Toward Smart Embedded Systems. ACM Transactions on Embedded Computing Systems 15(2):1-27. Online publication date: 17-Feb-2016.
https://dl.acm.org/doi/10.1145/2872936
Salehi M, Khavari Tavana M, Rehman S, Shafique M, Ejlali A, Henkel J (2016). Two-State Checkpointing for Energy-Efficient Fault Tolerance in Hard Real-Time Systems. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 24(7):2426-2437. Online publication date: Jul-2016.
https://doi.org/10.1109/TVLSI.2015.2512839
