DOI: 10.1145/2463209.2488809
research-article

RASTER: runtime adaptive spatial/temporal error resiliency for embedded processors

Published: 29 May 2013

Abstract

Applying error recovery uniformly can either violate real-time constraints or worsen the power/energy envelope. Neither violation is acceptable in embedded system design, which expects an ultra-efficient realization of a given application. In this paper, we propose a HW/SW methodology that exploits both application-specific characteristics and spatial/temporal redundancy. Our methodology combines design-time and runtime optimizations so that the resulting embedded processor performs runtime adaptive error recovery, precisely targeting the instruction executions that are critical to reliability. The proposed error recovery functionality can dynamically 1) evaluate the cost of reliability (in terms of execution time and dynamic power), 2) determine the most profitable scheme, and 3) adapt to the corresponding error recovery scheme, which is composed of error recovery operations based on spatial and temporal redundancy. Experimental results show that, in the best case, our methodology achieves fifty times greater reliability than the state of the art while still meeting the execution-time and power deadlines.
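
The three runtime steps named in the abstract (evaluate cost, determine the most profitable scheme, adapt) can be illustrated with a minimal sketch. This is not the paper's algorithm: the `Scheme` type, the `pick_scheme` function, the slack parameters, and every cost and gain value below are hypothetical and chosen only for illustration. The sketch assumes each candidate scheme has a known extra execution-time cost and extra dynamic-power cost, and that the most profitable scheme is the one with the highest reliability gain that still fits the remaining time and power slack.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Scheme:
    """A hypothetical error recovery scheme with illustrative costs."""
    name: str
    time_cost: float         # extra execution time (arbitrary units)
    power_cost: float        # extra dynamic power (arbitrary units)
    reliability_gain: float  # benefit score, higher is better


def pick_scheme(schemes: Sequence[Scheme],
                time_slack: float,
                power_slack: float) -> Optional[Scheme]:
    """Return the feasible scheme with the highest reliability gain.

    A scheme is feasible when both of its costs fit within the remaining
    execution-time and power slack. Returns None when nothing fits.
    """
    feasible = [s for s in schemes
                if s.time_cost <= time_slack and s.power_cost <= power_slack]
    return max(feasible, key=lambda s: s.reliability_gain, default=None)


# Assumed example values: spatial redundancy is modelled as cheap in time
# but costly in power, temporal redundancy as the reverse.
SCHEMES = [
    Scheme("spatial", time_cost=0.1, power_cost=2.0, reliability_gain=3.0),
    Scheme("temporal", time_cost=2.0, power_cost=0.2, reliability_gain=2.0),
    Scheme("spatial+temporal", time_cost=2.1, power_cost=2.2,
           reliability_gain=4.0),
]
```

Under these assumed numbers, a tight time budget with ample power selects the spatial scheme, a tight power budget with ample time selects the temporal scheme, and ample slack in both selects the combined scheme. A real implementation would re-evaluate the slack as execution proceeds.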

        Published In

        DAC '13: Proceedings of the 50th Annual Design Automation Conference
        May 2013
        1285 pages
        ISBN:9781450320719
        DOI:10.1145/2463209
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. ASIP
        2. checkpoint recovery
        3. redundancy
        4. runtime adaptation
        5. soft error

        Conference

        DAC '13
        Acceptance Rates

        Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

        Cited By

        • (2022) A Survey of Fault-Tolerance Techniques for Embedded Systems From the Perspective of Power, Energy, and Thermal Issues. IEEE Access, 10, pp. 12229-12251. DOI: 10.1109/ACCESS.2022.3144217. Online publication date: 2022
        • (2020) Fault-Tolerant Computing with Heterogeneous Hardening Modes. Dependable Embedded Systems, pp. 161-180. DOI: 10.1007/978-3-030-52017-5_7. Online publication date: 10-Dec-2020
        • (2019) Multithreaded and Reconvergent Aware Algorithms for Accurate Digital Circuits Reliability Estimation. IEEE Transactions on Reliability, 68:2, pp. 514-525. DOI: 10.1109/TR.2018.2876475. Online publication date: Jun-2019
        • (2019) A Roadmap Toward the Resilient Internet of Things for Cyber-Physical Systems. IEEE Access, 7, pp. 13260-13283. DOI: 10.1109/ACCESS.2019.2891969. Online publication date: 2019
        • (2018) Declarative Resilience. ACM Transactions on Embedded Computing Systems, 17:4, pp. 1-27. DOI: 10.1145/3210559. Online publication date: 24-Jul-2018
        • (2017) Adroit Use of Dark Silicon for Power, Performance and Reliability Optimisation of NoCs. The Dark Side of Silicon, pp. 291-325. DOI: 10.1007/978-3-319-31596-6_11. Online publication date: 1-Jan-2017
        • (2016) Critical nodes count algorithm for accurate input vectors reliability ranking. Proceedings of the Summer Computer Simulation Conference, pp. 1-7. DOI: 10.5555/3015574.3015593. Online publication date: 24-Jul-2016
        • (2016) Processor Design for Soft Errors. ACM Computing Surveys, 49:3, pp. 1-44. DOI: 10.1145/2996357. Online publication date: 8-Nov-2016
        • (2016) Toward Smart Embedded Systems. ACM Transactions on Embedded Computing Systems, 15:2, pp. 1-27. DOI: 10.1145/2872936. Online publication date: 17-Feb-2016
        • (2016) Two-State Checkpointing for Energy-Efficient Fault Tolerance in Hard Real-Time Systems. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 24:7, pp. 2426-2437. DOI: 10.1109/TVLSI.2015.2512839. Online publication date: Jul-2016
