DOI: 10.1145/2463209.2488809
research-article

RASTER: runtime adaptive spatial/temporal error resiliency for embedded processors

Published: 29 May 2013

ABSTRACT

Applying a single error recovery scheme uniformly can either violate real-time constraints or exceed the power/energy envelope. Neither violation is acceptable in embedded system design, which demands a highly efficient realization of a given application. In this paper, we propose a HW/SW methodology that exploits both application-specific characteristics and spatial/temporal redundancy. The methodology combines design-time and runtime optimizations so that the resulting embedded processor can perform runtime-adaptive error recovery that precisely targets the reliability-critical instruction executions. The proposed error recovery functionality can dynamically 1) evaluate the reliability cost economy (in terms of execution time and dynamic power), 2) determine the most profitable scheme, and 3) adapt to the corresponding error recovery scheme, which is composed of spatial- and temporal-redundancy-based error recovery operations. Experimental results show that, compared with the state of the art, our methodology achieves up to fifty times greater reliability while still meeting the execution-time and power constraints.
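
The evaluate/determine/adapt loop described in the abstract can be illustrated with a minimal software sketch. This is not the paper's hardware mechanism: the `Scheme` record, the `select_scheme` function, the cost units, and all numeric values below are hypothetical and chosen only to show how a scheme might be picked under execution-time and power budgets.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Scheme:
    """A candidate error recovery scheme (all fields are illustrative)."""
    name: str
    time_cost: float         # extra execution time, arbitrary units
    power_cost: float        # extra dynamic power, arbitrary units
    reliability_gain: float  # assumed benefit score; higher is better


def select_scheme(schemes: Sequence[Scheme],
                  time_slack: float,
                  power_slack: float) -> Optional[Scheme]:
    """Return the feasible scheme with the highest assumed reliability gain.

    Step 1 (evaluate): keep only schemes whose costs fit the current slack.
    Step 2 (determine): pick the most profitable of the feasible schemes.
    Step 3 (adapt) is left to the caller, who switches to the returned scheme.
    """
    feasible = [s for s in schemes
                if s.time_cost <= time_slack and s.power_cost <= power_slack]
    if not feasible:
        return None
    return max(feasible, key=lambda s: s.reliability_gain)


# Hypothetical candidates: spatial redundancy is assumed to cost mostly power,
# temporal redundancy (re-execution) is assumed to cost mostly time.
SCHEMES = (
    Scheme("spatial", time_cost=0.1, power_cost=2.0, reliability_gain=3.0),
    Scheme("temporal", time_cost=2.0, power_cost=0.2, reliability_gain=2.0),
    Scheme("none", time_cost=0.0, power_cost=0.0, reliability_gain=0.0),
)

if __name__ == "__main__":
    # Tight deadline but spare power: the spatial scheme fits.
    print(select_scheme(SCHEMES, time_slack=0.5, power_slack=3.0).name)
    # Spare time but tight power: the temporal scheme fits.
    print(select_scheme(SCHEMES, time_slack=3.0, power_slack=0.5).name)
```

Calling `select_scheme` again whenever the remaining slack changes gives the adaptive behaviour in miniature: the chosen scheme follows whichever budget is currently the binding one.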


        • Published in

          DAC '13: Proceedings of the 50th Annual Design Automation Conference
          May 2013
          1285 pages
          ISBN: 9781450320719
          DOI: 10.1145/2463209

          Copyright © 2013 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          Overall Acceptance Rate: 1,770 of 5,499 submissions, 32%
