research-article

A low-power instruction replay mechanism for design of resilient microprocessors

Authors: Rance Rodrigues, Arunachalam Annamalai, Sandip Kundu

ACM Transactions on Embedded Computing Systems (TECS), Volume 13, Issue 4

Article No.: 85, Pages 1 - 23

https://doi.org/10.1145/2560034

Published: 10 March 2014

Abstract

There is growing concern about the increasing rate of defects in computing substrates. Traditional redundancy solutions are too expensive for commodity microprocessor systems. Modern microprocessors feature multiple execution units to exploit instruction-level parallelism. However, most workloads do not exhibit the level of instruction-level parallelism that a typical microprocessor is provisioned for, which creates an opportunity to reexecute instructions on idle execution units. Relying solely on idle resources will not provide full instruction coverage, so other alternatives must be explored. To that end, we propose and evaluate two instruction replay schemes within the same core for online testing of the execution units. One scheme (RER) reexecutes only the retired instructions, while the other (REI) reexecutes all the issued instructions. The complete proposed solution requires a comparator and minor modifications to control logic, resulting in negligible hardware overhead. Both soft and hard error detection are considered, and the performance and energy impact of both schemes are evaluated and compared against previously proposed redundant execution schemes. Results show that although the proposed schemes incur a small performance penalty compared to previous work, they significantly reduce the energy overhead.
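The difference between the two replay schemes can be illustrated with a small behavioral model. This is a hypothetical sketch, not the authors' microarchitecture: it assumes each replayed instruction runs on a separate checker execution unit whose result is fed to a comparator, models a hard fault as bit 0 of the primary unit's result stuck at 1, and treats squashed (for example wrong-path) instructions as issued but not retired. The names `Instr`, `replay_check`, `golden_alu`, and `faulty_alu` are illustrative only.

```python
import operator
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical model of an execution unit: (opcode, operand a, operand b) -> result.
ExecUnit = Callable[[str, int, int], int]

_OPS = {"add": operator.add, "sub": operator.sub,
        "and": operator.and_, "or": operator.or_}


def golden_alu(op: str, a: int, b: int) -> int:
    """Fault-free reference unit."""
    return _OPS[op](a, b)


def faulty_alu(op: str, a: int, b: int) -> int:
    """Assumed hard fault: bit 0 of every result is stuck at 1."""
    return golden_alu(op, a, b) | 1


@dataclass
class Instr:
    op: str
    a: int
    b: int
    retired: bool  # False for issued-but-squashed (e.g. wrong-path) instructions


def replay_check(trace: List[Instr], primary: ExecUnit, checker: ExecUnit,
                 scheme: str) -> Tuple[int, List[int]]:
    """Return (number of replays, trace indices where the comparator fired).

    RER replays only retired instructions; REI replays every issued one.
    """
    if scheme not in ("RER", "REI"):
        raise ValueError(f"unknown scheme: {scheme}")
    replays, mismatches = 0, []
    for i, ins in enumerate(trace):
        result = primary(ins.op, ins.a, ins.b)       # first execution
        if scheme == "RER" and not ins.retired:
            continue                                  # squashed: not replayed under RER
        replays += 1
        if checker(ins.op, ins.a, ins.b) != result:   # comparator
            mismatches.append(i)
    return replays, mismatches


trace = [
    Instr("add", 2, 3, retired=True),   # 5: odd result masks the stuck-at-1 fault
    Instr("sub", 9, 4, retired=True),   # 5: also masked
    Instr("add", 2, 2, retired=False),  # 4: fault exposed, but instruction squashed
    Instr("or", 4, 2, retired=True),    # 6: fault exposed
]

for scheme in ("RER", "REI"):
    print(scheme, replay_check(trace, faulty_alu, golden_alu, scheme))
# RER (3, [3])
# REI (4, [2, 3])
```

In this toy trace REI performs one more replay than RER and also exercises the faulty unit on the squashed instruction, while RER skips work whose result is discarded anyway. The model says nothing about timing, energy, or how replays are actually scheduled; it only shows which instructions each scheme covers. Note that in this model, replaying on the same faulty unit would yield the same wrong result twice, so the comparator would not fire.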

Cited By

R. Psiakis, A. Kritikakou, and O. Sentieys. 2017. NEDA: NOP Exploitation with Dependency Awareness for Reliable VLIW Processors. In 2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). 391-396. Online publication date: Jul. 2017. https://doi.org/10.1109/ISVLSI.2017.75

J. Soman, N. Miralaei, A. Mycroft, and T. Jones. 2015. REPAIR: Hard-error recovery via re-execution. In 2015 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS). 76-79. Online publication date: Oct. 2015. https://doi.org/10.1109/DFT.2015.7315139

Published In

ACM Transactions on Embedded Computing Systems Volume 13, Issue 4

Regular Papers

November 2014

647 pages

ISSN:1539-9087

EISSN:1558-3465

DOI:10.1145/2592905

Editor:
Sandeep K. Shukla
Virginia Tech, USA

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 10 March 2014

Accepted: 01 August 2013

Revised: 01 June 2013

Received: 01 December 2012

Published in TECS Volume 13, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

Semiconductor Research Corporation

Article Metrics

Total Citations: 2

Total Downloads: 280

Downloads (last 12 months): 3

Downloads (last 6 weeks): 0

Reflects downloads up to 25 Feb 2025
