DOI: 10.1145/1250662.1250725
Article

Mechanisms for bounding vulnerabilities of processor structures

Published: 09 June 2007

Abstract

Concern for the increasing susceptibility of processor structures to transient errors has led to several recent research efforts that propose architectural techniques to enhance reliability. However, real systems are typically required to satisfy hard reliability budgets, and barring expensive full-redundancy approaches, none of the proposed solutions treat any reliability budgets or bounds as hard constraints. Meeting vulnerability bounds requires monitoring vulnerabilities of processor structures and taking appropriate actions whenever these bounds are violated. This mandates treating reliability as a first-order microarchitecture design constraint, while optimizing performance as long as reliability requirements are satisfied. This paper makes three key contributions towards this goal: (i) we present a simple infrastructure to monitor and provide upper bounds on the vulnerabilities of key processor structures at cycle-level fidelity; (ii) we propose two distinct control mechanisms - throttling and selective redundancy - to proactively and/or reactively bound the vulnerabilities to any limit specified by the system designer; (iii) within this framework, we propose a novel adaptation of Out-of-Order Commit for vulnerability reduction, which automatically provides additional leverage for the control mechanisms to boost performance while remaining within the reliability budget.
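The reactive side of contribution (ii) can be pictured with a minimal sketch. It rests on assumptions the abstract does not state: the monitored structure's occupancy is taken as a conservative upper bound on its vulnerable state, and the control action is a one-step change in fetch width. The class `ThrottleController` and all of its fields are hypothetical names chosen for illustration; this is not the paper's actual mechanism.

```python
from dataclasses import dataclass


@dataclass
class ThrottleController:
    """Hypothetical reactive throttle; an illustration, not the paper's design."""
    budget: float              # designer-specified vulnerability bound, in [0, 1]
    capacity: int              # entries in the monitored structure (assumed)
    max_width: int = 4         # assumed maximum fetch width
    width: int = 4             # current fetch width; 0 means fetch is stalled
    occupied_cycles: int = 0   # running sum of occupied entries over all cycles
    cycles: int = 0            # number of cycles observed so far

    def observe(self, occupied_entries: int) -> None:
        """Record one cycle of occupancy for the monitored structure."""
        self.occupied_cycles += occupied_entries
        self.cycles += 1

    def vulnerability_bound(self) -> float:
        """Average fraction of occupied entries, used as an upper bound."""
        if self.cycles == 0:
            return 0.0
        return self.occupied_cycles / (self.capacity * self.cycles)

    def adjust(self) -> int:
        """Narrow fetch when over budget; widen it again when under."""
        if self.vulnerability_bound() > self.budget:
            self.width = max(0, self.width - 1)
        else:
            self.width = min(self.max_width, self.width + 1)
        return self.width
```

In this sketch, performance is recovered automatically: whenever the running bound falls below `budget`, the controller widens fetch again, which mirrors the abstract's goal of optimizing performance only while the reliability requirement holds.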



    Published In

    ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture
    June 2007
    542 pages
    ISBN: 9781595937063
    DOI: 10.1145/1250662
    General Chair: Dean Tullsen
    Program Chair: Brad Calder

    ACM SIGARCH Computer Architecture News, Volume 35, Issue 2
    May 2007
    527 pages
    ISSN: 0163-5964
    DOI: 10.1145/1273440

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. microarchitecture
    2. redundant threading
    3. transient faults

    Qualifiers

    • Article

    Conference

    SPAA07
    Acceptance Rates

    Overall Acceptance Rate 543 of 3,203 submissions, 17%


    Cited By

    • (2022) Reliability-Aware Runahead. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (772-785). DOI: 10.1109/HPCA53966.2022.00062. Online publication date: Apr-2022
    • (2022) Introduction. Fault Tolerant Computer Architecture (1-17). DOI: 10.1007/978-3-031-01723-0_1. Online publication date: 5-Mar-2022
    • (2020) Prediction-Based Error Correction for GPU Reliability with Low Overhead. Electronics 9:11 (1849). DOI: 10.3390/electronics9111849. Online publication date: 5-Nov-2020
    • (2018) Optimizing Soft Error Reliability Through Scheduling on Heterogeneous Multicore Processors. IEEE Transactions on Computers 67:6 (830-846). DOI: 10.1109/TC.2017.2779480. Online publication date: 1-Jun-2018
    • (2018) A user‐assisted thread‐level vulnerability assessment tool. Concurrency and Computation: Practice and Experience 31:13. DOI: 10.1002/cpe.5085. Online publication date: 20-Nov-2018
    • (2016) An Accurate Cross-Layer Approach for Online Architectural Vulnerability Estimation. ACM Transactions on Architecture and Code Optimization 13:3 (1-27). DOI: 10.1145/2975588. Online publication date: 17-Sep-2016
    • (2016) Cross-layer system reliability assessment framework for hardware faults. 2016 IEEE International Test Conference (ITC) (1-10). DOI: 10.1109/TEST.2016.7805863. Online publication date: Nov-2016
    • (2016) Reliability aware throughput management of chip multi-processor architecture via thread migration. The Journal of Supercomputing 72:4 (1363-1380). DOI: 10.1007/s11227-016-1665-3. Online publication date: 1-Apr-2016
    • (2015) Soft-Error-Tolerant Design Methodology for Balancing Performance, Power, and Reliability. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 23:9 (1628-1639). DOI: 10.1109/TVLSI.2014.2348872. Online publication date: Sep-2015
    • (2015) Response-time minimization in soft real-time systems with temperature-affected reliability constraint. 2015 CSI Symposium on Real-Time and Embedded Systems and Technologies (RTEST) (1-8). DOI: 10.1109/RTEST.2015.7369850. Online publication date: Oct-2015
