Fast and accurate architectural vulnerability analysis for embedded processors using Instruction Vulnerability Factor

doi:10.1016/j.micpro.2016.01.012

Microprocessors and Microsystems

Volume 42, May 2016, Pages 113-126

https://doi.org/10.1016/j.micpro.2016.01.012

Abstract

Scaling silicon technology into the nano-scale era has brought higher integration, higher performance, and lower power consumption, but reliability has become a serious challenge for integrated circuit technology. Reliability awareness has therefore become essential in the early stages of integrated circuit design. Because many modern chips struggle with a limited power budget, and traditional techniques such as N-Modular Redundancy (NMR) are not efficient for non-uniform fault tolerance, the reliability of different hardware components or application parts must be analyzed accurately. Transient and soft errors, which result from cosmic ray strikes and Process, Voltage, and Temperature (PVT) variation, are known as the main sources of unreliability. Architectural Vulnerability Factor (AVF) is now widely used for analyzing the reliability of processors. In this paper, we introduce a new metric named Instruction Vulnerability Factor (IVF), which is used for fast, accurate, and recurring AVF estimation. Special scenarios have been developed that enable exhaustive fault injection for precise IVF calculation for a given processor instruction set. The IVF of a specific instruction captures the vulnerability of the pipeline stages while that instruction executes. Finally, a simple equation is derived for AVF estimation based on the running instructions. Our experimental results, obtained with our Configurable Reliability Analysis Framework (CRAF), confirm the accuracy of the presented AVF estimation method. Moreover, IVF can be employed by reliability-aware compilation or online AVF estimation techniques.

Introduction

Although shrinking the technology feature size to the nano-scale leads to lower power consumption, higher performance, and higher integration, the reliability of new silicon is an important new challenge. Soft errors are now an important concern in designing reliable nano-scale integrated circuits and have emerged as a key challenge to microprocessor design. As the dimensions of circuit elements approach a countable number of atom diameters, circuits become more vulnerable to electronic noise, which generally results from high-current power supplies, internal radiation from the decay of heavy metals mixed into the metallization, or a strike by an external high-energy particle [1]. These random effects can cause a current pulse at an internal node of the circuit and flip the logic state once enough induced charge has accumulated. Such a pulse can temporarily change the logic level of the struck node and cause a Single Event Transient (SET), or the particle may hit a storage element, change the stored value, and cause a Single Event Upset (SEU).

Mean-Time-To-Failure (MTTF) is a well-known measure for reliability analysis and covers all failure mechanisms, including diffusion, aging, and packaging defects, but Failure In Time (FIT) is a more specific measure for soft errors. FIT, which indicates how many errors are expected in one billion device operating hours, depends not only on the device itself but also on the operating conditions (supply voltage, temperature) and the device’s location (position on the earth's surface, altitude, or possibly in space). FIT is separated into two factors [2]: the Raw Error Rate, which depends on the environmental conditions, and the Architectural Vulnerability Factor (AVF), which reflects the susceptibility of the design to soft errors. AVF is defined as the probability that a soft error (transient error) eventually leads to a visible error at the program output. This metric can be computed as the ratio of the important bits in a structure, those required for Architecturally Correct Execution (ACE), to the total number of bits in the structure. Since AVF is independent of environmental conditions, it is suitable for comparing the reliability of different architectures, and AVF awareness at an early design stage greatly helps in achieving a trade-off between system performance and reliability.
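The two relations in the paragraph above can be written down as a minimal Python sketch. The function names and the numbers are illustrative assumptions and do not come from the paper. The sketch only restates that AVF is the ACE-bit fraction of a structure and that the effective soft-error FIT is the product of the raw error rate and the AVF.

```python
def avf(ace_bits: int, total_bits: int) -> float:
    """AVF of a structure: fraction of its bits that are ACE."""
    if total_bits <= 0:
        raise ValueError("total_bits must be positive")
    if not 0 <= ace_bits <= total_bits:
        raise ValueError("ace_bits must lie between 0 and total_bits")
    return ace_bits / total_bits


def soft_error_fit(raw_fit: float, avf_value: float) -> float:
    """Effective FIT: raw error rate (in FIT) scaled by the AVF."""
    if not 0.0 <= avf_value <= 1.0:
        raise ValueError("avf_value must lie in [0, 1]")
    return raw_fit * avf_value


# Hypothetical structure: 1024 bits, of which 256 are ACE on average,
# exposed to a hypothetical raw error rate of 100 FIT.
structure_avf = avf(ace_bits=256, total_bits=1024)
effective_fit = soft_error_fit(raw_fit=100.0, avf_value=structure_avf)
```

Because the raw error rate is fixed by technology and environment, two designs placed in the same environment can be compared through their AVF values alone.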

Three well-known methods for AVF computation are reported [3]: analytical models, performance models (simulators), and statistical fault injection (SFI) models. In AVF estimation by analytical models, Little’s Law is applied to ACE bits and the average number of ACE bits in the structure is calculated. This technique can be useful in the early stages of design, when neither a performance model nor a detailed RTL model is available. A performance model identifies the ACE and un-ACE bits of the objects flowing through the structure and takes the fraction of time in which a bit holds ACE state as the AVF of that bit. This analysis requires an in-depth understanding of the micro-architectural states of the processor. The last AVF computation method, SFI, injects bit flips into the RTL model of the processor, and the AVF of the structure is then estimated as the number of output mismatches divided by the total number of bit flips [4], [5]. Although SFI is a very powerful technique and can compute AVF without an understanding of the processor architecture, it can be used only with detailed models such as RTL, and it is usually much slower than performance models. Indeed, using this method for AVF computation requires a large amount of simulation time to cover a sufficient number of injected errors.
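The fault-injection estimate described above (mismatches divided by injected bit flips) can be illustrated on a toy model. The following Python sketch is an assumption-laden stand-in, not the RTL-level flow used in the literature: `toy_output`, `WIDTH`, and `MASK` are hypothetical, and the "structure" is a single 8-bit register whose upper four bits are logically masked before reaching the output.

```python
import random

WIDTH = 8     # bits in the toy structure (assumption)
MASK = 0x0F   # only the low four bits influence the output (assumption)


def toy_output(state: int, operand: int) -> int:
    """Toy datapath: the upper four bits of `state` are logically masked."""
    return (state & MASK) + operand


def flip_bit(value: int, bit: int) -> int:
    """Model a single-bit upset by inverting one bit of `value`."""
    return value ^ (1 << bit)


def exhaustive_avf(state: int, operand: int) -> float:
    """Inject one flip into every bit and count output mismatches."""
    golden = toy_output(state, operand)
    mismatches = sum(
        toy_output(flip_bit(state, bit), operand) != golden
        for bit in range(WIDTH)
    )
    return mismatches / WIDTH


def statistical_avf(state: int, operand: int, injections: int, seed: int = 0) -> float:
    """Inject flips into randomly chosen bits; mismatches / injections."""
    rng = random.Random(seed)
    golden = toy_output(state, operand)
    mismatches = 0
    for _ in range(injections):
        bit = rng.randrange(WIDTH)
        if toy_output(flip_bit(state, bit), operand) != golden:
            mismatches += 1
    return mismatches / injections
```

In this toy case the exhaustive campaign needs only eight injections, while the statistical estimate approaches the same value as the number of injections grows. On a real RTL model the fault space is far larger, which is why the simulation time of SFI becomes the limiting factor.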

Many studies address efficient AVF estimation [2], [6], [7], [8], [9], [10], [11], [12], and most of them estimate the AVFs of small structures such as the Instruction Queue (IQ), Reorder Buffer (ROB), Load/Store Queue (LSQ), and occasionally the Register File (RF). In fact, performance models for AVF estimation are time-consuming and are not appropriate for large structures such as a whole processor. Moreover, all of these hardware structures are sequential parts whose states are traced for ACE analysis. Because bit flips are introduced only into sequential parts, soft errors in combinational parts, and the fault-masking properties of those parts, are neglected. For modern nano-scale silicon, however, transient errors at combinational nodes must be taken into account [13].

This paper introduces a new reliability metric named Instruction Vulnerability Factor (IVF), which is used for fast and accurate AVF estimation. We propose novel scenarios for IVF extraction that enable exhaustive fault injection in a reasonable simulation time and thus yield more accurate IVFs. Our extensive simulations and verifications show that IVF values, which are calculated once for a given processor’s Instruction Set Architecture (ISA), carry the accuracy of fault injection. We then derive a simple analytical equation for AVF estimation based on the IVFs of the running instructions. Our AVF estimation method is very fast, and our experimental results confirm that its accuracy is comparable to that of AVF estimation by fault injection.

Background and previous works

Recently, the reliability of new silicon has been the subject of much research. Mukherjee et al. [2] introduced Architectural Vulnerability Factor (AVF) as a new metric, which is widely used when reliability issues resulting from soft errors are studied. AVF is currently very popular, and much research has been done on AVF-related topics such as cost-effective and accurate AVF estimation methods [7], [14], [15], online AVF estimation methods [9], [10], [11], [12], AVF

Instruction Vulnerability Factor

As mentioned before, we introduced a new metric called Instruction Vulnerability Factor (IVF) in [34]. In this paper, we first extend and complete our previous work, and then use IVF for fast and cycle-accurate estimation of AVF in processors. Motivational examples are presented in Section 3.1, and IVF is then introduced and explained in Section 3.2.

IVF extraction method

To calculate the IVF of instructions, their behavior in the presence of soft errors must be monitored. Under normal conditions, when the processor executes an application, several instructions reside in the processor pipeline simultaneously. In this situation, if a soft error strikes the processor hardware and an error is observed, it is difficult to attribute the failure to one of the instructions currently resident in the pipeline, and therefore IVF extraction is impossible or very difficult with

AVF estimation using IVF

This section introduces our proposed method for fast and accurate AVF estimation based on the IVF of running instructions. Using IVFs that are extracted once for each instruction of a given CPU, the AVF of a hardware structure or of the whole CPU can be estimated. The AVF of a structure in a cycle is defined as the ratio of ACE bits to all bits in the structure. In each cycle there are five instructions in the CPU pipeline, which are executed concurrently. At this cycle, we can use alternate
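The paper's own estimation equation is not reproduced in this excerpt, so the following Python sketch is only one plausible reading of the idea and should not be taken as the authors' method. It assumes a five-stage in-order pipeline with the stage names IF, ID, EX, MEM, and WB, a hypothetical per-stage IVF table, and hypothetical stage bit counts. The per-cycle AVF is taken as the bit-weighted average of the IVFs of the instructions occupying each stage, and the workload AVF as the mean over cycles.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")  # assumed stage names

# Hypothetical per-instruction, per-stage IVF values (not from the paper).
IVF = {
    "add": {"IF": 0.10, "ID": 0.20, "EX": 0.30, "MEM": 0.00, "WB": 0.20},
    "lw":  {"IF": 0.10, "ID": 0.20, "EX": 0.25, "MEM": 0.40, "WB": 0.30},
    "nop": {"IF": 0.00, "ID": 0.00, "EX": 0.00, "MEM": 0.00, "WB": 0.00},
}

# Hypothetical number of state bits in each pipeline stage.
STAGE_BITS = {"IF": 64, "ID": 128, "EX": 96, "MEM": 64, "WB": 32}


def cycle_avf(occupancy):
    """Bit-weighted average IVF of the instructions in the five stages.

    `occupancy[i]` is the mnemonic of the instruction in STAGES[i].
    """
    if len(occupancy) != len(STAGES):
        raise ValueError("occupancy must name one instruction per stage")
    total_bits = sum(STAGE_BITS.values())
    vulnerable_bits = sum(
        IVF[instr][stage] * STAGE_BITS[stage]
        for stage, instr in zip(STAGES, occupancy)
    )
    return vulnerable_bits / total_bits


def trace_avf(trace):
    """Mean per-cycle AVF while `trace` enters an initially empty pipeline."""
    if not trace:
        raise ValueError("trace must contain at least one instruction")
    pipeline = ["nop"] * len(STAGES)
    per_cycle = []
    for instr in trace:
        pipeline = [instr] + pipeline[:-1]  # advance every stage by one
        per_cycle.append(cycle_avf(pipeline))
    return sum(per_cycle) / len(per_cycle)
```

Under these assumptions the estimate costs one table lookup per stage per cycle, which is the kind of saving over per-workload fault injection that motivates an IVF-based approach. Stalls, such as those caused by cache misses, are deliberately ignored in this sketch.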

Using workload features to refine estimated AVF

Although exhaustive fault injection takes into account the impact of the hardware on the AVF of a structure, some features of the workload (the running application) also affect AVF. Examples of such features that are important in AVF estimation are the number of cache misses, software masking, data dependencies between instructions, and input data.

When a cache miss happens, four instructions remain in the processor pipeline for the duration of the miss penalty. In these circumstances, the WB stage is

Experimental setup

This section explains our experimental setup. First, we introduce the fault model in Section 7.1. Then, Section 7.2 describes our fault injection and reliability analysis framework in detail.

Experimental results

In this section, we use CRAF with an appropriate configuration to obtain our experimental results.

Conclusion and future work

In this paper, we have extended our previously proposed reliability metric, Instruction Vulnerability Factor (IVF) [34], which indicates the vulnerability of the different pipeline stages of a processor while executing instructions. Special scenarios were then developed for IVF extraction based on exhaustive fault injection in a reasonable simulation time. Finally, an IVF-based AVF estimation method was proposed for fast AVF estimation with acceptable accuracy. The

Acknowledgment

This research was supported in part by a grant from IPM.

References (61)

Y. Xie et al., Reliability-aware co-synthesis for embedded systems, J. VLSI Signal Process. Syst. Signal Image Video Technol. (2007)
S. Rehman et al., Instruction scheduling for reliability-aware compilation, Proceedings of the Forty-Ninth Annual Design Automation Conference (2012)
R.C. Baumann, Radiation-induced soft errors in advanced semiconductor technologies, IEEE Trans. Device Mater. Reliab. (2005)
S.S. Mukherjee et al., A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor, Proceedings of the Thirty-Sixth Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-36 (2003)
S.S. Mukherjee et al., The soft error problem: An architectural perspective, Proceedings of the Eleventh International Symposium on High-Performance Computer Architecture, HPCA-11 (2005)
S. Kim et al., Soft error sensitivity characterization for microprocessor dependability enhancement strategy, Proceedings of the 2002 International Conference on Dependable Systems and Networks, DSN 2002 (2002)
N.J. Wang et al., Characterizing the effects of transient faults on a high-performance processor pipeline, Proceedings of the 2004 International Conference on Dependable Systems and Networks (2004)
J. Chetia et al., An efficient AVF estimation technique using circuit partitioning, Proceedings of the Twelfth European Conference on Radiation and Its Effects on Components and Systems, RADECS 2011 (2011)
R. Hartl et al., Improved backwards analysis for architectural vulnerability factor estimation, Proceedings of the Semiconductor Conference Dresden, SCD 2011 (2011)
M. Maniatakos et al., AVF analysis acceleration via hierarchical fault pruning, Proceedings of the Sixteenth IEEE European Test Symposium, ETS 2011 (2011)
X. Li et al., Online estimation of architectural vulnerability factor for soft errors, Proceedings of the Thirty-Fifth International Symposium on Computer Architecture, ISCA’08 (2008)
N.K. Soundararajan et al., Mechanisms for bounding vulnerabilities of processor structures, Proceedings of the Thirty-Fourth Annual International Symposium on Computer Architecture (2007)
K.R. Walcott et al., Dynamic prediction of architectural vulnerability from microarchitectural state, ACM SIGARCH Comput. Archit. News (2007)
A. Biswas et al., Quantized AVF: A means of capturing vulnerability variations over small windows of time, Proceedings of the IEEE Workshop on Silicon Errors in Logic-System Effects (2009)
P. Shivakumar et al., Modeling the effect of technology trends on the soft error rate of combinational logic, Proceedings of the International Conference on Dependable Systems and Networks, DSN 2002 (2002)
A. Biswas et al., Computing architectural vulnerability factors for address-based structures, Proceedings of the Thirty-Second International Symposium on Computer Architecture, ISCA’05 (2005)
S.S. Mukherjee et al., Measuring architectural vulnerability factors, IEEE Micro (2003)
S. Pan et al., Online computing and predicting architectural vulnerability factor of microprocessor structures, Proceedings of the Fifteenth IEEE Pacific Rim International Symposium on Dependable Computing, PRDC’09 (2009)
F. Firouzi et al., Adaptive fault-tolerant DVFS with dynamic online AVF prediction, Microelectron. Reliab. (2012)
H. Aydin et al., Reliability-aware energy management for periodic real-time tasks, IEEE Trans. Comput. (2009)
T. Imagawa et al., High-speed DFG-level SEU vulnerability analysis for applying selective TMR to resource-constrained CGRA, Proceedings of the Fourteenth International Symposium on Quality Electronic Design, ISQED 2013 (2013)
L. Huang et al., Lifetime reliability-aware task allocation and scheduling for MPSoC platforms, Proceedings of the Conference on Design, Automation and Test in Europe (2009)
J. Li et al., Epipe: a low-cost fault-tolerance technique considering WCET constraints, J. Syst. Archit. (2013)
Z. Wang et al., Accurate and efficient reliability estimation techniques during ADL-driven embedded processor design, Proceedings of the Design, Automation and Test in Europe Conference and Exhibition, DATE 2013 (2013)
T. Li et al., CSER: HW/SW configurable soft-error resiliency for application specific instruction-set processors, Proceedings of the Conference on Design, Automation and Test in Europe (2013)
S. Rehman et al., Leveraging variable function resilience for selective software reliability on unreliable hardware, Proceedings of the Conference on Design, Automation and Test in Europe (2013)
M. Shafique et al., Exploiting program-level masking and error propagation for constrained reliability optimization, Proceedings of the Fiftieth Annual Design Automation Conference (2013)
I. Oz et al., Thread vulnerability in parallel applications, J. Parallel Distrib. Comput. (2012)
S. Rehman et al., Reliable software for unreliable hardware: embedded code generation aiming at reliability, Proceedings of the Seventh IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (2011)
D. Borodin et al., Protective redundancy overhead reduction using instruction vulnerability factor, Proceedings of the Seventh ACM International Conference on Computing Frontiers (2010)

Ali Azarpeyvand was born in Zanjan, Iran, in 1974. He received the B.Sc. degree in computer engineering from Sharif University of Technology, Tehran, Iran, in 1997, and the M.Sc. degree in computer architecture from University of Tehran, Tehran, Iran, in 2000. From 1998 to 2004 he was a senior digital design engineer at Emad Semicon Company. In September 2001, he joined University of Zanjan, Zanjan, Iran, as an instructor. He received his Ph.D. degree from the School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran, in 2013. He is currently an Assistant Professor at University of Zanjan. His research interests include computer architecture, fault-tolerant embedded systems, reliability analysis, soft error mitigation techniques, and approximate computing.

Mostafa Ersali Salehi Nasab was born in Kerman, Iran, in 1978. He received the B.Sc. degree in computer engineering from University of Tehran, Tehran, Iran, and the M.Sc. degree in computer architecture from University of Amirkabir, Tehran, Iran, in 2001 and 2003, respectively. He received his Ph.D. degree from the School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran, in 2010. From 2004 to 2008, he was a senior digital designer working on ASIC design projects with SINA Microelectronics Inc., Technology Park of University of Tehran, Tehran, Iran. He is now an Assistant Professor at University of Tehran. His research interests include novel techniques for high-performance, low-power, and fault-tolerant embedded system design.

Sied Mehdi Fakhraie received the M.Sc. degree in electronics from the University of Tehran, Tehran, Iran, in 1989, and the Ph.D. degree in electrical and computer engineering from the University of Toronto, Toronto, ON, Canada, in 1995. He was a Professor with the School of Electrical and Computer Engineering, University of Tehran. He was the Director of the Silicon Intelligence and VLSI Signal Processing Laboratory, the Director of Electrical and Electronics Engineering, and the Director of Computer Hardware Engineering with the School of Electrical and Computer Engineering, University of Tehran. He was a Visiting Professor with the University of Toronto in 1998, 1999, and 2000, where he was involved in efficient implementation of artificial neural networks. He was with Valence Semiconductor Inc., Irvine, CA, USA, from 2000 to 2003. He worked in the Dubai, United Arab Emirates, and Markham, Canada, offices of Valence as the Director of Application-Specific Integrated Circuit (ASIC) and System-on-a-Chip Design, and as the Technical Leader of integrated broadband gateway and family radio system baseband processors. He was involved in many industrial integrated circuit design projects, including the design of network processors and home gateway access devices, DSL modems, pagers, and digital signal processors for personal and mobile communication devices. He co-authored a book entitled VLSI-Compatible Implementations for Artificial Neural Networks (Boston, MA, USA: Kluwer, 1997). He authored or co-authored over 230 reviewed conference and journal papers. His most recent research interests included system design and ASIC implementation of integrated systems, novel techniques for high-speed digital circuit design, and system integration and efficient VLSI implementation of intelligent systems. He passed away on December 7, 2014.

Saeed Safari received his Ph.D. degree in computer architecture from Computer Engineering Department, Sharif University of Technology, Tehran, Iran, in 2005. Since then, he has been a faculty member of Electrical and Computer Engineering Department, Faculty of Engineering, University of Tehran, Iran. His research interests include fault tolerant system design, high performance computing, computer arithmetic, test and design for test.
