An efficient control-flow checking technique for the detection of soft-errors in embedded software

doi:10.1016/j.compeleceng.2013.03.015

Computers & Electrical Engineering

Volume 39, Issue 4, May 2013, Pages 1320-1332

https://doi.org/10.1016/j.compeleceng.2013.03.015

Abstract

In this paper, we propose a new technique to improve the efficiency of control-flow checking for detecting soft-errors in embedded software. The novelties of the proposed technique are as follows: (1) the frequency of variable use and the frequency of basic-block execution are used as two parameters for selecting important variables and basic blocks, (2) kernel blocks (i.e., a subset of the program’s flowgraph vertices) are used for the selection of important basic blocks, and (3) using the proposed method, developers can make a trade-off between detection latency and performance overhead. Experimental evaluations using several benchmarks showed that the execution time of the hardened code is lower than that obtained with the relationship signatures for control flow checking (RSCFC) method, while the memory overhead and code size remain nearly the same. The execution time of the hardened code also remains nearly the same as that of the original code.

Introduction

Embedded computer systems are widely used in safety-critical applications, such as satellites, automobiles and airplanes. Faulty behavior of these systems may lead to catastrophic incidents, so they must be designed to detect faults in minimum time. Faults are classified into three categories: permanent, transient and intermittent. Permanent faults result from manufacturing defects or residual design errors in hardware or software components. Transient and intermittent faults are due to environmental effects such as electromagnetic interference and alpha particle hits.

Transient faults, or soft-errors, increasingly affect software, not only in space electronic systems but also at ground and sea level. Such faults (e.g., single event upsets [1], [2]) cause incorrect execution of statements or bit-flips in memory.

Hardware redundancy and radiation hardening are two fault avoidance techniques used in space electronic components. However, the major drawbacks of these techniques are reduced performance, high cost, high weight, and high power consumption. Another drawback is that they prevent the use of today’s commercial components [1], [2].

Software implemented hardware fault tolerance (SIHFT) is another approach that uses software fault tolerance techniques for detecting faults in safety-critical systems. These techniques are divided into two classes, which tolerate processor errors and memory errors, respectively. For example, software-implemented error detection and correction (EDAC) [3] is an effective solution that detects and corrects memory faults. Error detection by duplicated instruction (EDDI) is another technique, which targets faults that affect the processor [4].
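The duplicate-and-compare idea behind EDDI can be illustrated with a minimal Python sketch. It is not the EDDI implementation of [4], which works on duplicated instructions rather than on source-level variables; the names `checked_sum` and `SoftErrorDetected` and the `flip_at` fault-injection parameter are hypothetical, and the bit-flip is simulated.

```python
class SoftErrorDetected(Exception):
    """Raised when the master and shadow copies disagree."""


def checked_sum(values, flip_at=None):
    """Sum integers twice (master and shadow) and compare after each step.

    flip_at is a hypothetical fault-injection hook: after the element at
    that index is processed, one bit of the master copy is flipped to
    mimic a single event upset.
    """
    master = 0
    shadow = 0
    for index, value in enumerate(values):
        master += value       # original computation
        shadow += value       # duplicated computation
        if index == flip_at:
            master ^= 1 << 3  # simulated bit-flip in the master copy
        if master != shadow:  # the comparison exposes the divergence
            raise SoftErrorDetected(f"mismatch after element {index}")
    return master
```

Without an injected fault the function behaves like an ordinary sum; with `flip_at` set, the comparison raises `SoftErrorDetected` in the same iteration, which is the kind of short detection latency such techniques aim for at the price of doing the work twice.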

In environments such as space, transient faults are more likely to occur than other faults. Permanent or transient faults in hardware components, such as the program counter and the memory elements, may result in control-flow errors (CFEs). According to several reports in the literature, up to 70% of transient faults lead to CFEs in the program execution [5]. Therefore, control-flow checking methods, and ways of reducing the memory and performance overheads of these methods, are important [6].
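To make the notion of a CFE concrete, the following minimal sketch treats a CFE as a transition between basic blocks that is not an edge of the program's flowgraph. It is a generic illustration, not the RSCFC or CFCSS algorithm; the block names, `LEGAL_EDGES` and `first_illegal_transition` are hypothetical.

```python
# Hypothetical flowgraph of a small if/else: B1 branches to B2 or B3,
# and both branches join in B4.
LEGAL_EDGES = {("B1", "B2"), ("B1", "B3"), ("B2", "B4"), ("B3", "B4")}


def first_illegal_transition(legal_edges, trace):
    """Return the first (src, dst) pair in the executed trace that is not
    an edge of the flowgraph, or None if the whole trace is legal."""
    for src, dst in zip(trace, trace[1:]):
        if (src, dst) not in legal_edges:
            return (src, dst)
    return None
```

For example, the trace `["B1", "B2", "B4"]` is legal, whereas `["B1", "B2", "B3", "B4"]` contains the jump from `B2` to `B3`, which no edge allows and which a corrupted program counter could cause.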

In this paper, we propose a new technique to improve the efficiency of control-flow checking methods. In the proposed approach, the program code is processed prior to applying the control-flow checking algorithm. The pre-processing includes detecting frequently used variables and important basic blocks. The def-use chain algorithm is used for detecting frequently used variables. Important basic blocks are selected through a nominee blocks set, which is based on the concept of a kernel [7]. According to [7], a kernel is a subset of the program’s flowgraph vertices with the property that any set of tests which executes all vertices of the kernel executes all vertices of the flowgraph.
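The kernel property quoted from [7] can be checked by brute force on a small flowgraph. The sketch below is only an illustration under strong assumptions: it considers loop-free entry-to-exit paths as the possible tests, and it is not the kernel computation algorithm of [7]. The function names and the example graph are hypothetical. It relies on the observation that a candidate set satisfies the property exactly when every vertex is forced by some candidate vertex, i.e. lies on every path through that candidate vertex.

```python
def simple_paths(cfg, entry, exit_):
    """Yield every loop-free path from entry to exit_ as a list of vertices."""
    stack = [(entry, [entry])]
    while stack:
        node, path = stack.pop()
        if node == exit_:
            yield path
            continue
        for succ in cfg.get(node, []):
            if succ not in path:  # skip vertices already on this path
                stack.append((succ, path + [succ]))


def is_kernel(cfg, entry, exit_, candidate):
    """Check the kernel property over loop-free paths (an assumption).

    A vertex v is 'forced' by a candidate vertex k when at least one path
    visits k and every path that visits k also visits v. The candidate is
    a kernel if every vertex of the flowgraph is forced by some k.
    """
    paths = [set(p) for p in simple_paths(cfg, entry, exit_)]
    vertices = set(cfg) | {s for succs in cfg.values() for s in succs}
    for v in vertices:
        forced = any(
            any(k in p for p in paths)
            and all(v in p for p in paths if k in p)
            for k in candidate
        )
        if not forced:
            return False
    return True


# Hypothetical diamond-shaped flowgraph: A branches to B or C, both reach D.
DIAMOND = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
```

In the diamond, `{"B", "C"}` has the kernel property because any tests that execute both branches also execute `A` and `D`, whereas `{"B"}` alone does not, since a test through `B` never executes `C`.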

The proposed technique can be customized by developers to fulfill the requirements of different applications. After the pre-processing information has been generated, the control-flow checking algorithm is applied to the program code. The novelties of the proposed method are as follows:

1. The frequency of used variables and the frequency of the execution of basic blocks are used as two parameters for selecting important variables and basic blocks.
2. Kernels are used for the selection of important basic blocks.
3. The developer can make a trade-off between the detection latency and the performance overheads based on the sensitivity of software.
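The trade-off in the third item can be pictured as a single tunable parameter. The sketch below assumes that blocks are ranked by execution frequency and that more frequently executed blocks receive checks first; the excerpt does not state the actual ranking rule, so this ordering, the `fraction` parameter and the name `select_top_blocks` are all hypothetical.

```python
import math


def select_top_blocks(exec_counts, fraction):
    """Pick the most frequently executed blocks for control-flow checks.

    exec_counts maps a block name to its (profiled) execution count.
    fraction is a developer-chosen value in (0, 1]: under the assumptions
    above, a larger fraction means more checked blocks, so lower detection
    latency but higher performance overhead.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Sort by descending count; ties are broken by name for determinism.
    ranked = sorted(exec_counts, key=lambda block: (-exec_counts[block], block))
    keep = math.ceil(len(ranked) * fraction)
    return ranked[:keep]
```

With `fraction=1.0` every block is checked, which corresponds to the conventional approach of checking all blocks; smaller values restrict checking to the top-ranked blocks.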

The structure of this paper is as follows. Section 2 gives some of the basic concepts used throughout the paper. Some existing techniques for reducing the performance and memory overheads are reviewed in Section 3. Section 4 summarizes the motivations of this work. Section 5 introduces the proposed technique. Section 6 presents the prototype implementation of the proposed technique. The experimental results are given in Section 7. Finally, Section 8 concludes the paper.

Section snippets

Background

In this section, we give some basic concepts and definitions, which are used throughout the paper.

Related works

Several approaches based on hardware, software and a combination of them have been proposed for fault detection. In the following, we review some of the most related approaches.

Lockstepping [9] and the watchdog processor [10] are hardware-based methods that use special hardware modules for fault detection. The lockstepping method has been used in Compaq’s Non-Stop Himalaya processor. This method performs the same computation on two processors and compares the results. This method has complete fault

Motivations

The use of SIHFT techniques in mission- or safety-critical applications reduces costs and implementation time. With these techniques, it is possible to easily improve and extend software applications. It is also possible to take advantage of high-performance commercial off-the-shelf (COTS) components.

Our main motivation has been to reduce the memory and performance overheads of SIHFT techniques. For this purpose, we have selected the RSCFC method and have employed some

The proposed method

All existing SIHFT methods, such as RSCFC and CFCSS, apply control-flow checkpoints on all blocks of the program code. However, we intend to propose a method that selects these checkpoints in an intelligent manner. In the proposed method, control-flow assertions are inserted in the nominee blocks set that will be selected based on kernel blocks. We define the nominee blocks set as follows:

Definition 2 Nominee Blocks Set

The nominee blocks set consists of the basic blocks with higher priority for control flow checking. This set can be

A prototype implementation

The process of generating the modified code is shown in Fig. 8. We have developed a tool written in C++ to implement the above algorithms. This tool takes a program written in C and inserts the assertions in the code. The resulting program can then detect transient errors. The stages of this tool are described in the following.

This tool has five stages. At the first stage, a filter is used to reduce the standard C statements to pseudo statements, because the parser needs

Experimental results

To assess the efficiency of the proposed approach, we have compared the memory size and performance overheads of the modified code and the original code. Also, we have evaluated the fault detection capability of our method in two cases and have compared its capabilities with the RSCFC method. For this comparison, we have selected the following three benchmarks:

• Insertion sort (IN),
• Quick sort (QS), and
• Matrix multiplication (MM).

These programs have been executed on a Pentium 4 PC with 512 MB memory

Conclusions

The technique proposed in this paper tries to improve the efficiency of the relationship signatures for control flow checking (RSCFC) method. Existing software techniques insert signatures (assertions) in all basic blocks. However, in the proposed technique, kernel basic blocks are classified based on their frequency of execution and variables are also classified based on their frequency of use. These classifications are used as new measures for the insertion of signatures and

Tahereh Boroomandnezhad received the B.S. degree in computer engineering (software) from Islamic Azad University – Tehran Branch (2000) and the M.S. degree in computer engineering (software) from Iran University of Science and Technology (2010), Tehran, Iran. Her main research interests include fault-tolerant computing, software fault tolerance and modeling and simulation.

References (32)

S. Bahramnejad et al. Mitigation of soft-errors in SRAM-based FPGAs using CAD tools. Comput Electr Eng (Nov. 2011)

A. Li et al. Software implemented transient fault detection in space computer. Aerospace Sci Technol (2007)

A. Li et al. On-line control flow error detection using relationship signatures among basic blocks. Comput Electr Eng (2010)

Shirvani PP, Oh N, McCluskey EJ, Wood DL. Software-implemented hardware fault tolerance experiments COTS in space. In:...

Yenier U. Fault tolerant computing in space environment and software implemented hardware fault tolerance techniques...

P.P. Shirvani et al. Software implemented EDAC protection against SEUs. IEEE Trans Reliab (2000)

N. Oh et al. Error detection by duplicated instructions in super-scalar processors. IEEE Trans Reliab (2002)

Czech EW, Siewiorek D. Effects of transient gate-level faults on program behavior. In: Proceedings of 20th...

Sedaghat Y, Miremadi SG, Fazeli M. A software-based error detection technique using encoded signatures. In: Proceedings...

Dubrova E. Structural testing based on minimum kernels. In: Proceedings of the design, automation and test in Europe...

Yu J, Garzaran MJ, Snir M. Techniques for efficient software checking. In: Proceedings of the 20th international...

Horst RW, Harris RL, Jardine RL. Multiple instruction issue in the nonstop cyclone processor. In: Proceedings of the...

A. Mahmood et al. Concurrent error detection using watchdog processors – a survey. IEEE Trans Comput (1998)

Shazli SZ, Tahoori MB. Transient error detection and recovery in processor pipelines. In: Proceedings of the 2009 24th...

Gebelein J, Engel H, Kebschull U. FPGA fault tolerance in radiation susceptible environment. In: Proceedings of the...

Namjaoo M, McCluskey EJ. Watchdog processors and capability checking. In: Proceedings of the 12th international...


Mohammad Abdollahi Azgomi received the B.S., M.S. and Ph.D. degrees in computer engineering (software) (1991, 1996 and 2005, respectively) from Sharif University of Technology, Tehran, Iran. His research interests include dependable and secure computing, modelling and simulation. He is currently a faculty member at the School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran.

^☆: Reviews processed and recommended for publication to Editor-in-Chief by Associate Editor Dr. Jian Li.
