An integration of fault detection and correction processes in software reliability analysis

doi:10.1016/j.jss.2005.12.006

Journal of Systems and Software

Volume 79, Issue 9, September 2006, Pages 1312-1323

https://doi.org/10.1016/j.jss.2005.12.006

Abstract

Software reliability is defined as the probability of failure-free software operation for a specified period of time in a specified environment, and it is widely recognized as one of the most significant aspects of software quality. Over the past 30 years, many software reliability growth models (SRGMs) have been proposed. They help estimate important measures such as the mean time to failure, the number of remaining faults, defect levels, and the failure intensity. SRGMs can also help determine the personnel needed to meet the desired reliability requirements. However, our studies indicate that most SRGMs describe only the fault detection process and assume that faults are fixed immediately upon detection. This assumption may not be realistic. In this paper, we therefore propose a general framework for modeling the software fault detection and correction processes together. We also show that the proposed approach covers a number of well-known SRGMs. Two numerical examples based on two real software failure data sets are presented and discussed in detail.

Introduction

Over the past decade, the deployment of computer systems has grown dramatically, and modern society depends increasingly on both hardware and software systems. Because software is embedded in so many products and permeates daily life, the correct performance of software has become an important concern for many critical systems. Software reliability is a useful measure for quantifying software failures and is defined as the probability of failure-free software operation for a specified period of time in a specified environment (Lyu, 1996). Numerous software reliability growth models (SRGMs) have been developed to measure software reliability, and some of them are based on the non-homogeneous Poisson process (NHPP) (Lyu, 1996, Musa et al., 1987, Pham, 2000, Xie, 1991). These SRGMs describe the error-detection process as a discrete or continuous process with a time-dependent error-detection rate (Goel and Okumoto, 1979, Lo et al., 2001, Lo et al., 2003, Yamada et al., 1983, Yamada et al., 1993). Conventional SRGMs commonly assume that detected faults are removed immediately. This assumption is rarely realistic: in practice, detected faults are seldom corrected at once (Gokhale et al., 1998, Schneidewind, 1975, Schneidewind, 2003, Ohba, 1984, Xie and Zhao, 1992).

Schneidewind (1975) first modeled the fault-correction process as a fault-detection process delayed by a constant time lag. Later, Xie and Zhao (1992) extended the Schneidewind model to a continuous version by substituting a time-dependent delay function for the constant delay. A key factor in this continuous version is the time-dependent delay function, which measures the expected time lag to correct a detected fault. Software debugging resembles scientific inquiry: fault correction personnel formulate a hypothesis, make predictions based on it, then run the software, observe its output, and confirm or reject the hypothesis. The time to remove a fault depends on the complexity of the detected fault, the skills of the debugging team, the available manpower, and the software development environment (Lyu, 1996, Musa et al., 1987, Musa, 1998). It is therefore important to have software reliability models that cover both the fault detection and the fault correction processes. In this paper, we propose a new software reliability model that considers both processes. Numerical examples are carried out on two real software failure data sets. The experimental results show that the proposed framework, which incorporates the debugging time lag into SRGMs, has fairly accurate prediction capability.
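The constant-delay idea attributed to Schneidewind (1975) above can be sketched roughly as follows: the expected number of corrected faults is the expected number of detected faults, shifted by a fixed lag. The exponential detection curve (the Goel-Okumoto form), the parameter values `a`, `b`, and `delay`, and all function names are assumptions chosen for illustration only; they are not taken from the paper or its data sets.

```python
import math


def detected(t, a=100.0, b=0.1):
    """Expected faults detected by time t (assumed exponential form a*(1 - e^{-bt}))."""
    return a * (1.0 - math.exp(-b * t)) if t > 0 else 0.0


def corrected_constant_delay(t, delay=2.0, a=100.0, b=0.1):
    """Expected faults corrected by time t, assuming each fix lags detection by `delay`."""
    return detected(t - delay, a, b)


def backlog(t, delay=2.0, a=100.0, b=0.1):
    """Expected faults that are detected but not yet corrected at time t."""
    return detected(t, a, b) - corrected_constant_delay(t, delay, a, b)


if __name__ == "__main__":
    # Hypothetical weekly snapshot of detection, correction, and open backlog.
    for week in (1, 5, 10, 20):
        print(week, detected(week), corrected_constant_delay(week), backlog(week))
```

Under these assumptions the backlog is never negative and shrinks as detection saturates, which is the qualitative behavior a detection-only model cannot express. A time-dependent lag in the spirit of Xie and Zhao (1992) would replace the fixed `delay` with a function of `t`.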

This paper is organized as follows. Section 2 reviews the properties of related models and discusses the characteristics of NHPP models. Section 3 proposes an integrated model of the fault detection and correction processes; it also shows how some existing NHPP models can be re-evaluated from the viewpoint of the correction process and compares the original NHPP models with the integrated models. Section 4 presents the numerical examples and comparison results. Finally, Section 5 concludes the paper.

A brief review of some SRGMs based on NHPP

Let {N(t), t ⩾ 0} denote a counting process representing the cumulative number of faults detected by time t, let m(t) be the mean value function (MVF), that is, the expected number of faults detected in the interval (0, t], and let λ(t) denote the failure intensity at testing time t. These quantities satisfy $m(t) = E[N(t)]$ and $\lambda(t) = \frac{dm(t)}{dt}.$ Thus, an SRGM based on an NHPP with mean value function m(t) can be formulated as (Yamada et al., 1993) $P\{N(t) = n\} = \frac{[m(t)]^{n}}{n!} \exp(-m(t)), \quad n = 0, 1, 2, \dots$ From our studies (Lo et al., 2001,
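To make the NHPP formulation above concrete, the following minimal sketch evaluates the probability of observing exactly n detected faults by time t for a given mean value function. The exponential MVF m(t) = a(1 − e^{−bt}) is the Goel-Okumoto form; its use here, the parameter values `a` and `b`, and the function names are illustrative assumptions rather than the model proposed in this paper.

```python
import math


def mvf_go(t, a=100.0, b=0.1):
    """Assumed mean value function m(t) = a * (1 - exp(-b t))."""
    return a * (1.0 - math.exp(-b * t))


def intensity_go(t, a=100.0, b=0.1):
    """Failure intensity lambda(t) = dm(t)/dt for the assumed MVF."""
    return a * b * math.exp(-b * t)


def nhpp_pmf(n, t, mvf=mvf_go):
    """P{N(t) = n} = m(t)^n / n! * exp(-m(t)) for an NHPP with the given MVF."""
    m = mvf(t)
    if m <= 0.0:
        # No expected detections yet: all probability mass sits on n = 0.
        return 1.0 if n == 0 else 0.0
    # Work in log space so large n does not overflow the factorial.
    return math.exp(n * math.log(m) - m - math.lgamma(n + 1))


if __name__ == "__main__":
    # Hypothetical query: chance of exactly 60 detected faults by t = 10.
    print(nhpp_pmf(60, 10.0))
```

Any other MVF can be passed through the `mvf` argument, since the Poisson form depends on the model only through m(t).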

An integrated fault detection and correction model

In the past, much research on software reliability models has concentrated on modeling and predicting failure occurrence and has not given equal priority to modeling the fault correction process (Schneidewind, 2003). However, most latent software faults may remain uncorrected for a long time even after they are detected, which increases their impact. Uncorrected faults are often among the main causes of poor software quality. Therefore, from the practical viewpoint, we may

Descriptions of real data sets

In this section, we evaluate the performance of the proposed model using two sets of software failure data. The first data set is the System T1 data of the Rome Air Development Center (RADC) (Musa, 1985, Musa et al., 1987), shown in Table 2. System T1 is used for a real-time command and control application. The size of the software is approximately 21,700 object instructions. Testing took 21 weeks and involved nine programmers. During the test phase, about 25.3 CPU hours were

Conclusions

Over the past 30 years, researchers have proposed many SRGMs, from which some important metrics can be readily determined. However, our studies indicate that most of them assume that detected faults are corrected immediately. In practice, this assumption may not be realistic. In this paper, we first propose a general framework for modeling the fault detection and correction processes. We have shown that some existing SRGMs can be easily derived from the concept of the integration of

Acknowledgements

This research was supported by the National Science Council, Taiwan, ROC, under Grant NSC 93-2213-E-267-001, and also substantially supported by a grant from the Ministry of Economic Affairs (MOEA) of Taiwan (Project No. 94-EC-17-A-01-S1-038).

Jung-Hua Lo received the BS (1993) in Mathematics and the MS (1995) and the Ph.D. (2003) in Electrical Engineering from National Taiwan University. Since 1998, he has been with LanYang Institute of Technology, where he is currently an Assistant Professor and the Chairman of the Department of Information Management. His research interests include software engineering, software reliability, and testing.

References (23)

Huang, C.Y., 2005. Performance analysis of software reliability growth models with testing-effort and change-point. J. Syst. Software.
Kapur, P.K., et al., 1995. Software reliability growth model with error dependency. Microelectron. Reliab.
Goel, A.L., et al., 1979. Time-dependent error-detection rate model for software reliability and other performance measures. IEEE Trans. Reliab.
Gokhale, S.S., Lyu, M.R., Trivedi, K.S., 1998. Software reliability analysis incorporating fault detection and...
Lo, J.H., Kuo, S.Y., Huang, C.Y., 2001. Reliability modeling incorporating error processes for internet-distributed...
Lo, J.H., Kuo, S.Y., Lyu, M.R., Huang, C.Y., 2002. Optimal resource allocation and reliability analysis for...
Lo, J.H., Huang, C.Y., Kuo, S.Y., Lyu, M.R., 2003. Sensitivity analysis of software reliability for component-based...
Lyu, M.R., 1996. Handbook of Software Reliability Engineering.
Lyu, M.R., et al., 1992. Applying software reliability models more effectively. IEEE Trans. Softw.
Musa, J.D., 1985. Software reliability data, report and database available from data and analysis center for software....
Musa, J.D., 1998. Software Reliability Engineering: More Reliable Software, Faster Development and Testing.

Chin-Yu Huang is currently an Assistant Professor in the Department of Computer Science at National Tsing Hua University, Hsinchu, Taiwan. He received the MS (1994) and the Ph.D. (2000) in Electrical Engineering from National Taiwan University, Taipei. He was with the Bank of Taiwan from 1994 to 1999, and was a senior software engineer at Taiwan Semiconductor Manufacturing Company from 1999 to 2000. Before joining NTHU in 2003, he was a division chief of the Central Bank of China, Taipei. His research interests include software reliability engineering, software testing, software metrics, software testability, fault tree analysis, and system safety assessment. He is a member of IEEE.
