Elsevier

Journal of Systems and Software

Volume 79, Issue 9, September 2006, Pages 1312-1323

An integration of fault detection and correction processes in software reliability analysis

https://doi.org/10.1016/j.jss.2005.12.006

Abstract

Software reliability is defined as the probability of failure-free software operation for a specified period of time in a specified environment, and it is widely recognized as one of the most significant aspects of software quality. Over the past 30 years, many software reliability growth models (SRGMs) have been proposed. They can greatly help us estimate important measures such as the mean time to failure, the number of remaining faults, defect levels, and the failure intensity. SRGMs can also help determine the person-power needed to meet the desired reliability requirements. However, our studies show that most SRGMs focus only on describing the behavior of the fault detection process and assume that faults are fixed immediately upon detection. In practice, this assumption may not be realistic. Thus, in this paper, we propose a general framework for modeling the software fault detection and correction processes. We also show that the proposed approaches cover a number of well-known SRGMs. Two numerical examples based on two real software failure data sets are presented and discussed in detail.

Introduction

Over the past decade, the deployment of computer systems has grown dramatically, and people in modern society are increasingly dependent on both hardware and software systems. Since software is embedded in almost everything and permeates our daily life, the correct operation of software has become an important issue for many critical systems. Software reliability is a useful measure for quantifying software failures and is defined as the probability of failure-free software operation for a specified period of time in a specified environment (Lyu, 1996). Numerous software reliability growth models (SRGMs) have been developed to measure software reliability, some of which are based on the nonhomogeneous Poisson process (NHPP) (Lyu, 1996, Musa et al., 1987, Pham, 2000, Xie, 1991). These SRGMs are very useful for describing the error-detection process as a discrete or continuous process with a time-dependent error-detection rate (Goel and Okumoto, 1979, Lo et al., 2001, Lo et al., 2003, Yamada et al., 1983, Yamada et al., 1993). A common assumption of conventional SRGMs is that detected faults are removed immediately. However, this assumption may not be realistic: detected faults are rarely corrected immediately (Gokhale et al., 1998, Schneidewind, 1975, Schneidewind, 2003, Ohba, 1984, Xie and Zhao, 1992).

Schneidewind (1975) first modeled the fault-correction process as the fault-detection process delayed by a constant time lag. Later, Xie and Zhao (1992) extended the Schneidewind model to a continuous version by replacing the constant delay with a time-dependent delay function. The key element of this continuous version of the Schneidewind model is the time-dependent delay function, which measures the expected time lag in correcting a detected fault. Software debugging is, in effect, a scientific process: fault-correction personnel formulate a hypothesis, make predictions based on it, run the software, observe its output, and confirm the hypothesis. The time needed to remove a fault depends on the complexity of the detected fault, the skills of the debugging team, the available manpower, and the software development environment (Lyu, 1996, Musa et al., 1987, Musa, 1998). Therefore, it is important to have software reliability models that capture both the fault detection and the fault correction processes. In this paper, we propose a new software reliability model that considers both processes. Numerical examples based on two real software failure data sets are presented. Experimental results show that the proposed framework, which incorporates the debugging time lag into SRGMs, has fairly accurate prediction capability.
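The two delay-based constructions described above can be sketched in Python. This is a minimal sketch, assuming the Goel-Okumoto mean value function m_d(t) = a(1 − e^{−bt}) for the detection process; the function names, the parameter values, and the particular delay function ψ(t) = ln(1 + bt)/b are illustrative choices, not details taken from this excerpt. With that specific ψ(t), the delayed correction curve m_c(t) = m_d(t − ψ(t)) equals the delayed S-shaped mean value function a[1 − (1 + bt)e^{−bt}], which shows how a delay-based view can reproduce a familiar SRGM.

```python
import math
from typing import Callable


def go_mvf(t: float, a: float, b: float) -> float:
    """Detection MVF m_d(t) = a * (1 - exp(-b t)) (Goel-Okumoto form); 0 for t <= 0."""
    if t <= 0.0:
        return 0.0
    return a * (1.0 - math.exp(-b * t))


def constant_delay_correction(t: float, delta: float, a: float, b: float) -> float:
    """Schneidewind-style correction MVF: the detection curve shifted by a constant lag delta."""
    return go_mvf(t - delta, a, b)


def time_dependent_delay_correction(
    t: float, psi: Callable[[float], float], a: float, b: float
) -> float:
    """Xie-Zhao-style correction MVF: m_c(t) = m_d(t - psi(t)) with a time-dependent lag psi."""
    return go_mvf(t - psi(t), a, b)


def log_delay(t: float, b: float) -> float:
    """Hypothetical delay psi(t) = ln(1 + b t) / b; it satisfies 0 <= psi(t) <= t."""
    return math.log1p(b * t) / b


if __name__ == "__main__":
    a, b, delta = 100.0, 0.1, 2.0  # hypothetical parameters, not fitted to any data
    for t in (5.0, 10.0, 20.0):
        detected = go_mvf(t, a, b)
        fixed_const = constant_delay_correction(t, delta, a, b)
        fixed_var = time_dependent_delay_correction(t, lambda s: log_delay(s, b), a, b)
        print(f"t={t:5.1f}  detected={detected:6.2f}  "
              f"corrected(const)={fixed_const:6.2f}  corrected(psi)={fixed_var:6.2f}")
```

The constant-delay version simply leaves the correction curve at zero until t exceeds the lag, whereas the time-dependent lag lets the gap between detection and correction grow and shrink over the test period.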

This paper is organized as follows. In Section 2, the related models are reviewed and the characteristics of NHPP models are discussed. An integrated model of the fault detection and correction processes is proposed in Section 3, where we also show how some existing NHPP models can be re-evaluated from the viewpoint of the correction process and compare the original NHPP models with the integrated models. The numerical examples and comparison results are presented in Section 4. Finally, conclusions are drawn in Section 5.

Section snippets

A brief review of some SRGMs based on NHPP

Let $\{N(t), t \ge 0\}$ denote a counting process representing the cumulative number of faults detected by time t, let m(t) be the mean value function (MVF), i.e., the expected number of faults detected in the time interval (0, t], and let λ(t) denote the failure intensity at testing time t. That is, they satisfy

$$m(t) = E[N(t)] \quad \text{and} \quad \lambda(t) = \frac{dm(t)}{dt}.$$

Thus, an SRGM based on an NHPP with mean value function m(t) can be formulated as (Yamada et al., 1993)

$$P\{N(t) = n\} = \frac{[m(t)]^n}{n!} \exp(-m(t)), \quad n = 0, 1, 2, \ldots$$

From our studies (Lo et al., 2001,
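The definitions above translate directly into code. The sketch below evaluates the NHPP probability P{N(t) = n} (in log space, to avoid overflow for large n) and, as an illustrative choice of m(t), uses the Goel-Okumoto (1979) MVF m(t) = a(1 − e^{−bt}), whose failure intensity is λ(t) = ab·e^{−bt}. It also computes the expected number of remaining faults a − m(t) and the standard NHPP conditional reliability R(x | t) = exp{−[m(t + x) − m(t)]}, the probability of no failure in (t, t + x]. The parameter names a (expected total number of faults) and b (detection rate), the function names, and all numbers are assumptions for illustration.

```python
import math


def go_mvf(t: float, a: float, b: float) -> float:
    """Goel-Okumoto MVF m(t) = a * (1 - exp(-b t)): expected faults detected in (0, t]."""
    return a * (1.0 - math.exp(-b * t))


def go_intensity(t: float, a: float, b: float) -> float:
    """Failure intensity lambda(t) = dm(t)/dt = a * b * exp(-b t)."""
    return a * b * math.exp(-b * t)


def nhpp_pmf(n: int, m: float) -> float:
    """P{N(t) = n} = m^n / n! * exp(-m), evaluated in log space for numerical stability."""
    if m == 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(m) - math.lgamma(n + 1) - m)


def remaining_faults(t: float, a: float, b: float) -> float:
    """Expected number of faults still undetected at time t: a - m(t)."""
    return a - go_mvf(t, a, b)


def conditional_reliability(x: float, t: float, a: float, b: float) -> float:
    """Probability of no failure in (t, t + x]: exp(-[m(t + x) - m(t)])."""
    return math.exp(-(go_mvf(t + x, a, b) - go_mvf(t, a, b)))


if __name__ == "__main__":
    a, b, t = 120.0, 0.05, 20.0  # hypothetical parameters
    m = go_mvf(t, a, b)
    print(f"m(t) = {m:.2f}, lambda(t) = {go_intensity(t, a, b):.3f}")
    print(f"remaining faults = {remaining_faults(t, a, b):.2f}")
    print(f"P{{N(t) = 50}} = {nhpp_pmf(50, m):.4f}")
    print(f"R(1 | t) = {conditional_reliability(1.0, t, a, b):.4f}")
```

Any other MVF can be swapped in for `go_mvf`; the Poisson probability and the conditional reliability depend on the model only through m(t).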

An integrated fault detection and correction model

In the past, much research on software reliability models has concentrated on modeling and predicting failure occurrence and has not given equal priority to modeling the fault correction process (Schneidewind, 2003). However, most latent software faults may remain uncorrected for a long time even after they are detected, which increases their impact. Remaining software faults are often one of the main causes of poor software quality. Therefore, from the practical viewpoint, we may
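Because the proposed framework is cut off in this excerpt, the following is only a minimal sketch of one plausible way to couple the two processes, not necessarily the authors' formulation. It assumes constant rates: faults are detected at rate b in proportion to the undetected faults, dm_d/dt = b[a − m_d(t)], and detected faults are corrected at rate c in proportion to the detected-but-uncorrected faults, dm_c/dt = c[m_d(t) − m_c(t)], with m_d(0) = m_c(0) = 0. Solving gives m_d(t) = a(1 − e^{−bt}) and, for b ≠ c, m_c(t) = a[1 − (c·e^{−bt} − b·e^{−ct})/(c − b)]; when b = c the correction curve reduces to the delayed S-shaped form a[1 − (1 + bt)e^{−bt}]. The function names and parameter values are hypothetical, and the code cross-checks the closed form against a fourth-order Runge-Kutta integration of the same assumed system.

```python
import math
from typing import Tuple


def detection_mvf(t: float, a: float, b: float) -> float:
    """Assumed detection MVF m_d(t) = a * (1 - exp(-b t))."""
    return a * (1.0 - math.exp(-b * t))


def correction_mvf(t: float, a: float, b: float, c: float) -> float:
    """Closed-form correction MVF m_c(t) under the assumed constant-rate system."""
    if math.isclose(b, c):
        # Limit c -> b: the delayed S-shaped mean value function.
        return a * (1.0 - (1.0 + b * t) * math.exp(-b * t))
    return a * (1.0 - (c * math.exp(-b * t) - b * math.exp(-c * t)) / (c - b))


def correction_mvf_rk4(t: float, a: float, b: float, c: float, steps: int = 10_000) -> float:
    """Integrate dm_d/dt = b(a - m_d), dm_c/dt = c(m_d - m_c) from zero with classic RK4."""
    def deriv(md: float, mc: float) -> Tuple[float, float]:
        return b * (a - md), c * (md - mc)

    h = t / steps
    md = mc = 0.0
    for _ in range(steps):
        k1 = deriv(md, mc)
        k2 = deriv(md + 0.5 * h * k1[0], mc + 0.5 * h * k1[1])
        k3 = deriv(md + 0.5 * h * k2[0], mc + 0.5 * h * k2[1])
        k4 = deriv(md + h * k3[0], mc + h * k3[1])
        md += h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        mc += h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return mc


if __name__ == "__main__":
    a, b, c = 100.0, 0.2, 0.5  # hypothetical parameters
    for t in (2.0, 5.0, 10.0, 20.0):
        md, mc = detection_mvf(t, a, b), correction_mvf(t, a, b, c)
        print(f"t={t:5.1f}  detected={md:6.2f}  corrected={mc:6.2f}  backlog={md - mc:6.2f}")
```

The backlog m_d(t) − m_c(t) is the expected number of faults that have been detected but not yet fixed, which is exactly the quantity a detection-only SRGM implicitly sets to zero.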

Descriptions of real data sets

In this section, we evaluate the performance of the proposed model using two sets of software failure data. The first data set is the System T1 data from the Rome Air Development Center (RADC) (Musa, 1985, Musa et al., 1987), shown in Table 2. System T1 is used for a real-time command and control application, and its size is approximately 21,700 object instructions. Testing took 21 weeks and involved nine programmers. During the test phase, about 25.3 CPU hours were
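The fitting and comparison procedure is not included in this excerpt. As a hedged illustration only, the sketch below shows one common way to fit an NHPP model to cumulative fault counts such as weekly data: least-squares estimation of the Goel-Okumoto parameters, with the mean squared error (MSE) as the goodness-of-fit score. For a fixed b, the optimal a has the closed form a = Σ yᵢgᵢ / Σ gᵢ², where gᵢ = 1 − e^{−b·tᵢ}, so only b needs to be searched. The grid search, the function names, and the choice of MSE are assumptions rather than the paper's procedure, and no real data are included.

```python
import math
from typing import Optional, Sequence, Tuple


def go_mvf(t: float, a: float, b: float) -> float:
    """Goel-Okumoto MVF m(t) = a * (1 - exp(-b t))."""
    return a * (1.0 - math.exp(-b * t))


def mse(times: Sequence[float], observed: Sequence[float], a: float, b: float) -> float:
    """Mean squared error between observed cumulative faults and the fitted MVF."""
    return sum((y - go_mvf(t, a, b)) ** 2 for t, y in zip(times, observed)) / len(times)


def fit_go_least_squares(
    times: Sequence[float], observed: Sequence[float], b_grid: Sequence[float]
) -> Optional[Tuple[float, float, float]]:
    """Return (a, b, mse) minimizing MSE over b_grid, solving for a in closed form at each b."""
    best: Optional[Tuple[float, float, float]] = None
    for b in b_grid:
        g = [1.0 - math.exp(-b * t) for t in times]
        denom = sum(gi * gi for gi in g)
        if denom == 0.0:
            continue  # b = 0 (or all t = 0) carries no information about a
        a = sum(y * gi for y, gi in zip(observed, g)) / denom
        err = mse(times, observed, a, b)
        if best is None or err < best[2]:
            best = (a, b, err)
    return best
```

For a weekly data set, `times` would hold the week indices and `observed` the cumulative number of detected faults; in practice a finer grid or a numerical optimizer would replace the coarse grid search.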

Conclusions

Over the past 30 years, many SRGMs have been proposed, and important reliability metrics can be easily determined through them. However, our studies show that most of them assume detected faults are corrected immediately, an assumption that may not be realistic in practice. In this paper, we first propose a general framework for modeling the fault detection and correction processes. We have shown that some existing SRGMs can be easily derived from the concept of the integration of

Acknowledgements

This research was supported by the National Science Council, Taiwan, ROC, under Grant NSC 93-2213-E-267-001, and also substantially supported by a grant from the Ministry of Economic Affairs (MOEA) of Taiwan (Project No. 94-EC-17-A-01-S1-038).

Jung-Hua Lo received the BS (1993) in Mathematics and the MS (1995) and the Ph.D. (2003) in Electrical Engineering from National Taiwan University. Since 1998, he has been with LanYang Institute of Technology, where he is currently an Assistant Professor and the Chairman of the Department of Information Management. His research interests are software engineering, software reliability and testing, etc.

References (23)

  • Huang, C.Y., 2005. Performance analysis of software reliability growth models with testing-effort and change-point. J. Syst. Software.
  • Kapur, P.K., et al., 1995. Software reliability growth model with error dependency. Microelectron. Reliab.
  • Goel, A.L., et al., 1979. Time-dependent error-detection rate model for software reliability and other performance measures. IEEE Trans. Reliab.
  • Gokhale, S.S., Lyu, M.R., Trivedi, K.S., 1998. Software reliability analysis incorporating fault detection and...
  • Lo, J.H., Kuo, S.Y., Huang, C.Y., 2001. Reliability modeling incorporating error processes for internet-distributed...
  • Lo, J.H., Kuo, S.Y., Lyu, M.R., Huang, C.Y., 2002. Optimal resource allocation and reliability analysis for...
  • Lo, J.H., Huang, C.Y., Kuo, S.Y., Lyu, M.R., 2003. Sensitivity analysis of software reliability for component-based...
  • Lyu, M.R., 1996. Handbook of Software Reliability Engineering.
  • Lyu, M.R., et al., 1992. Applying software reliability models more effectively. IEEE Trans. Softw.
  • Musa, J.D., 1985. Software reliability data, report and database available from data and analysis center for software....
  • Musa, J.D., 1998. Software Reliability Engineering: More Reliable Software, Faster Development and Testing.

Chin-Yu Huang is currently an Assistant Professor in the Department of Computer Science at National Tsing Hua University, Hsinchu, Taiwan. He received the MS (1994) and the Ph.D. (2000) in Electrical Engineering from National Taiwan University, Taipei. He was with the Bank of Taiwan from 1994 to 1999 and was a senior software engineer at Taiwan Semiconductor Manufacturing Company from 1999 to 2000. Before joining NTHU in 2003, he was a division chief of the Central Bank of China, Taipei. His research interests are software reliability engineering, software testing, software metrics, software testability, fault tree analysis, and system safety assessment. He is a member of IEEE.
