Towards a conceptual framework of software run reliability modeling☆
Introduction
MRTF: mean run to failure or median run to failure.
Stochastic modeling methodology I: by which inter-failure times are treated as random variables.
Stochastic modeling methodology II: by which the number of software failures occurring in a time interval is treated as a stochastic process.
Type 1 data: successful runs between failures.
Type 2 data: number of failures among number of runs.
Here, we note that software reliability modeling is concerned with quantifying software reliability behavior: it uses historical software reliability (failure) data to assess the current reliability status and/or forecast future software failures. Type 1 data count how many runs are conducted between two successive software failures and thus are often used to predict how many runs are necessary to expose the next software failure. Type 2 data count how many software failures are observed in a given number of runs and thus are often used to predict the cumulative number of software failures observed in the next given number of runs. Obviously, type 1 data are more accurate than type 2 data, and the former can be converted into the latter. However, type 2 data are easier to collect in practice.
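As a small illustration of the conversion mentioned above (the function name, block size and data are ours, not the paper's), type 1 data can be converted into type 2 data by locating the run index of each failure and then counting failures per block of runs:

```python
# Sketch: converting type 1 data (successful runs between failures)
# into type 2 data (cumulative failures after each block of runs).
# Names, block size and data are illustrative.

def type1_to_type2(runs_between_failures, block_size):
    """Given the number of successful runs before each failure,
    return cumulative failure counts after each block of runs."""
    # Run index at which each failure occurs: each failure consumes
    # its preceding successful runs plus the failing run itself.
    failure_runs = []
    total = 0
    for k in runs_between_failures:
        total += k + 1          # k successes, then 1 failing run
        failure_runs.append(total)

    counts = []
    n = block_size
    while n <= total:
        counts.append(sum(1 for r in failure_runs if r <= n))
        n += block_size
    return counts

# 3 successes then a failure (run 4), 1 success then a failure (run 6), ...
print(type1_to_type2([3, 1, 5], 4))  # → [1, 2, 3]
```

The reverse conversion is impossible in general, which is exactly why type 1 data are the more informative of the two.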
Software reliability modeling has become one of the most important aspects of software reliability engineering since the Jelinski–Moranda model appeared [7], [28], [43], [60]. Various methodologies have been adopted to model software reliability behavior: stochastic modeling methodology I [28], [29], [50], [60]; stochastic modeling methodology II [26], [29], [60], [62]; Bayesian methodology [36]; fuzzy methodology [13], [14]; neural network methodology [30]; non-parametric methodology [54]; and others [18], [33]. One may even claim that software reliability behavior can be well predicted [4], although there are some limits [35].
However, we note that most of the existing work on software reliability modeling assumes a continuous-time base, that is, that software reliability behavior can be measured in terms of calendar time, clock time or CPU execution time. Although this assumption is appropriate for a wide scope of systems, there are many systems for which it does not hold. For example, the reliability behavior of a bank transaction processing software system should be measured in terms of how many transactions are successful, rather than how long the software system operates without failure. Similarly, the reliability behavior of a rocket control software system should be measured in terms of how many rockets are successfully launched, rather than how long a rocket flies without failure. Obviously, for these systems, the time base of reliability measurement is essentially discrete rather than continuous. We must therefore examine whether software reliability modeling techniques developed for a continuous-time base are directly applicable to problems on a discrete-time base.
In order to model software reliability behavior in the context of discrete-time base, throughout this paper we have the following common assumptions:
- 1.
Any software execution process can be divided into a series of runs.
- 2.
When a run is executed, the software either passes or fails.
- 3.
Runs are executed independently.
Run reliability means the probability or possibility that the software successfully performs a run. There has been some work on the topic of run reliability [5], [23], [25], [29], [45], [46], [52], [55], [56], [57], [58], [63]. Some assumed that no software defects were removed during testing, so that software run reliability was constant, corresponding to the software validation phase [5], [23], [45], [46], [52], [57], [58]; some dealt with the case of software defect removals [25], [29], [47], [63]; while others compared or combined continuous-time and discrete-time software reliability modeling [55], [56]. Surprisingly, however, the time base of run reliability has not been properly recognized: it has been interpreted as ‘data domain’ [24], ‘input domain’ [47], or even ‘time-independent’ [58]. In fact, in (hardware) reliability engineering the notion of time is interpreted broadly: time may be calendar time, mileage, or even a positive integer. The time base of run reliability is also a type of time.
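In the constant-reliability case described above (no defect removal), each run is a Bernoulli trial, so the run lifetime is geometric. A minimal sketch (function names and data are ours) of the resulting point estimates:

```python
# Sketch (our illustration): when no defects are removed, each run is a
# Bernoulli trial with constant run reliability R.  The number of runs
# up to and including the first failure is then geometric, so the
# mean run to failure (MRTF) is 1/(1 - R).

def estimate_run_reliability(successes, runs):
    """Point estimate of constant run reliability."""
    return successes / runs

def mrtf(reliability):
    """Mean runs to failure under a geometric run-lifetime model."""
    return 1.0 / (1.0 - reliability)

R = estimate_run_reliability(successes=980, runs=1000)
print(R)               # 0.98
print(round(mrtf(R)))  # 50: on average, 50 runs until the next failure
```

This makes the discrete time base explicit: MRTF is measured in runs, not in calendar or CPU time.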
From previous work, we also note that modeling techniques have not been well established for run reliability: basic definitions and notions behind run reliability have not been well understood, and basic modeling methodologies have not been well formulated. For example, the notion of run lifetime has not been introduced, the relationship between the hazard rate function and the failure intensity function has not been well recognized, and it is not well understood which, and how many, methodologies can be used to model discrete-time software reliability behavior.
Modeling experience with continuous-time software reliability tells us that no single model is superior to all others in all cases [4], [38], and that it is wise to disregard intuitive interpretations of software reliability model parameters [7]. In this paper, we aim to develop a conceptual framework for run reliability modeling rather than a particular run reliability model. By a conceptual framework we mean basic notions and basic methodologies. We try to show that run reliability behavior can be modeled in much the same way as continuous-time reliability behavior, provided we bear in mind that the underlying time base is discrete. Empirical validation of the proposed modeling methodologies is left to future research.
Section 2 discusses the basic definitions and notions of run reliability. Section 3 deals with type 1 data. Section 4 deals with type 2 data. In Section 5, we show how to use Bayesian methodology to deal with run reliability modeling. In Section 6, we show how to use fuzzy methodology to deal with run reliability modeling. In Section 7, we discuss the relationship among the operational profile, discrete-time software reliability behavior and continuous-time software reliability behavior. Concluding remarks are contained in Section 8. Other possible methodologies, based on neural networks, non-parametric analysis or Dempster–Shafer evidence theory [40], [53], are not covered here.
Of course, in order to apply the methodologies proposed in this paper, software run reliability data must be available. Compared to the amount of continuous-time software reliability data reported in the literature, the amount of discrete-time software reliability data reported in the literature is rather limited, although some authors have published their own data [29], [63].
Basic definitions and notions
In this section, we present basic definitions and notions of run reliability in the context of probability. Some of them have been discussed elsewhere [29], [49]. Their counterparts in the context of possibility are left to Section 6.
Dealing with type 1 data
We can use Fig. 1 to represent the software run execution process. The process begins with the first run, run (1,1). By run (1,k) we mean the kth run since the run process began. By run (j,k) we mean the kth run after the (j−1)th software failure. Usually, a defect is removed when a failure occurs, except in the software validation phase, so X1,…,Xn may not be i.i.d. Let
Then as shown in Section 2, we
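When defects are removed at each failure, the inter-failure run counts are no longer identically distributed. One hedged sketch of how such type 1 data might be fitted (a discrete analogue of the Jelinski–Moranda model; the parameterization, data and grid search are ours, not the paper's):

```python
# Sketch: log-likelihood of type 1 data under a discrete analogue of
# the Jelinski-Moranda model.  After j-1 defect removals the per-run
# failure probability is (N - j + 1) * phi, so X_j (runs up to and
# including the j-th failure) is geometric.  N and phi are illustrative.
import math

def log_likelihood(x, N, phi):
    """x[j-1] = number of runs up to and including the j-th failure."""
    ll = 0.0
    for j, xj in enumerate(x, start=1):
        p = (N - j + 1) * phi            # per-run failure probability
        if not 0.0 < p < 1.0:
            return float("-inf")
        # P(X_j = xj) = (1 - p)^(xj - 1) * p
        ll += (xj - 1) * math.log(1.0 - p) + math.log(p)
    return ll

x = [40, 55, 70, 120]  # illustrative type 1 data
# Crude grid search for the maximum-likelihood (N, phi):
best = max(((N, phi) for N in range(len(x), 30)
            for phi in (i / 1000 for i in range(1, 100))),
           key=lambda t: log_likelihood(x, *t))
print(best)
```

Note that time enters only through run counts: the geometric distribution plays the role the exponential distribution plays on a continuous-time base.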
Dealing with type 2 data
We have the following assumptions [63]:
- 1.
N(0)=0.
- 2.
The process has independent increments, i.e., for any collection of the numbers of test runs 0<n1<n2<⋯<nk, the k random variables N(n1),N(n2)−N(n1),…, N(nk)−N(nk−1) are statistically independent.
- 3.
For any of the numbers of test runs ni and nj (0<ni<nj),
This implies that the NHPP model in the context of continuous-time base [2], [24], [51] can be directly used to deal with type 2 data of run
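As a hedged illustration of how a continuous-time NHPP model carries over to the discrete-time base by confining the time instants to positive integers (the Goel–Okumoto mean value function is a standard choice; the parameter values are ours):

```python
# Sketch: the Goel-Okumoto mean value function m(n) = a(1 - exp(-b n)),
# evaluated at positive integers n (numbers of runs).  Under the NHPP
# assumptions above, N(nj) - N(ni) is Poisson with mean m(nj) - m(ni).
# Parameter values are illustrative.
import math

def mean_value(n, a, b):
    """Expected cumulative failures after n runs."""
    return a * (1.0 - math.exp(-b * n))

def expected_new_failures(ni, nj, a, b):
    """Mean of the Poisson increment N(nj) - N(ni), ni < nj."""
    return mean_value(nj, a, b) - mean_value(ni, a, b)

a, b = 30.0, 0.002           # illustrative: ~30 total defects expected
print(round(mean_value(500, a, b), 2))                 # → 18.96
print(round(expected_new_failures(500, 1000, a, b), 2))  # → 6.98
```

Nothing in the model itself changes; only the interpretation of the time axis does.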
Bayesian methodology
Bayesian methodology has a wide scope of applications in system reliability engineering [10], [19], [34], [36]. In comparison with non-Bayesian methodology, where the parameters of concern are treated as constants, Bayesian methodology treats the parameters of concern as random variables whose probability distribution is called the prior probability distribution. The prior probability distribution can capture subjective judgement or is just a parametric probability distribution (e.g., a beta
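For constant run reliability, the beta prior is conjugate to the binomial likelihood of pass/fail run data, which makes the updating step elementary. A minimal sketch (our illustration, with a uniform prior and invented data):

```python
# Sketch (our illustration): conjugate Bayesian updating of constant
# run reliability R with a Beta(alpha, beta) prior.  Observing s
# successes in n runs (a binomial likelihood) gives the posterior
# Beta(alpha + s, beta + n - s).

def posterior(alpha, beta, successes, runs):
    """Beta posterior parameters after observing binomial run data."""
    return alpha + successes, beta + runs - successes

def posterior_mean(alpha, beta):
    """Posterior point estimate of run reliability."""
    return alpha / (alpha + beta)

# Beta(1, 1) is the uniform prior; 98 successes in 100 runs observed.
a, b = posterior(alpha=1.0, beta=1.0, successes=98, runs=100)
print(a, b)                  # 99.0 3.0
print(posterior_mean(a, b))  # ≈ 0.9706
```

Because the posterior is again a beta distribution, the update can be applied run by run as data arrive, which suits the discrete-time base naturally.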
Fuzzy methodology
Using fuzzy or possibilistic methodology to deal with software reliability modeling problems is not a new idea. Ramamoorthy and Bastani’s [47] notion of software correctness possibility is an example. Weiss and Weyuker’s [56] notion of ‘generalized reliability’ and Tsoukalas, Duran and Ntafos’ notion [57] of ‘cost weighted failure rate of software’ are other examples. The latter two notions are actually special cases of profust reliability [15]. On the other hand, Cai developed a fuzzy software
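As one hedged illustration of the possibilistic view (our own formulation, not the paper's): run reliability can be represented as a triangular fuzzy number and propagated through alpha-cuts; since MRTF = 1/(1 − R) is increasing in R, each alpha-cut interval maps endpoint to endpoint:

```python
# Sketch: run reliability as a triangular fuzzy number (l, m, u) and a
# fuzzy MRTF obtained via alpha-cuts.  Because 1/(1 - R) is increasing
# in R, each alpha-cut interval maps endpoint-to-endpoint.  The numbers
# are illustrative.

def alpha_cut(l, m, u, alpha):
    """Alpha-cut [lo, hi] of the triangular fuzzy number (l, m, u)."""
    return l + alpha * (m - l), u - alpha * (u - m)

def fuzzy_mrtf(l, m, u, alpha):
    """Alpha-cut of the fuzzy MRTF = 1/(1 - R)."""
    lo, hi = alpha_cut(l, m, u, alpha)
    return 1.0 / (1.0 - lo), 1.0 / (1.0 - hi)

# Fuzzy run reliability "about 0.95", support [0.90, 0.98]:
lo, hi = fuzzy_mrtf(0.90, 0.95, 0.98, alpha=1.0)
print(round(lo, 6), round(hi, 6))   # core:    20.0 20.0
lo, hi = fuzzy_mrtf(0.90, 0.95, 0.98, alpha=0.0)
print(round(lo, 6), round(hi, 6))   # support: 10.0 50.0
```

The output is a fuzzy MRTF whose core is 20 runs and whose support is [10, 50] runs, a possibilistic counterpart of the probabilistic MRTF.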
Operational profile, discrete-time base and continuous-time base
Now, let us turn our attention back to the probability context. Compared to research work on hardware operational profiles [16], [17], [27], corresponding research work on software operational profiles is relatively limited [5], [20], [21], [22], [29], [42], [46], and there are several disadvantages associated with these works. First, they often defined the software operational profile as a probability distribution across the disjoint classes of test cases [5], [29], [42], [46]. However, the probability
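Under the profile-as-distribution view just described, the overall run reliability of a randomly selected run is the profile-weighted sum of per-class reliabilities. A small sketch (the class names and numbers are ours, purely illustrative):

```python
# Sketch (illustrative classes and numbers): with an operational profile
# given as a probability distribution p_i over disjoint classes of test
# cases, and a per-class run reliability R_i, the overall run
# reliability of a randomly selected run is the weighted sum.

def overall_run_reliability(profile, class_reliability):
    """profile: {class: probability}; class_reliability: {class: R_i}."""
    assert abs(sum(profile.values()) - 1.0) < 1e-9, "profile must sum to 1"
    return sum(p * class_reliability[c] for c, p in profile.items())

profile = {"deposit": 0.5, "withdraw": 0.3, "transfer": 0.2}
reliab  = {"deposit": 0.999, "withdraw": 0.995, "transfer": 0.990}
print(round(overall_run_reliability(profile, reliab), 4))  # 0.996
```

This also shows why the operational profile matters: shifting probability mass toward a less reliable class lowers the overall run reliability even though no per-class reliability changes.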
Concluding remarks
Discrete time is a kind of time measure widely used in (hardware) reliability engineering. However, it has not been properly understood in software reliability modeling, and several confusions exist in the current literature. In the previous sections, we clarified some of these confusions and presented a unified framework of discrete-time software reliability modeling, which parallels or resembles that of continuous-time software reliability modeling. In this framework, basic notions are defined and two
Acknowledgements
The author is most grateful to Bev Littlewood for his constructive discussions and comments. The author would like to thank Andrea Bondavalli, Karama Kanoun, Pascale Thevenod-Fosse, Mladen A. Vouk for their helpful comments on earlier versions of the paper. The comments of one anonymous referee are particularly useful since they help the author to identify and correct two mistakes. The readability of the paper is improved with the help of Didier Dubois. The first draft version of the paper was
References (65)
- System failure engineering and fuzzy methodology: an introductory overview, Fuzzy Sets and Systems (1996)
- On estimating the number of defects remaining in software, Journal of Systems and Software (1998)
- M. Abramowitz, I.A. Stegun, Handbook of Mathematical Functions, National Bureau of Standards Applied Mathematics Series...
- et al., Statistical Inference for Stochastic Processes (1980)
- J.B. Bowles, C.E. Pelaez, Application of fuzzy logic to reliability engineering, in: Proceedings of the IEEE, vol. 83...
- S. Brocklehurst, B. Littlewood, New ways to get accurate reliability measures, IEEE Software (July 1992)...
- J.R. Brown, M. Lipow, Testing for software reliability, in: Proceedings of the International Conference on Reliable...
- Censored software reliability models, IEEE Transactions on Reliability (1997)
- K.Y. Cai, Elements of Software Reliability Engineering, Tsinghua University Press, Beijing, September 1995 (in...
- Introduction to Fuzzy Reliability (1996)
- Software Defect and Operational Profile Modeling
☆ Partially supported by the National Outstanding Youth Foundation of China and the Key Project of China.