Robust estimations of fault content with capture–recapture and detection profile estimators

https://doi.org/10.1016/S0164-1212(99)00140-5

Abstract

Inspections are widely used in the software engineering community as efficient contributors to reduced fault content and improved product understanding. In order to measure and control the effect and use of inspections, the fault content after an inspection must be estimated. The capture–recapture method, with its origin in the biological sciences, is a promising approach for estimating the remaining fault content in software artefacts. However, a number of empirical studies show that the estimates are neither accurate nor robust. In order to find robust estimates, i.e., estimates with small bias and variation, the adherence to the prerequisites of the different estimation models is investigated. The basic hypothesis is that a model should provide better estimates the closer the actual sample distribution is to the model's theoretical distribution. Firstly, a distance measure is evaluated; secondly, a χ2-based procedure is applied; and thirdly, smoothing algorithms are tried out, e.g., mean and median values of the estimates from a number of estimation models. Based on two different inspection experiments, we conclude that it is not possible to show a correlation between adherence to the models' theoretical distributions and the prediction capabilities of the models. This indicates that there are other factors that affect the estimation capabilities more than the prerequisites. Neither does the investigation point out any specific model as superior. On the contrary, the Mh–JK model, which has been shown to be the best alternative in a prior study, is inferior in this study. The most robust estimations are achieved by the smoothing algorithms.

Introduction

In the software engineering community, inspections are widely accepted as efficient contributors to improved software quality and reduced costs. However, there is a need to control the inspection process in order to know whether a software artefact is of sufficient quality or whether it needs improvement. This process control requires data on how many residual faults there are in a software artefact. The data generally available after an inspection are the number of faults detected and removed. However, once removed, these faults cause no further problems. It is more important to know how many faults remain in the software artefact, but this information is generally not available.

The capture–recapture method is a simple and useful approach to estimate the number of faults remaining after an inspection. The method is derived from biological applications, where it is used to estimate wildlife populations. Individual animals are counted by different observers or at different trapping occasions. If the overlap among the animals found by the observers is large, most of the population is assumed to have been found and the remainder is estimated to be small. On the other hand, if the overlap among the animals found by the observers is small, there are probably many remaining animals in the population. This method can be applied to software inspections. The inspectors correspond to the observers, and the faults in the inspected document correspond to the animals to be counted. If many inspectors find the same faults, they have probably covered most of the faults, while if they find disjoint sets of faults, there are probably many faults yet to be discovered.
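To make the overlap intuition concrete, the sketch below computes a classic two-observer (Lincoln–Petersen style) estimate of the total and remaining fault content from two inspectors' fault sets. The function name and the example data are illustrative and not taken from the paper; the study itself evaluates a larger family of capture–recapture estimators.

```python
def lincoln_petersen(found_by_a, found_by_b):
    """Two-inspector capture-recapture estimate of total fault content.

    found_by_a, found_by_b: sets of fault identifiers reported by
    inspector A and inspector B, respectively.
    """
    n_a = len(found_by_a)                    # faults found by A
    n_b = len(found_by_b)                    # faults found by B
    overlap = len(found_by_a & found_by_b)   # faults found by both
    if overlap == 0:
        raise ValueError("no overlap: estimate is unbounded")
    total = n_a * n_b / overlap              # estimated total fault content
    detected = len(found_by_a | found_by_b)  # distinct faults already found
    return total, total - detected           # (total, estimated remaining)

# Illustrative data: a large overlap suggests few remaining faults.
a = {"F1", "F2", "F3", "F4", "F5"}
b = {"F2", "F3", "F4", "F5", "F6"}
total, remaining = lincoln_petersen(a, b)
print(f"estimated total: {total:.2f}, estimated remaining: {remaining:.2f}")
```

With the example data, four of the six distinct faults are found by both inspectors, so the estimated total (6.25) lies close to the number already detected and only a fraction of a fault is estimated to remain.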

The application of capture–recapture to software inspections was proposed by Eick et al. (1992); further studies and improvements are presented in Eick et al. (1993), Vander Wiel and Votta (1993), Wohlin et al. (1995), Briand et al. (1997) and Runeson and Wohlin (1998). A related method, the detection profile method, is proposed by Wohlin and Runeson (1998) and evaluated by Briand et al. (1998). Although the principles behind the methods are intuitively sound, the published results show a large variation in the estimation capabilities. Large over- and underestimates are sometimes produced, and the estimators are sensitive to the underlying data. This is not satisfactory, since we need robust methods that deliver trustworthy estimates.

The estimation models have different underlying assumptions on, e.g., the statistical distribution of the fault detection probability. In this paper an empirical study of fault estimation methods is presented, which aims to investigate whether the estimates depend on how well the data fit the statistical prerequisites of the estimation models, or whether the varying estimates have random causes.

Two alternative approaches to evaluating the dependence on the underlying assumptions are presented:

  1. a distance measure, which is based on a graphical representation of the data and the prerequisites;

  2. a χ2 selection algorithm, which tests statistically whether the prerequisites are fulfilled or not.

The reason for investigating both methods is that the former is a graphical approach to be used as descriptive statistics, while the latter is a statistical approach using hypothesis testing. The study indicates no clear pattern in the dependence on the underlying data distributions, which implies that there are other factors that affect the outcome more than the prerequisites. As an alternative to the prerequisites approach, various smoothing algorithms are evaluated, e.g., the mean or median of 2, 3 or 4 different estimates, or a simple random selection from the 4 estimates. It is concluded that no single model is superior to the others. The most robust results are achieved by the smoothing algorithms, which combine the estimates of different models and the detection profile method.
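Assuming the four per-model point estimates are already available, the smoothing step itself is simple; the following sketch (with hypothetical estimate values and a caller-supplied random seed) combines them by mean, median and random selection.

```python
import random
import statistics

def smoothed_estimates(estimates, seed=None):
    """Combine several per-model estimates into smoothed alternatives.

    estimates: list of fault-content estimates, e.g., from the four
               capture-recapture/DPM estimators.
    """
    rng = random.Random(seed)
    return {
        "mean": statistics.mean(estimates),
        "median": statistics.median(estimates),
        "random_choice": rng.choice(estimates),  # simple random selection
    }

# Hypothetical estimates from four models.
print(smoothed_estimates([12.4, 18.0, 9.7, 14.2], seed=1))
```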

In this paper, three different terms are used: estimators, models and algorithms. An estimator is a formula used to predict the number of faults remaining in a document. A model is the umbrella term for a number of estimators with the same prerequisites. Algorithms are methods for combining two or more estimators (smoothing algorithms) or for selecting one out of a set of estimators (selection algorithms).

The paper is structured as follows. In Section 2, the prerequisites for the estimation models are introduced. In Section 3, distance measures for the models and the analysis with these measures are presented. In Section 4, the χ2 method is elaborated, and in Section 5, the smoothing algorithms are presented. The analysis results are given in Section 6 (Evaluation of the algorithms), and Section 7 contains a discussion of the results. Finally, the conclusions from the study are presented in Section 8.

Section snippets

Prerequisites for estimation models

The prerequisites state what criteria are used when an estimation model is developed. Vander Wiel and Votta (1993) show, using simulated data, that the Maximum-likelihood (Mt–ML) and the Jack-knife (Mh–JK) estimators are dependent on their theoretical distribution. This means that the prerequisites for the estimators should be fulfilled in order to achieve accurate estimations. Hence, a method could be defined to choose the most appropriate estimator based on the fulfilment of the
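The estimator formulas are not reproduced in this snippet. As a rough illustration of the Mh family, a first-order jackknife estimate can be computed from per-inspector fault sets as below; the estimator actually evaluated in the study may use a different (e.g., higher-order) jackknife.

```python
from collections import Counter

def jackknife_first_order(detections):
    """First-order jackknife estimate of total fault content (model Mh).

    detections: list of sets, one per inspector, each holding the
                identifiers of the faults that inspector reported.
    """
    t = len(detections)                              # number of inspectors
    counts = Counter(f for s in detections for f in s)
    distinct = len(counts)                           # distinct faults found
    f1 = sum(1 for c in counts.values() if c == 1)   # faults found exactly once
    return distinct + (t - 1) / t * f1

# Illustrative data for three inspectors.
data = [{"F1", "F2", "F3"}, {"F2", "F4"}, {"F1", "F2", "F5"}]
print(jackknife_first_order(data))  # 5 distinct + (2/3)*3 singletons = 7.0
```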

χ2 tests

χ2 tests are used to judge whether a sample comes from a certain distribution (Siegel and Castellan, 1988). The capture–recapture estimation methods as well as the detection profile method (DPM) assume that
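The exact test procedure used in the paper is not shown in this snippet. As a minimal illustration of the idea, and assuming SciPy is available, a χ2 goodness-of-fit test can check whether per-inspector detection counts are consistent with equal detection ability (a prerequisite of the Mt-type models); the counts and significance level below are invented for the example.

```python
from scipy.stats import chisquare

# Number of faults reported by each of four inspectors (invented data).
faults_per_inspector = [14, 9, 12, 5]

# H0: all inspectors have equal detection ability, so the counts should
# be roughly uniform; chisquare() uses uniform expected frequencies by
# default when f_exp is not given.
statistic, p_value = chisquare(faults_per_inspector)

alpha = 0.05  # illustrative significance level
if p_value < alpha:
    print(f"p = {p_value:.3f}: prerequisite of equal ability looks doubtful")
else:
    print(f"p = {p_value:.3f}: no evidence against the prerequisite")
```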

Smoothing algorithms

To evaluate whether a simpler algorithm can be used instead of the χ2 selection algorithm, smoothing algorithms are investigated. The smoothing algorithms presented here are all based on mean and median value calculations. If this kind of algorithm produces better estimates, a question should be raised: is the fulfilment of the estimators' prerequisites really such an important contributor to accurate estimations?

The approaches used in this paper are:

  • Calculating the mean value of the four

Discussion

One goal of the study presented in this paper was to find a robust estimator by using the fulfilment of the prerequisites as input to a selection procedure. This turned out to be an unsuccessful approach. It is unexpected, since research has shown that the fulfilment of the prerequisites is important for the estimators to give accurate estimation results.

An investigation of the prerequisites’ impact on the estimation result is presented in Vander Wiel and Votta (1993). Vander Wiel and Votta

Summary and conclusions

In this paper we have discussed and analysed three different methods to be used for choosing the most appropriate estimation model given one specific inspection sample. Two of the methods assume that when the prerequisites of an estimator are fulfilled the estimator is appropriate to perform the estimation. The first approach, the distance measure, is derived with the prerequisites in mind and reflects one or two prerequisites for each estimator. The second approach is a combination of χ2

Acknowledgements

We would like to thank Claes Wohlin and Håkan Petersson for their valuable comments on this paper. This work was partly funded by The Swedish National Board for Industrial and Technical Development (NUTEK), grant 1K1P-97-09673.

References (12)

  • Briand, L., Emam, K.E., Freimut, B., Laitenberger, O., 1997. Quantitative evaluation of capture–recapture models to...
  • Briand, L., Emam, K.E., Freimut, B., 1998. Comparison and integration of capture–recapture models and the detection...
  • Carothers, A.D., 1971. An examination and extension of Leslie's test of equal catchability. Biometrics.
  • Eick, S.G., Loader, C.R., Long, M.D., Votta, L.G., 1992. Estimating software fault content before...
  • Eick, S.G., Loader, C.R., Vander Wiel, S.A., Votta, L.G., 1993. How many errors remain in a software design document...
  • Humphrey, W.S., 1995. A Discipline for Software Engineering.

There are more references available in the full text version of this article.
