Abstract
In this study the authors analyse Release 8.0 of the International Software Benchmarking Standards Group data repository, which comprises project data from several different companies. The repository contains missing data, which must be handled appropriately; otherwise inferences drawn from it may be biased and misleading. The authors re-examine a statistical model that explained about 62% of the variability in actual software development effort (Summary Work Effort) and was fitted to a sample of 339 observations from the repository. That model included the covariates Adjusted Function Points and Maximum Team Size, and depended on Language Type (with categories 2nd, 3rd and 4th Generation Languages and Application Program Generators) and Development Type (enhancement, new development and re-development). The authors now use Bayesian inference and the Bayesian statistical simulation program BUGS to impute missing data. This avoids deleting observations with missing Maximum Team Size and increases the sample size to 616. Provided that imputation does not introduce distributional biases, the accuracy of inferences made from models that fit the data will increase. With imputation, the authors identify models that fit the data and explain about 59% of the variability in actual effort. These models enable new inferences to be made about Language Type and Development Type. The sensitivity of the inferences to alternative distributions for imputing missing data is also considered. Furthermore, the authors consider the impact of these distributions on the explained variability of actual effort and show how valid effort estimates can be derived to improve estimate consistency.
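The idea of imputing a missing covariate rather than deleting incomplete observations can be illustrated with a minimal sketch. It is not the authors' BUGS model: it swaps in a simple stochastic regression imputation written with NumPy, runs on synthetic data, and assumes a log-linear effort model with only two covariates. All function names, variable names, distributions and parameter values below are hypothetical and chosen for illustration only; Language Type and Development Type are omitted.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares; returns the coefficients and R^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, 1.0 - resid.var() / y.var()

def impute_and_fit(log_size, log_team, log_effort, n_draws=20, seed=0):
    """Impute missing log team size several times and average the fits.

    Assumed imputation model (not from the paper):
    log_team ~ Normal(a + b * log_size, s^2), estimated on complete cases.
    """
    rng = np.random.default_rng(seed)
    missing = np.isnan(log_team)
    X_obs = np.column_stack([np.ones((~missing).sum()), log_size[~missing]])
    coef, _ = fit_ols(X_obs, log_team[~missing])
    s = np.std(log_team[~missing] - X_obs @ coef, ddof=2)
    X_mis = np.column_stack([np.ones(missing.sum()), log_size[missing]])

    betas, r2s = [], []
    for _ in range(n_draws):
        team = log_team.copy()
        # Draw each missing value from the assumed predictive distribution.
        team[missing] = X_mis @ coef + rng.normal(0.0, s, missing.sum())
        X = np.column_stack([np.ones(len(team)), log_size, team])
        beta, r2 = fit_ols(X, log_effort)
        betas.append(beta)
        r2s.append(r2)
    # Average the coefficient estimates and R^2 over the imputed data sets.
    return np.mean(betas, axis=0), float(np.mean(r2s))

# Synthetic data only: none of these values come from the ISBSG repository.
data_rng = np.random.default_rng(42)
n = 616
log_size = data_rng.normal(5.5, 1.0, n)
log_team = 0.3 * log_size + data_rng.normal(0.0, 0.5, n)
log_effort = 2.0 + 0.7 * log_size + 0.6 * log_team + data_rng.normal(0.0, 0.8, n)

observed_team = log_team.copy()
observed_team[data_rng.random(n) < 0.45] = np.nan  # values missing at random

beta, r2 = impute_and_fit(log_size, observed_team, log_effort)
print("averaged coefficients:", beta)
print("averaged R^2:", r2)
```

Changing the assumed imputation distribution inside `impute_and_fit` and comparing the resulting coefficients and R² gives a rough analogue of a sensitivity check. A fully Bayesian treatment would instead sample the missing values jointly with the model parameters, which this sketch does not attempt.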
Cite this article
Moses, J., Farrow, M. Assessing Variation in Development Effort Consistency Using a Data Source with Missing Data. Software Qual J 13, 71–89 (2005). https://doi.org/10.1007/s11219-004-5261-z
DOI: https://doi.org/10.1007/s11219-004-5261-z