Assessing Variation in Development Effort Consistency Using a Data Source with Missing Data

Software Quality Journal

Abstract

In this study the authors analyse Release 8.0 of the International Software Benchmarking Standards Group (ISBSG) data repository, which comprises project data from several different companies. The repository contains missing data, which must be handled appropriately; otherwise the inferences drawn may be biased and misleading. The authors re-examine a statistical model, conditioned on a sample of 339 observations from the repository, that explained about 62% of the variability in actual software development effort (Summary Work Effort). This model has the covariates Adjusted Function Points and Maximum Team Size and depends on Language Type (with categories 2nd, 3rd and 4th Generation Languages and Application Program Generators) and Development Type (enhancement, new development and re-development). The authors now use Bayesian inference and the Bayesian statistical simulation program BUGS to impute the missing data. This avoids deleting observations with missing Maximum Team Size and increases the sample size to 616. Provided that imputation does not introduce distributional biases, the accuracy of inferences made from models that fit the data will increase. As a consequence of imputation, models that fit the data and explain about 59% of the variability in actual effort are identified. These models enable new inferences to be made about Language Type and Development Type. The sensitivity of the inferences to alternative distributions for imputing missing data is also considered. Furthermore, the authors assess the impact of these distributions on the explained variability of actual effort and show how valid effort estimates can be derived to improve estimate consistency.
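The central idea in the abstract can be illustrated outside BUGS: impute a missing covariate by drawing it from its conditional posterior instead of deleting the observation. The following is a minimal, stdlib-only Python Gibbs sampler built on heavily simplified assumptions that are ours, not the authors'. It uses a single covariate `x` (think of a transformed Maximum Team Size) and a single response `y` (for example, log effort; the transformation is also an assumption). The model is the normal linear model `y = a + b*x + e` with a normal distribution for `x`, flat priors on `a`, `b` and `m`, and `1/variance` priors on both variances. The names `gibbs_impute` and `sample_mvn2` are hypothetical. The paper's actual model has additional covariates and categorical factors and was fitted in BUGS.

```python
import math
import random
import statistics


def sample_mvn2(mean, cov, rng):
    """Draw from a bivariate normal via a 2x2 Cholesky factorisation."""
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2


def gibbs_impute(y, x, iters=3000, burn=1000, seed=1):
    """Gibbs sampler for y = a + b*x + e where some x values are None.

    Illustrative assumptions only: e ~ N(0, s2), x ~ N(m, t2),
    flat priors on a, b, m and 1/variance priors on s2 and t2.
    Returns posterior means of a and b, and of each missing x (by index).
    """
    rng = random.Random(seed)
    n = len(y)
    miss = [i for i, xi in enumerate(x) if xi is None]
    obs = [xi for xi in x if xi is not None]
    # Start every missing value at the observed mean.
    xs = [xi if xi is not None else statistics.fmean(obs) for xi in x]
    a, b, s2, m, t2 = 0.0, 0.0, 1.0, statistics.fmean(obs), 1.0
    sum_a = sum_b = 0.0
    sum_x = {i: 0.0 for i in miss}
    kept = 0
    for it in range(iters):
        # 1. (a, b) | x, y, s2: bivariate normal centred on the OLS fit.
        sx, sxx = sum(xs), sum(v * v for v in xs)
        sy, sxy = sum(y), sum(u * v for u, v in zip(xs, y))
        det = n * sxx - sx * sx
        b_hat = (n * sxy - sx * sy) / det
        a_hat = (sy - b_hat * sx) / n
        cov = [[s2 * sxx / det, -s2 * sx / det],
               [-s2 * sx / det, s2 * n / det]]
        a, b = sample_mvn2((a_hat, b_hat), cov, rng)
        # 2. s2 | a, b, x, y: inverse gamma (shape n/2, rate SSR/2).
        ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(xs, y))
        s2 = 1.0 / rng.gammavariate(n / 2, 2.0 / ssr)
        # 3. m, t2 | x: parameters of the distribution used to impute x.
        m = rng.gauss(statistics.fmean(xs), math.sqrt(t2 / n))
        ssx = sum((xi - m) ** 2 for xi in xs)
        t2 = 1.0 / rng.gammavariate(n / 2, 2.0 / ssx)
        # 4. each missing x_i | y_i, a, b, s2, m, t2: conjugate normal draw.
        for i in miss:
            prec = b * b / s2 + 1.0 / t2
            mu = (b * (y[i] - a) / s2 + m / t2) / prec
            xs[i] = rng.gauss(mu, math.sqrt(1.0 / prec))
        # Accumulate posterior means after burn-in.
        if it >= burn:
            kept += 1
            sum_a += a
            sum_b += b
            for i in miss:
                sum_x[i] += xs[i]
    return sum_a / kept, sum_b / kept, {i: v / kept for i, v in sum_x.items()}
```

In this sketch every row, including those whose `x` is `None`, contributes to the estimates of `a` and `b`. That is the mechanism by which imputation enlarges the usable sample rather than shrinking it to the complete cases. The normal distribution assumed for `x` in step 3 is analogous to the choice of imputing distribution whose influence the authors examine in their sensitivity analysis; substituting a different distribution would change the conditional draw in step 4.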



Author information


Correspondence to John Moses.


About this article

Cite this article

Moses, J., Farrow, M. Assessing Variation in Development Effort Consistency Using a Data Source with Missing Data. Software Qual J 13, 71–89 (2005). https://doi.org/10.1007/s11219-004-5261-z

