
Evaluation of preliminary data analysis framework in software cost estimation based on ISBSG R9 Data

Published in: Software Quality Journal

Abstract

Previous research has argued that preliminary data analysis is necessary for software cost estimation. In this paper, a framework for such analysis is applied to a substantial corpus of historical project data (the ISBSG R9 data set), selected without explicit bias. The analysis yields sets of dominant variables, which are then used to construct project effort estimation models. The performance of predictors built on the raw variables and on the extracted variable sets is measured in terms of Mean Magnitude of Relative Error (MMRE), Median Magnitude of Relative Error (MdMRE), and prediction at levels 0.05, 0.10, and 0.25. The comparative evaluation suggests that more accurate prediction models can be constructed for the selected prediction techniques. Results obtained with the framework-processed predictor variables are statistically significant at the 95% confidence level for both parametric techniques and for one non-parametric technique. The results are also compared with the latest published results obtained by other research based on the same data set; this comparison indicates that the models constructed using framework-processed data are generally more accurate.
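The accuracy criteria named above (MMRE, MdMRE, and prediction at level l, usually written Pred(l)) follow standard definitions in the cost-estimation literature: the Magnitude of Relative Error for a project is |actual − predicted| / actual, and Pred(l) is the fraction of projects whose MRE does not exceed l. A minimal Python sketch of these criteria (illustrative only; function names are not from the paper):

```python
import statistics

def mre(actual, predicted):
    # Magnitude of Relative Error for a single project
    return abs(actual - predicted) / actual

def evaluate(actuals, predictions, levels=(0.05, 0.10, 0.25)):
    """Return MMRE, MdMRE and Pred(l) for paired actual/predicted efforts."""
    mres = [mre(a, p) for a, p in zip(actuals, predictions)]
    mmre = sum(mres) / len(mres)        # mean MRE
    mdmre = statistics.median(mres)     # median MRE
    # Pred(l): proportion of projects with MRE <= l
    pred = {l: sum(m <= l for m in mres) / len(mres) for l in levels}
    return mmre, mdmre, pred
```

For example, with actual efforts [100, 200, 400] and predictions [110, 180, 300], the MREs are [0.10, 0.10, 0.25], so MMRE = 0.15, MdMRE = 0.10, and Pred(0.25) = 1.0. Lower MMRE/MdMRE and higher Pred(l) indicate a more accurate model, which is the sense in which the models here are compared.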




Notes

  1. Actually the variance.



Author information

Corresponding author

Correspondence to Qin Liu.


About this article

Cite this article

Liu, Q., Qin, W.Z., Mintram, R. et al. Evaluation of preliminary data analysis framework in software cost estimation based on ISBSG R9 Data. Software Qual J 16, 411–458 (2008). https://doi.org/10.1007/s11219-007-9041-4
