Abstract
Previous research has argued that preliminary data analysis is necessary for software cost estimation. In this paper, a framework for such analysis is applied to a substantial corpus of historical project data (ISBSG R9 data), selected without explicit bias. The analysis yields sets of dominant variables, which are then used to construct project effort estimation models. The performance of the predictors on the raw variables and on the extracted sets of variables is measured in terms of Mean Magnitude of Relative Error (MMRE), Median Magnitude of Relative Error (MdMRE), and prediction at levels 0.05, 0.1, and 0.25. The comparative evaluation suggests that more accurate prediction models can be constructed from the framework-processed data for the selected prediction techniques. The framework-processed predictor variables are statistically significant at the 95% confidence level for both parametric techniques and one non-parametric technique. The results are also compared with the latest published results obtained by other researchers on the same data set. The comparison indicates that the models constructed using framework-processed data are generally more accurate.
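The accuracy measures named in the abstract can be illustrated with a minimal sketch. It assumes the definitions commonly used in the cost estimation literature: the magnitude of relative error (MRE) of a project is |actual - predicted| / actual, MMRE and MdMRE are the mean and median of the MREs, and prediction at level l is the fraction of projects whose MRE is at most l. The abstract does not spell these formulas out, so the function names, the `Pred(l)` labels, and the effort values below are illustrative assumptions, not the authors' implementation or data.

```python
from statistics import mean, median


def mre(actual, predicted):
    """Magnitude of relative error for one project (assumed standard definition)."""
    return abs(actual - predicted) / actual


def accuracy_summary(actuals, predictions, levels=(0.05, 0.1, 0.25)):
    """Return MMRE, MdMRE and Pred(l) for paired actual/predicted efforts."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    summary = {"MMRE": mean(errors), "MdMRE": median(errors)}
    for level in levels:
        # Pred(l): share of projects whose MRE does not exceed l.
        summary[f"Pred({level})"] = sum(e <= level for e in errors) / len(errors)
    return summary


# Made-up effort values, used only to exercise the functions.
actual_effort = [100.0, 200.0, 400.0, 800.0]
predicted_effort = [100.0, 220.0, 300.0, 1200.0]
result = accuracy_summary(actual_effort, predicted_effort)
```

Lower MMRE and MdMRE and higher Pred(l) indicate a more accurate model. Reporting the median alongside the mean is useful because a single badly mispredicted project can dominate the mean.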

Notes
Actually the variance.
Cite this article
Liu, Q., Qin, W.Z., Mintram, R. et al. Evaluation of preliminary data analysis framework in software cost estimation based on ISBSG R9 Data. Software Qual J 16, 411–458 (2008). https://doi.org/10.1007/s11219-007-9041-4
DOI: https://doi.org/10.1007/s11219-007-9041-4