Evaluation of preliminary data analysis framework in software cost estimation based on ISBSG R9 Data

Liu, Qin; Qin, Wen Zhong; Mintram, Robert; Ross, Margaret

doi:10.1007/s11219-007-9041-4

Published: 31 January 2008

Volume 16, pages 411–458, (2008)
Software Quality Journal

Qin Liu¹, Wen Zhong Qin¹, Robert Mintram² & Margaret Ross³

Abstract

Previous research has argued that preliminary data analysis is necessary for software cost estimation. In this paper, a framework for such analysis is applied to a substantial corpus of historical project data (ISBSG R9 data), selected without explicit bias. The resulting analysis yields sets of dominant variables, which are then used to construct project effort estimation models. The performance of the predictors on the raw variables and on the extracted sets of variables is then measured in terms of Mean Magnitude of Relative Error (MMRE), Median Magnitude of Relative Error (MdMRE), and prediction at levels 0.05, 0.1, and 0.25. The results of the comparative evaluation suggest that more accurate prediction models can be constructed for the selected prediction techniques. The framework-processed predictor variables are statistically significant at the 95% confidence level for both parametric techniques and one non-parametric technique. The results are also compared with the latest published results obtained by other research based on the same data set. The comparison indicates that the models constructed using framework-processed data are generally more accurate.
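The accuracy measures named in the abstract can be illustrated with a minimal Python sketch. It assumes the conventional definitions, since the paper's exact formulation is not visible in this excerpt: the magnitude of relative error (MRE) of a project is |actual − predicted| / actual, MMRE and MdMRE are the mean and median of the MRE values, and prediction at level l is the fraction of projects whose MRE does not exceed l. The function names and the effort figures are hypothetical and are not taken from the ISBSG R9 data.

```python
from statistics import mean, median


def mre(actual, predicted):
    """Magnitude of relative error per project: |actual - predicted| / actual."""
    return [abs(a - p) / a for a, p in zip(actual, predicted)]


def mmre(actual, predicted):
    """Mean Magnitude of Relative Error."""
    return mean(mre(actual, predicted))


def mdmre(actual, predicted):
    """Median Magnitude of Relative Error."""
    return median(mre(actual, predicted))


def pred(actual, predicted, level):
    """Prediction at a level: share of projects with MRE <= level."""
    errors = mre(actual, predicted)
    return sum(e <= level for e in errors) / len(errors)


# Hypothetical effort values in person-hours (illustrative only, not ISBSG data).
actual_effort = [1000.0, 2000.0, 500.0, 4000.0]
predicted_effort = [1080.0, 1500.0, 510.0, 4000.0]

if __name__ == "__main__":
    print("MMRE :", mmre(actual_effort, predicted_effort))
    print("MdMRE:", mdmre(actual_effort, predicted_effort))
    for level in (0.05, 0.1, 0.25):
        print(f"Pred({level}):", pred(actual_effort, predicted_effort, level))
```

Lower MMRE and MdMRE values and higher prediction-level values indicate a more accurate model. MdMRE is less sensitive than MMRE to a few projects with very large relative errors, which is one reason the two are usually reported together.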

Notes

Actually the variance.

Author information

Authors and Affiliations

School of Software Engineering, Tongji University at Shanghai, Shanghai, China
Qin Liu & Wen Zhong Qin
School of Design, Engineering and Computing, Bournemouth University, Bournemouth, UK
Robert Mintram
Faculty of Technology, Southampton Solent University, Southampton, UK
Margaret Ross

Corresponding author

Correspondence to Qin Liu.

About this article

Cite this article

Liu, Q., Qin, W.Z., Mintram, R. et al. Evaluation of preliminary data analysis framework in software cost estimation based on ISBSG R9 Data. Software Qual J 16, 411–458 (2008). https://doi.org/10.1007/s11219-007-9041-4

Published: 31 January 2008
Issue Date: September 2008
DOI: https://doi.org/10.1007/s11219-007-9041-4
