Analysis of attribute weighting heuristics for analogy-based software effort estimation method AQUA+

Empirical Software Engineering

Abstract

Estimation by analogy (EBA) predicts effort for a new project by aggregating effort information from similar projects in a given historical data set. Existing research has shown that careful selection and weighting of attributes may improve the performance of estimation methods. This paper continues along that research line and considers weighting of attributes in order to improve estimation accuracy. More specifically, the impact of weighting (and selection) of attributes is studied as an extension to our former EBA method AQUA, which has shown promising results and also allows estimation for data sets that have non-quantitative attributes and missing values. The resulting new method is called AQUA+. For attribute weighting, a qualitative analysis pre-step using rough set analysis (RSA) is performed. RSA is a proven machine learning technique for classification of objects. We exploit the RSA results in different ways and define four heuristics for attribute weighting. AQUA+ was evaluated in two ways: (1) a comparison between AQUA+ and AQUA, along with a comparative analysis of the four proposed heuristics for AQUA+; and (2) a comparison of AQUA+ with other EBA methods. The main evaluation results are: (1) AQUA+ achieved better estimation accuracy than AQUA over all six data sets; and (2) AQUA+ obtained results better than, or very close to, those of other EBA methods for the three data sets to which all the EBA methods were applied. In conclusion, according to the empirical studies over six data sets, the proposed attribute weighting method using RSA can improve the estimation accuracy of the EBA method AQUA+. Testing on more data sets is necessary to obtain results that are more statistically significant.
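
The general idea of weighted estimation by analogy described above can be illustrated with a minimal Python sketch. This is not the AQUA+ algorithm: the similarity measures, the skipping of missing values, the similarity-weighted mean of the top-k analogs, and all names (`local_similarity`, `global_similarity`, `estimate_effort`, the sample `history`) are assumptions chosen for illustration. The attribute weights are supplied by hand here, whereas the paper derives them from RSA.

```python
# Illustrative sketch only; not the AQUA+ method. All names and the
# similarity definitions below are assumptions made for this example.

def local_similarity(a, b, value_range):
    """Similarity of two attribute values in [0, 1]; None if either is missing."""
    if a is None or b is None:
        return None  # missing value: the attribute is skipped for this pair
    if isinstance(a, str) or isinstance(b, str):
        return 1.0 if a == b else 0.0  # non-quantitative: exact match only
    if value_range == 0:
        return 1.0
    return 1.0 - abs(a - b) / value_range  # quantitative: scaled distance


def global_similarity(target, case, weights, ranges):
    """Weighted mean of local similarities over attributes present in both."""
    num = 0.0
    den = 0.0
    for a, b, w, r in zip(target, case, weights, ranges):
        s = local_similarity(a, b, r)
        if s is None:
            continue
        num += w * s
        den += w
    return num / den if den > 0 else 0.0


def estimate_effort(target, history, weights, ranges, k=2):
    """Similarity-weighted mean effort of the k most similar past projects."""
    scored = [(global_similarity(target, attrs, weights, ranges), effort)
              for attrs, effort in history]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:k]
    total = sum(s for s, _ in top)
    if total == 0:
        return sum(e for _, e in top) / len(top)  # fall back to a plain mean
    return sum(s * e for s, e in top) / total


# Hypothetical history: ((size, application type), effort)
history = [
    ((10.0, "web"), 100.0),
    ((50.0, "web"), 500.0),
    ((10.0, "batch"), 300.0),
]
ranges = (40.0, 0.0)  # value range per attribute; unused for strings

# The choice of weights changes which past project counts as the best analog.
by_size = estimate_effort((50.0, "batch"), history, (1.0, 0.0), ranges, k=1)
by_type = estimate_effort((50.0, "batch"), history, (0.0, 1.0), ranges, k=1)
```

With all weight on size, the nearest analog is the second project; with all weight on application type, it is the third. This is the sensitivity that motivates choosing attribute weights systematically rather than by hand.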

Acknowledgements

The authors would like to thank the Alberta Informatics Circle of Research Excellence (iCORE) for its financial support of this research. Thanks are also given to Jim McElroy for his contribution to the improvement of readability of this paper. Special thanks are given to the anonymous reviewers for their valuable and in-depth comments.

Author information

Corresponding author

Correspondence to Jingzhou Li.

Additional information

Editor: José Carlo Maldonado

Appendices

Appendix A

1.1 Definition of Attributes in USP05-FT and USP05-RQ

Table 21 Definition of attributes

Appendix B

2.1 Detailed Results of the Comparative Study

Table 22 Results of USP05-FT
Table 23 Results of ISBSG04-2
Table 24 Results of Mends03
Table 25 Results of Kem87
Table 26 Results of Desh89

About this article

Cite this article

Li, J., Ruhe, G. Analysis of attribute weighting heuristics for analogy-based software effort estimation method AQUA+. Empir Software Eng 13, 63–96 (2008). https://doi.org/10.1007/s10664-007-9054-4

  • DOI: https://doi.org/10.1007/s10664-007-9054-4