A flexible method for software effort estimation by analogy

Empirical Software Engineering

Abstract

Effort estimation by analogy uses information from similar past projects to predict the effort for a new project. Existing analogy-based methods are limited by their inability to handle non-quantitative data and missing values, and their prediction accuracy needs improvement as well. In this paper, we propose a new flexible method called AQUA that overcomes these limitations. AQUA combines ideas from two known analogy-based estimation techniques: case-based reasoning and collaborative filtering. The method can be applied to predict the effort related to any object at the requirement, feature, or project level. Compared with other methods, AQUA makes four main contributions. First, AQUA supports non-quantitative data by defining similarity measures for different data types. Second, it tolerates missing values. Third, the results of an explorative study in this paper show that prediction accuracy is sensitive to both the number N of analogies (similar objects) taken for adaptation and the threshold T for the degree of similarity, especially for larger data sets. A fixed and small number of analogies, as assumed in existing analogy-based methods, may not produce the best prediction accuracy. Fourth, a flexible mechanism based on learning from existing data is proposed for determining the values of N and T likely to offer the best prediction accuracy. New criteria for measuring the quality of prediction are also proposed. AQUA was validated against two internal data sets and one public domain data set with non-quantitative attributes and missing values. The obtained results are encouraging. In addition, a comparative analysis with existing analogy-based estimation methods was conducted using three publicly available data sets that were used by these methods. In two of the three cases, AQUA outperformed all other methods.
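The roles of N and T described in the abstract can be illustrated with a minimal Python sketch. It is not AQUA itself: the abstract does not give the method's similarity measures, adaptation rule, or quality criteria, so everything below is an assumption chosen for illustration. In particular, the exact-match similarity for categorical attributes, the range-normalised distance for numeric ones, the similarity-weighted mean used for adaptation, the leave-one-out search over N and T, and the use of mean magnitude of relative error (MMRE) as the score are all hypothetical stand-ins, as are all function names.

```python
from typing import Optional, Sequence, Tuple

# A project is a tuple of attribute values; None marks a missing value.
Project = Sequence[object]
History = Sequence[Tuple[Project, float]]  # (attributes, actual effort)


def attribute_similarity(a, b, value_range: Optional[float]) -> Optional[float]:
    """Similarity in [0, 1] for one attribute, or None if a value is missing.

    value_range is None for categorical attributes (assumed: exact match),
    otherwise the assumed span of a numeric attribute.
    """
    if a is None or b is None:
        return None
    if value_range is None:
        return 1.0 if a == b else 0.0
    if value_range == 0:
        return 1.0
    return 1.0 - min(abs(a - b) / value_range, 1.0)


def object_similarity(x: Project, y: Project, ranges) -> float:
    """Mean similarity over the attributes that are known in both objects."""
    sims = [attribute_similarity(a, b, r) for a, b, r in zip(x, y, ranges)]
    known = [s for s in sims if s is not None]
    return sum(known) / len(known) if known else 0.0


def estimate_effort(target: Project, history: History, ranges,
                    n: int, t: float) -> Optional[float]:
    """Similarity-weighted mean effort of at most n analogies with similarity >= t.

    Returns None when no past object qualifies as an analogy.
    """
    scored = [(object_similarity(target, attrs, ranges), effort)
              for attrs, effort in history]
    qualifying = [p for p in scored if p[0] >= t]
    analogies = sorted(qualifying, key=lambda p: p[0], reverse=True)[:n]
    total = sum(s for s, _ in analogies)
    if total == 0:
        return None
    return sum(s * e for s, e in analogies) / total


def tune_n_and_t(history: History, ranges, n_values, t_values):
    """Leave-one-out search for the (N, T) pair with the lowest MMRE.

    Returns (mmre, n, t), or None if no pair yields any prediction.
    Objects for which no analogy is found are skipped, so this score
    ignores how many objects could be predicted at all.
    """
    best = None
    for n in n_values:
        for t in t_values:
            errors = []
            for i, (attrs, actual) in enumerate(history):
                rest = list(history[:i]) + list(history[i + 1:])
                predicted = estimate_effort(attrs, rest, ranges, n, t)
                if predicted is not None:
                    errors.append(abs(actual - predicted) / actual)
            if errors:
                mmre = sum(errors) / len(errors)
                if best is None or mmre < best[0]:
                    best = (mmre, n, t)
    return best


# Hypothetical data: (application type, size, team experience) and effort.
history = [
    (("web", 10, "high"), 100.0),
    (("web", 12, None), 120.0),       # missing experience rating
    (("desktop", 30, "low"), 400.0),
]
ranges = (None, 20.0, None)           # only the size attribute is numeric

estimate = estimate_effort(("web", 11, "high"), history, ranges, n=2, t=0.5)
best = tune_n_and_t(history, ranges, n_values=(1, 2), t_values=(0.0, 0.5))
```

With this toy data the dissimilar desktop project falls below T = 0.5, so the estimate is drawn only from the two web projects, one of which has a missing attribute. Raising T far enough leaves no analogies and the sketch returns no estimate, which is one reason the choice of N and T matters for accuracy.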

Author information

Corresponding author

Correspondence to Jingzhou Li.

About this article

Cite this article

Li, J., Ruhe, G., Al-Emran, A. et al. A flexible method for software effort estimation by analogy. Empir Software Eng 12, 65–106 (2007). https://doi.org/10.1007/s10664-006-7552-4

  • DOI: https://doi.org/10.1007/s10664-006-7552-4
