
LSEbA: least squares regression and estimation by analogy in a semi-parametric model for software cost estimation

Empirical Software Engineering

Abstract

The importance of software cost estimation in the early stages of the development life cycle is reflected in the many models and methods that have appeared in the literature. Researchers' interest has focused on two well-known techniques: parametric regression analysis and non-parametric estimation by analogy. Despite several comparison studies, there is no consensus on which of the two is the better prediction technique. In this paper, we introduce a semi-parametric technique, called LSEbA, that combines the two methods while retaining the advantages of both. Furthermore, the proposed method is consistent with the mixed nature of software cost estimation data and makes use of all the available information in the dataset, even when a large proportion of values is missing. The paper describes in detail the process of building such a model and presents experiments on three representative datasets that verify the benefits of the proposed model in terms of accuracy, bias and spread. LSEbA is compared with linear regression, estimation by analogy, and a combination of the two based on the average of their outcomes, using accuracy metrics, statistical tests and a graphical tool, the Regression Error Characteristic (REC) curves.
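The abstract does not spell out the LSEbA algorithm itself, so the following is only a minimal sketch of the comparison baselines it names: least squares regression, estimation by analogy, and the average of their outcomes, each summarised by a Regression Error Characteristic curve. Everything specific here is an assumption made for illustration: the function names, the use of Euclidean distance with `k` nearest projects for the analogy step, and the synthetic size/effort data.

```python
import numpy as np


def fit_least_squares(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_least_squares(coef, X):
    """Apply fitted coefficients to new projects."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def predict_by_analogy(X_train, y_train, X_new, k=2):
    """Mean effort of the k most similar past projects.

    Assumption: similarity is plain Euclidean distance on the raw features.
    """
    dist = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def rec_curve(y_true, y_pred, tolerances):
    """REC curve: fraction of projects whose absolute error is within each tolerance."""
    err = np.abs(y_true - y_pred)
    return np.array([(err <= t).mean() for t in tolerances])


if __name__ == "__main__":
    # Synthetic data (hypothetical): effort grows roughly linearly with size.
    rng = np.random.default_rng(0)
    size = rng.uniform(10, 200, size=(60, 1))
    effort = 5.0 * size[:, 0] + rng.normal(0, 40, size=60)

    X_train, y_train = size[:40], effort[:40]
    X_test, y_test = size[40:], effort[40:]

    lr = predict_least_squares(fit_least_squares(X_train, y_train), X_test)
    eba = predict_by_analogy(X_train, y_train, X_test, k=3)
    avg = (lr + eba) / 2.0  # combination based on the average of the two outcomes

    tolerances = np.linspace(0, 150, 4)
    for name, pred in [("regression", lr), ("analogy", eba), ("average", avg)]:
        print(name, rec_curve(y_test, pred, tolerances))
```

A curve that rises faster (a larger fraction of projects within a small tolerance) indicates a more accurate model, which is what makes the curves usable as a visual comparison tool alongside numeric accuracy metrics.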



Acknowledgement

We would like to thank the reviewers and the editor for their valuable comments which helped us to improve the paper.

Author information

Corresponding author

Correspondence to Lefteris Angelis.

Additional information

Editor: Emilia Mendes

About this article

Cite this article

Mittas, N., Angelis, L. LSEbA: least squares regression and estimation by analogy in a semi-parametric model for software cost estimation. Empir Software Eng 15, 523–555 (2010). https://doi.org/10.1007/s10664-010-9128-6

  • DOI: https://doi.org/10.1007/s10664-010-9128-6
