A new perspective on data homogeneity in software cost estimation: a study in the embedded systems domain

Software Quality Journal

Abstract

Cost estimation and effort allocation are the key challenges for successful project planning and management in software development. Therefore, both industry and the research community have been working on various models and techniques to accurately predict the cost of projects. Recently, researchers have started debating whether the prediction performance depends on the structure of data rather than the models used. In this article, we focus on a new aspect of data homogeneity, “cross- versus within-application domain”, and investigate what kind of training data should be used for software cost estimation in the embedded systems domain. In addition, we try to find out the effect of training dataset size on the prediction performance. Based on our empirical results, we conclude that it is better to use cross-domain data for embedded software cost estimation and the optimum training data size depends on the method used.

Notes

  1. This observation is consistent with other cost models such as COCOMO, whose parameters are determined from a diverse set of software projects. Although COCOMO is a generic model with predefined parameters, these can be fine-tuned with local data.

Acknowledgments

This research is supported in part by Boğaziçi University research fund under grant number BAP 06HA104 and by Tubitak EEEAG 108E014.

Author information

Corresponding author

Correspondence to Ayşe Bakır.

About this article

Cite this article

Bakır, A., Turhan, B. & Bener, A.B. A new perspective on data homogeneity in software cost estimation: a study in the embedded systems domain. Software Qual J 18, 57–80 (2010). https://doi.org/10.1007/s11219-009-9081-z

  • DOI: https://doi.org/10.1007/s11219-009-9081-z
