Abstract
Cost estimation and effort allocation are key challenges for successful project planning and management in software development. Both industry and the research community have therefore been working on models and techniques to predict project cost accurately. Recently, researchers have started debating whether prediction performance depends on the structure of the data rather than on the models used. In this article, we focus on a new aspect of data homogeneity, “cross- versus within-application domain”, and investigate what kind of training data should be used for software cost estimation in the embedded systems domain. In addition, we examine the effect of training dataset size on prediction performance. Based on our empirical results, we conclude that it is better to use cross-domain data for embedded software cost estimation and that the optimum training data size depends on the method used.
Notes
This observation is consistent with other cost models such as COCOMO, whose model parameters are determined from a diverse set of software projects. Although COCOMO is a generic model with predefined parameters, these parameters can be fine-tuned with local data.
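As a rough illustration of what fine-tuning predefined parameters with local data can look like, the sketch below uses the Basic COCOMO form, effort = a * KLOC^b, with the published (a, b) pairs for the three development modes. The `calibrate` helper is hypothetical and is not taken from the article: it assumes a plain least-squares fit in log-log space, which is only one of several ways to adapt the parameters, and the data points in the usage lines are synthetic, generated from known parameters purely to check the fit.

```python
import math

# Basic COCOMO (Boehm, 1981): effort in person-months = a * KLOC ** b.
# Predefined (a, b) pairs for the three development modes.
BASIC_COCOMO = {
    "organic": (2.4, 1.05),
    "semidetached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def cocomo_effort(kloc, a, b):
    """Estimated effort in person-months for a project of `kloc` thousand lines."""
    return a * kloc ** b


def calibrate(projects):
    """Hypothetical local calibration (an assumption, not the article's method).

    Fits log(effort) = log(a) + b * log(kloc) by ordinary least squares.
    `projects` is a list of (kloc, actual_effort) pairs with at least two
    distinct sizes.
    """
    xs = [math.log(kloc) for kloc, _ in projects]
    ys = [math.log(effort) for _, effort in projects]
    n = len(projects)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Generic estimate with the predefined embedded-mode parameters.
generic = cocomo_effort(10, *BASIC_COCOMO["embedded"])

# Synthetic "local" projects generated from a = 3.0, b = 1.1 (illustrative only).
local_projects = [(kloc, 3.0 * kloc ** 1.1) for kloc in (5, 20, 80)]
local_a, local_b = calibrate(local_projects)
tuned = cocomo_effort(10, local_a, local_b)
```

Fitting both a and b needs projects of clearly different sizes. With very few local projects, a common simplification is to keep b fixed at its predefined value and adjust only a.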
Acknowledgments
This research is supported in part by the Boğaziçi University Research Fund under grant number BAP 06HA104 and by TÜBİTAK EEEAG 108E014.
Cite this article
Bakır, A., Turhan, B. & Bener, A.B. A new perspective on data homogeneity in software cost estimation: a study in the embedded systems domain. Software Qual J 18, 57–80 (2010). https://doi.org/10.1007/s11219-009-9081-z
DOI: https://doi.org/10.1007/s11219-009-9081-z