A new perspective on data homogeneity in software cost estimation: a study in the embedded systems domain

Bakır, Ayşe; Turhan, Burak; Bener, Ayşe B.

doi:10.1007/s11219-009-9081-z

Published: 05 July 2009

Volume 18, pages 57–80, (2010)

Software Quality Journal

Abstract

Cost estimation and effort allocation are the key challenges for successful project planning and management in software development. Therefore, both industry and the research community have been working on various models and techniques to accurately predict the cost of projects. Recently, researchers have started debating whether the prediction performance depends on the structure of data rather than the models used. In this article, we focus on a new aspect of data homogeneity, “cross- versus within-application domain”, and investigate what kind of training data should be used for software cost estimation in the embedded systems domain. In addition, we try to find out the effect of training dataset size on the prediction performance. Based on our empirical results, we conclude that it is better to use cross-domain data for embedded software cost estimation and the optimum training data size depends on the method used.

Notes

This observation is consistent with other cost models such as COCOMO, whose model parameters are determined from a diverse set of software projects. Although COCOMO is a generic model with predefined parameters, these can be fine-tuned with local data.
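
As an illustration of what fine-tuning with local data can look like, the following is a minimal sketch that assumes the Basic COCOMO form, effort = a × KLOC^b in person-months, with the nominal coefficients from Boehm (1981). The helper names and the sample projects are hypothetical and are not taken from the article; the calibration shown (re-estimating only the coefficient a by least squares in log space, with the exponent b held fixed) is one simple option, not necessarily the procedure the authors have in mind.

```python
import math

# Nominal Basic COCOMO coefficients (a, b) per development mode (Boehm, 1981).
# Effort is in person-months, size in thousands of lines of code (KLOC).
BASIC_COCOMO = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def cocomo_effort(kloc, a, b):
    """Basic COCOMO effort estimate: a * KLOC**b."""
    return a * kloc ** b


def calibrate_a(projects, b):
    """Re-estimate the coefficient a from local (kloc, actual_effort) pairs.

    With b fixed, ln(effort) = ln(a) + b * ln(kloc), so the least-squares
    estimate of ln(a) is the mean of ln(effort) - b * ln(kloc).
    """
    residuals = [math.log(effort) - b * math.log(kloc) for kloc, effort in projects]
    return math.exp(sum(residuals) / len(residuals))


if __name__ == "__main__":
    # Hypothetical local projects: (size in KLOC, actual effort in person-months).
    local_projects = [(12.0, 48.0), (30.0, 170.0), (55.0, 390.0)]

    a_nominal, b = BASIC_COCOMO["embedded"]
    a_local = calibrate_a(local_projects, b)

    new_size = 20.0  # KLOC of a project to be estimated (illustrative value)
    print("nominal estimate:", cocomo_effort(new_size, a_nominal, b))
    print("locally calibrated estimate:", cocomo_effort(new_size, a_local, b))
```

Holding b fixed keeps the calibration stable when only a handful of local projects are available; with more data, both a and b could be refitted by ordinary linear regression on the log-transformed values.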

Acknowledgments

This research is supported in part by Boğaziçi University research fund under grant number BAP 06HA104 and by Tubitak EEEAG 108E014.

Author information

Authors and Affiliations

Department of Computer Engineering, Boğaziçi University, 34342, Bebek, Istanbul, Turkey
Ayşe Bakır & Ayşe B. Bener
Software Engineering Group, Institute for Information Technology, National Research Council of Canada, 1200 Montreal Road, Building M50, Ottawa, ON, K1A0R6, Canada
Burak Turhan

Corresponding author

Correspondence to Ayşe Bakır.

About this article

Cite this article

Bakır, A., Turhan, B. & Bener, A.B. A new perspective on data homogeneity in software cost estimation: a study in the embedded systems domain. Software Qual J 18, 57–80 (2010). https://doi.org/10.1007/s11219-009-9081-z

Issue Date: March 2010
DOI: https://doi.org/10.1007/s11219-009-9081-z
