Hyper-Parameter Tuning of Classification and Regression Trees for Software Effort Estimation

  • Conference paper
Trends and Applications in Information Systems and Technologies (WorldCIST 2021)

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 1367))

Abstract

Classification and regression trees (CART) have been reported to be competitive machine learning algorithms for software effort estimation. In this work, we analyze the impact of hyper-parameter tuning on the accuracy and stability of CART using the grid search, random search, and DODGE approaches. We compared the results of CART with support vector regression (SVR) and ridge regression (RR) models. Results show that tuning improves the performance of CART models by up to 0.153 in standardized accuracy and reduces their stability ratio to a minimum of 0.819. In addition, CART proved to be as competitive as SVR and outperformed RR.
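The tuning setup described in the abstract can be illustrated with a short sketch. The snippet below is a minimal, hypothetical example of tuning a CART model with grid search and random search; it assumes scikit-learn's DecisionTreeRegressor as the CART implementation, uses synthetic placeholder data instead of the paper's datasets, an illustrative hyper-parameter space rather than the study's exact grids, and a generic standardized-accuracy helper. The DODGE procedure and the SVR/RR comparison models are not reproduced here.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic placeholder data standing in for a software effort dataset:
# rows are projects, columns are hypothetical size/complexity features,
# and y is the effort value to predict.
rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = 100 * X[:, 0] + 10 * rng.rand(200)

# Illustrative CART hyper-parameter space (not the grid used in the paper).
param_space = {
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 2, 5, 10],
    "min_samples_split": [2, 5, 10],
}

# Grid search: exhaustively evaluates every combination with 3-fold CV,
# scored by (negated) mean absolute error.
grid = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_space,
    scoring="neg_mean_absolute_error",
    cv=3,
)
grid.fit(X, y)

# Random search: samples a fixed budget of 20 configurations from the same
# space, trading exhaustiveness for a much smaller tuning cost.
rand = RandomizedSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_space,
    n_iter=20,
    scoring="neg_mean_absolute_error",
    cv=3,
    random_state=0,
)
rand.fit(X, y)

def standardized_accuracy(mae_model, mae_random_guessing):
    # SA = 1 - MAE_model / MAE_random_guessing, where the baseline is the
    # mean absolute error obtained by random guessing over the known efforts.
    return 1.0 - mae_model / mae_random_guessing

print("Grid search best parameters:  ", grid.best_params_)
print("Random search best parameters:", rand.best_params_)

In this sketch, grid search and random search share the same search space so their tuned models can be compared on equal terms; standardized accuracy is the metric the abstract reports for that comparison.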



Acknowledgments

This work was supported by project No. 834-B8-A27 at the University of Costa Rica (ECCI-CITIC).

Author information

Corresponding author

Correspondence to Leonardo Villalobos-Arias.


Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Villalobos-Arias, L., Quesada-López, C., Martínez, A., Jenkins, M. (2021). Hyper-Parameter Tuning of Classification and Regression Trees for Software Effort Estimation. In: Rocha, Á., Adeli, H., Dzemyda, G., Moreira, F., Ramalho Correia, A.M. (eds) Trends and Applications in Information Systems and Technologies. WorldCIST 2021. Advances in Intelligent Systems and Computing, vol 1367. Springer, Cham. https://doi.org/10.1007/978-3-030-72660-7_56
