Abstract
Classification and regression trees (CART) have been reported to be competitive machine learning algorithms for software effort estimation. In this work, we analyze the impact of hyper-parameter tuning on the accuracy and stability of CART using the grid search, random search, and DODGE approaches. We compare the results of CART with those of support vector regression (SVR) and ridge regression (RR) models. Results show that tuning improves the performance of CART models up to a maximum of 0.153 standardized accuracy and reduces its stability ratio to a minimum of 0.819. Also, CART proved to be as competitive as SVR and outperformed RR.
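For readers unfamiliar with the accuracy measure named above, the following is a minimal sketch of standardized accuracy as it is commonly defined in the effort estimation literature: one minus the ratio of the model's mean absolute error to that of a random-guessing baseline. The baseline here is computed exactly by averaging over all ordered pairs of projects; whether this chapter used the exact or a sampled baseline is an assumption, and the function names and effort values are purely illustrative.

```python
from itertools import permutations


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted efforts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def baseline_mae(actual):
    """Exact MAE of random guessing.

    Each project is 'predicted' with the actual effort of every other
    project, and the absolute errors are averaged over all ordered
    pairs (i, j) with i != j.
    """
    n = len(actual)
    total = sum(abs(a - b) for a, b in permutations(actual, 2))
    return total / (n * (n - 1))


def standardized_accuracy(actual, predicted):
    """SA = 1 - MAE_model / MAE_baseline; higher is better, 0 means no
    better than random guessing."""
    return 1.0 - mean_absolute_error(actual, predicted) / baseline_mae(actual)


# Illustrative (made-up) effort values, not data from the chapter.
actual = [10.0, 20.0, 40.0, 80.0]
predicted = [12.0, 18.0, 50.0, 70.0]
sa = standardized_accuracy(actual, predicted)
print(round(sa, 3))
```

Under this definition, a difference of 0.153 in standardized accuracy corresponds to the model's error shrinking by 15.3% of the random-guessing error.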
Acknowledgments
This work was supported by project No. 834-B8-A27 at the University of Costa Rica (ECCI-CITIC).
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Villalobos-Arias, L., Quesada-López, C., Martínez, A., Jenkins, M. (2021). Hyper-Parameter Tuning of Classification and Regression Trees for Software Effort Estimation. In: Rocha, Á., Adeli, H., Dzemyda, G., Moreira, F., Ramalho Correia, A.M. (eds) Trends and Applications in Information Systems and Technologies. WorldCIST 2021. Advances in Intelligent Systems and Computing, vol 1367. Springer, Cham. https://doi.org/10.1007/978-3-030-72660-7_56
DOI: https://doi.org/10.1007/978-3-030-72660-7_56
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-72659-1
Online ISBN: 978-3-030-72660-7
eBook Packages: Intelligent Technologies and Robotics (R0)