Hyper-Parameter Tuning of Classification and Regression Trees for Software Effort Estimation

  • Conference paper
Trends and Applications in Information Systems and Technologies (WorldCIST 2021)

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 1367))

Abstract

Classification and regression trees (CART) have been reported to be competitive machine learning algorithms for software effort estimation. In this work, we analyze the impact of hyper-parameter tuning on the accuracy and stability of CART using the grid search, random search, and DODGE approaches. We compared the results of CART with support vector regression (SVR) and ridge regression (RR) models. Results show that tuning improves the performance of CART models by up to 0.153 in standardized accuracy and reduces their stability ratio to a minimum of 0.819. In addition, CART proved to be as competitive as SVR and outperformed RR.
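The tuning setup described in the abstract can be illustrated with a short sketch. The snippet below is a minimal, hypothetical example of tuning a CART model with grid search and random search; it assumes scikit-learn's DecisionTreeRegressor as the CART implementation, uses synthetic placeholder data instead of the paper's datasets, an illustrative hyper-parameter space rather than the study's exact grids, and a generic standardized-accuracy helper. The DODGE procedure and the SVR/RR comparison models are not reproduced here.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic placeholder data standing in for a software effort dataset:
# rows are projects, columns are hypothetical size/complexity features,
# and y is the effort value to predict.
rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = 100 * X[:, 0] + 10 * rng.rand(200)

# Illustrative CART hyper-parameter space (not the grid used in the paper).
param_space = {
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 2, 5, 10],
    "min_samples_split": [2, 5, 10],
}

# Grid search: exhaustively evaluates every combination with 3-fold CV,
# scored by (negated) mean absolute error.
grid = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_space,
    scoring="neg_mean_absolute_error",
    cv=3,
)
grid.fit(X, y)

# Random search: samples a fixed budget of 20 configurations from the same
# space, trading exhaustiveness for a much smaller tuning cost.
rand = RandomizedSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_space,
    n_iter=20,
    scoring="neg_mean_absolute_error",
    cv=3,
    random_state=0,
)
rand.fit(X, y)

def standardized_accuracy(mae_model, mae_random_guessing):
    # SA = 1 - MAE_model / MAE_random_guessing, where the baseline is the
    # mean absolute error obtained by random guessing over the known efforts.
    return 1.0 - mae_model / mae_random_guessing

print("Grid search best parameters:  ", grid.best_params_)
print("Random search best parameters:", rand.best_params_)

In this sketch, grid search and random search share the same search space so their tuned models can be compared on equal terms; standardized accuracy is the metric the abstract reports for that comparison.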



Acknowledgments

This work was supported by project No. 834-B8-A27 at the University of Costa Rica (ECCI-CITIC).

Author information

Corresponding author

Correspondence to Leonardo Villalobos-Arias.


Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Villalobos-Arias, L., Quesada-López, C., Martínez, A., Jenkins, M. (2021). Hyper-Parameter Tuning of Classification and Regression Trees for Software Effort Estimation. In: Rocha, Á., Adeli, H., Dzemyda, G., Moreira, F., Ramalho Correia, A.M. (eds) Trends and Applications in Information Systems and Technologies. WorldCIST 2021. Advances in Intelligent Systems and Computing, vol 1367. Springer, Cham. https://doi.org/10.1007/978-3-030-72660-7_56
