ABSTRACT
Studies in software effort estimation (SEE) have explored hyper-parameter tuning of machine learning algorithms (MLAs) to improve the accuracy of effort estimates. In other contexts, random search (RS) has achieved results similar to those of grid search (GS) while being less computationally expensive. In this paper, we investigate to what extent RS hyper-parameter tuning affects the accuracy and stability of support vector regression (SVR) in SEE. Results were compared to those obtained from ridge regression models and GS-tuned models. A case study with four data sets extracted from the ISBSG 2018 repository shows that random search performs similarly to grid search, making it an attractive alternative for hyper-parameter tuning. RS-tuned SVR achieved an increase of 0.227 in standardized accuracy (SA) over default hyper-parameters. In addition, random search improved the prediction stability of SVR models to a minimum ratio of 0.840. The analysis showed that RS-tuned SVR attained performance equivalent to GS-tuned SVR. Future work includes extending this research to other hyper-parameter tuning approaches and machine learning algorithms, as well as to additional data sets.
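The core idea of random search tuning is to sample hyper-parameter configurations at random (often log-uniformly for scale parameters such as SVR's C and gamma) and keep the best-scoring trial. The sketch below is a minimal, self-contained illustration of that loop; the `evaluate` function is a hypothetical stand-in for the cross-validated SVR scoring used in the paper, not the authors' actual experimental setup, and the parameter ranges are illustrative assumptions. A small helper for standardized accuracy (SA = 1 − MAE_model / MAE_baseline) is included since SA is the metric the abstract reports.

```python
import math
import random

random.seed(42)  # fixed seed for reproducibility of this illustration


def standardized_accuracy(mae_model, mae_baseline):
    """SA = 1 - MAE_model / MAE_baseline, where the baseline is random guessing."""
    return 1.0 - mae_model / mae_baseline


def evaluate(C, epsilon, gamma):
    """Hypothetical scoring function standing in for cross-validated SVR accuracy.

    A toy surrogate with a single optimum, used only to make the search loop
    runnable; a real study would train and validate an SVR model here.
    """
    return -((math.log10(C) - 1.0) ** 2
             + (math.log10(gamma) + 2.0) ** 2
             + (epsilon - 0.1) ** 2)


def random_search(n_iter=60):
    """Sample SVR hyper-parameters at random and keep the best trial."""
    best_score, best_params = -math.inf, None
    for _ in range(n_iter):
        params = {
            # Log-uniform sampling for scale parameters (illustrative ranges):
            "C": 10 ** random.uniform(-3, 3),       # regularization strength
            "gamma": 10 ** random.uniform(-4, 1),   # RBF kernel coefficient
            # Uniform sampling for the epsilon-insensitive tube width:
            "epsilon": random.uniform(0.0, 1.0),
        }
        score = evaluate(**params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score


best_params, best_score = random_search()
print(best_params, best_score)
```

Unlike grid search, which evaluates every point of a fixed lattice, this loop spends its fixed budget of `n_iter` trials on independent draws, which is why its cost stays constant as the grid resolution (and hence GS cost) grows.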