Hyper-parameter optimization in classification: To-do or not-to-do

https://doi.org/10.1016/j.patcog.2020.107245

Highlights

  • We find that hyper-parameter tuning is not well justified in many cases but still very useful in a few.

  • We propose a framework to address the problem of deciding to-tune or not-to-tune.

  • We implemented a prototype of the framework with 486 datasets and 4 algorithms.

  • The results indicate that our framework is effective at avoiding the adverse effects of ineffective tuning.

  • Our framework enables a life-long learning approach to the problem.

Abstract

Hyper-parameter optimization is the process of finding suitable hyper-parameters for predictive models. It typically incurs high computational costs because the time-consuming model training process must be repeated to determine the effectiveness of each set of candidate hyper-parameter values. A priori, there is no guarantee that hyper-parameter optimization leads to improved performance. In this work, we propose a framework to address the problem of whether one should apply hyper-parameter optimization or use the default hyper-parameter settings for traditional classification algorithms. We implemented a prototype of the framework, which we use as the basis for a three-fold evaluation with 486 datasets and 4 algorithms. The results indicate that our framework is effective at supporting modeling tasks by avoiding the adverse effects of ineffective optimizations. The results also demonstrate that incrementally adding training datasets improves the predictive performance of framework instantiations and hence enables “life-long learning.”

Section snippets

Background

In this section, we present background information on meta-learning, on which our proposed approach is based. In addition, we briefly introduce the particular hyper-parameter tuning method used in our prototypical implementation of the framework, namely Bayesian optimization.
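
As a minimal sketch only, assuming scikit-optimize is available, Bayesian optimization of a classifier's hyper-parameters might look as follows; the learner, search space, and budget shown are illustrative assumptions, not the settings used in our prototype.

    # Minimal sketch of Bayesian hyper-parameter optimization; the search space
    # and budget are illustrative assumptions, not the paper's settings.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from skopt import BayesSearchCV
    from skopt.space import Integer, Real

    X, y = load_breast_cancer(return_X_y=True)

    search = BayesSearchCV(
        estimator=RandomForestClassifier(random_state=0),
        search_spaces={
            "n_estimators": Integer(10, 500),
            "max_depth": Integer(2, 30),
            "min_samples_split": Integer(2, 20),
            "max_features": Real(0.1, 1.0),
        },
        n_iter=30,   # number of Bayesian optimization iterations
        cv=5,        # cross-validation folds per candidate
        random_state=0,
    )
    search.fit(X, y)
    print("best hyper-parameters:", search.best_params_)
    print("best cross-validated score:", search.best_score_)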

Related work

Given a training dataset, building machine learning models typically involves two steps: selecting a learning algorithm and then optimizing its hyper-parameters to maximize model performance. These two steps are often referred to as the algorithm selection and hyper-parameter optimization problems. The problem studied in this paper, deciding to-tune or not-to-tune, fits in between those two steps, as depicted in Fig. 1. The literature review for this study thus consists of an overview of selection
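
As a self-contained sketch of where this decision sits, the following code builds a model in the two usual steps with a trivial stand-in decision rule in between; the rule and the grid-search tuner are placeholders for illustration, not the meta-model or optimizer proposed in this paper.

    # Sketch of the two-step model-building process with the to-tune /
    # not-to-tune decision in between.  The decision rule is a trivial
    # stand-in, NOT the meta-model proposed in the paper.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # Step 1: algorithm selection (fixed here for brevity).
    algorithm = DecisionTreeClassifier

    # Decision point studied in this paper: tune or use the defaults?
    should_tune = X.shape[0] < 1000   # illustrative stand-in rule

    if should_tune:
        # Step 2: hyper-parameter optimization (grid search as a simple stand-in).
        model = GridSearchCV(algorithm(random_state=0),
                             {"max_depth": [2, 4, 8, None]}, cv=5).fit(X, y)
    else:
        model = algorithm(random_state=0).fit(X, y)   # default hyper-parameters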

The proposed framework

In Fig. 2, we illustrate the conceptual model of the proposed framework. The framework consists of two distinct phases: the meta-model training phase and the application phase. In the training phase, the aim is to induce meta-models that are capable of deriving utilizable information from meta-knowledge to make a recommendation in the application phase, that is, to give a user feedback on the anticipated effect of hyper-parameter optimization for a given learning algorithm Ak. The
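
As a minimal sketch of the two phases (the meta-features, labels, and meta-learner below are illustrative assumptions, not necessarily those of our prototype), the training phase fits a meta-model on meta-knowledge and the application phase queries it for a new dataset:

    # Sketch of the training and application phases; the meta-features, labels,
    # and meta-learner are illustrative assumptions.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)

    # Placeholder meta-knowledge: 100 training datasets, each described by
    # 3 meta-features and labelled with whether tuning beat the defaults.
    meta_features = rng.random((100, 3))
    tuning_helped = rng.integers(0, 2, size=100)

    # Training phase: induce a meta-model from the meta-knowledge.
    meta_model = RandomForestClassifier(random_state=0).fit(meta_features, tuning_helped)

    # Application phase: recommend to-tune or not-to-tune for a new dataset,
    # described by the same meta-features.
    new_dataset_meta = rng.random((1, 3))
    print("recommend tuning:", bool(meta_model.predict(new_dataset_meta)[0]))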

A prototypical implementation of the proposed framework

In order to validate the proposed framework and gain further insights into its applicability, we implemented a prototype instantiation. In Fig. 2, we annotate 6 items to be specified for the implementation. They are:

  • Item 1: A target algorithm to be decided to-tune or not-to-tune.

  • Item 2: A collection of training datasets for the target algorithm.

  • Item 3: A performance measurement strategy to train and evaluate tuned and default models.

  • Item 4: An approach to characterize training datasets (see the sketch after this list).

  • Item 5: A
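
For Item 4, a minimal sketch of dataset characterization via a few commonly used meta-features is given below; the specific features chosen are illustrative assumptions, not necessarily those used in our prototype.

    # Sketch of dataset characterization (Item 4) with a few common
    # meta-features; the selection is illustrative only.
    import numpy as np
    from sklearn.datasets import load_wine

    def characterize(X, y):
        n_instances, n_features = X.shape
        _, class_counts = np.unique(y, return_counts=True)
        class_probs = class_counts / class_counts.sum()
        return {
            "n_instances": n_instances,
            "n_features": n_features,
            "n_classes": len(class_counts),
            "class_entropy": float(-np.sum(class_probs * np.log2(class_probs))),
        }

    X, y = load_wine(return_X_y=True)
    print(characterize(X, y))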

Prototype evaluation

In this section, we assess the meta-models through three evaluations. In the first evaluation, the aim is to estimate the predictive performance of the meta-models. In the second, our goal is to shed light on the impact of the proposed framework on base modeling tasks, in terms of how our approach helps avoid unjustified optimizations. Finally, in the third evaluation, we investigate the possibility of using our approach as an ongoing learning solution.
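
As a hedged sketch of the third evaluation only, a meta-model can be retrained on incrementally larger pools of training datasets while its accuracy on a held-out pool is tracked; the data, pool sizes, and meta-learner below are illustrative assumptions.

    # Sketch of the incremental ("life-long learning") evaluation: retrain the
    # meta-model as training datasets are added and track held-out accuracy.
    # All data, sizes, and the meta-learner are illustrative.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)
    meta_X = rng.random((400, 3))                 # meta-features of 400 datasets
    meta_y = (meta_X[:, 0] + rng.normal(0, 0.2, 400) > 0.5).astype(int)

    train_X, test_X = meta_X[:300], meta_X[300:]
    train_y, test_y = meta_y[:300], meta_y[300:]

    for n in (50, 100, 200, 300):                 # incrementally larger training pools
        meta_model = RandomForestClassifier(random_state=0).fit(train_X[:n], train_y[:n])
        acc = accuracy_score(test_y, meta_model.predict(test_X))
        print(f"{n:3d} training datasets -> meta-model accuracy {acc:.3f}")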

Discussion

Since real-world applications are generally time-sensitive, deciding whether to tune or not remains critical. As shown in this study, typical random guessing methods, although common in the community, are unreliable across learning algorithms. Using them to decide whether to tune could eventually lead to either wasting time on unjustified optimization or missing potential improvements in model performance. In contrast, the evaluations show that our approach is far more reliable and effective. By

Conclusions and future work

In this work, we have proposed a framework for predicting the effectiveness of hyper-parameter optimization for traditional classification algorithms. We have also illustrated the framework with a prototype covering 4 different learning algorithms and 486 datasets. Our empirical evaluation results indicate that the framework can be used to systematically and incrementally address the problem “to-tune-or-not-to-tune.”

In future work, we plan to implement our approach in the form of a supporting

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ngoc Tran received the B.E. degree from Vietnam National University, Ho Chi Minh, Vietnam, in 2006 and the M.S. degree from Vrije Universiteit Brussel, Brussels, Belgium in 2015. He is currently a Ph.D. student at Swinburne University of Technology, Melbourne, Australia. His current research interests mainly focus on machine learning and domain-specific visual languages.

Jean-Guy Schneider is a professor in Software Engineering and Academic Director, Industry Capstone in the School of Information Technology at Deakin University, Burwood Campus, Australia. He is also an Adjunct Professor at Swinburne University of Technology. His research covers a variety of areas in both Computer Science and Software Engineering, and he has published approximately 100 scientific articles. Jean-Guy is a member of IEEE and ACM.

Ingo Weber is a Full Professor and Head of the Chair for Software and Business Engineering at TU Berlin, Germany. In addition, he is a Conjoint Associate Professor at the University of New South Wales (UNSW) and an Adjunct Associate Professor at Swinburne University of Technology. Ingo has published over 100 refereed papers and three books, including “DevOps: A Software Architect’s Perspective”, Addison-Wesley, 2015, and “Architecture for Blockchain Applications”, Springer, 2019. Ingo has served as a reviewer for many prestigious journals, including various IEEE and ACM Transactions, and as a PC member for WWW, BPM (also as PC co-chair), ICSOC, AAAI, ICAPS, IJCAI, and many other conferences and workshops. Prior to TU Berlin, Ingo worked at Data61, CSIRO (formerly NICTA), at UNSW in Sydney, Australia, and at SAP Research in Germany.

A. K. Qin is an associate professor in the Department of Computer Science and Software Engineering at Swinburne University of Technology, Hawthorn Campus, Australia, and leads Swinburne’s Intelligent Data Analytics Lab and Machine Learning and Intelligent Optimization (MLIO) Research Group. His major research interests include evolutionary computation, machine learning, computer vision, GPU computing, services computing and mobile computing. He is currently the vice-chair of the IEEE Neural Networks Technical Committee, and co-chairs the IEEE Emergent Technologies Task Forces on “Collaborative Learning and Optimization” and “Multitask Learning and Multitask Optimization”.

1 The majority of the work was done while this author was with Data61, CSIRO.

2 The majority of the work was done while this author was at Swinburne University of Technology.
