Automatic design of hyper-heuristic based on reinforcement learning
Introduction
An optimization model involves finding the feasible solutions within a finite search space and then identifying the optimal one among them. Some exact optimization models guarantee global optimality. However, this guarantee is limited to small problems; for complex problems, an exact optimization model may take a long time to reach optimality [64]. In this case, using heuristics to produce a solution that is good enough within a reasonable timeframe is a viable option. A heuristic is a practical methodology that is not guaranteed to converge to a global optimum, but that can deliver a sufficiently good solution. Given the availability of a large variety of problem-specific heuristics, a key question concerning the selection of a particular heuristic has been posed in the literature in recent years. This has led to studies on the hyper-heuristic methodology.
A hyper-heuristic is a high-level automated methodology for selecting or generating a set of heuristics [7]. The term “hyper-heuristic” was coined by Denzinger et al. [16]. Fig. 1 shows a classification of hyper-heuristic approaches. There are two main hyper-heuristic categories, i.e. selection hyper-heuristics and generation hyper-heuristics [8]. These two categories can be defined as ‘heuristics to select heuristics’ and ‘heuristics to generate heuristics’, respectively [7]. The heuristics to be selected or generated in a hyper-heuristic model are known as the low-level heuristics (LLHs). Both selection and generation hyper-heuristics can be further divided into two categories based on the nature of the LLHs [8], namely constructive or perturbative hyper-heuristics. A constructive hyper-heuristic incrementally builds a complete solution from scratch. On the other hand, a perturbative hyper-heuristic iteratively improves an existing solution by applying perturbative operators. According to Ochoa et al. [50], perturbative LLHs can be further categorised into ruin-recreate heuristics, mutation heuristics, crossover heuristics, and hill-climbing heuristics. All these LLHs are problem-specific, e.g. for the Traveling Salesman Problem, the perturbative LLHs include swap mutation, insert mutation, order crossover, partially mapped crossover, 2-opt local search, and 3-opt local search.
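To make the perturbative LLH categories concrete, the following is an illustrative sketch (not taken from the paper) of two such TSP operators, representing a tour as a list of city indices:

```python
import random

# Illustrative sketch of two perturbative LLHs for the Traveling Salesman
# Problem; a tour is a list of city indices, and both operators return a
# modified copy without touching the input tour.

def swap_mutation(tour):
    """Mutation LLH: exchange two randomly chosen cities in the tour."""
    t = tour[:]
    i, j = random.sample(range(len(t)), 2)
    t[i], t[j] = t[j], t[i]
    return t

def two_opt(tour, i, j):
    """2-opt-style move: reverse the tour segment between positions i and j."""
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
```

A hyper-heuristic would treat such operators as black boxes, choosing among them without knowing their internals.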
A traditional selection hyper-heuristic model consists of two levels, as shown in Fig. 2 [7]. The low level contains a representation of the problem, evaluation function(s), and a set of problem-specific LLHs. The high level decides which LLH to use to produce new solution(s), and then whether to accept them. Therefore, the high-level heuristic performs two separate tasks, i.e. (i) LLH selection and (ii) move acceptance [51]. The LLH selection method is a strategy to select an appropriate perturbative LLH to modify the current solution, while the move acceptance method decides whether to accept the newly generated solution. Both the LLH selection and move acceptance methods have a dramatic impact on the performance of the hyper-heuristic model. Their effects are problem-dependent, as different problem domains, or even different instances in a single domain, have different fitness landscape properties [56]. Choosing suitable LLH selection and move acceptance methods for a particular problem is a non-trivial task when designing a robust hyper-heuristic model.
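The two-level loop described above can be sketched as follows. The concrete LLH selection method (uniform random) and move acceptance method (accept if not worse) used here are placeholder assumptions; the point of the selection hyper-heuristic design is precisely that these two components can be swapped out:

```python
import random

# Minimal sketch of a selection hyper-heuristic loop. The high level picks
# an LLH and decides on acceptance; the low level (the LLHs and the
# evaluation function) is problem-specific and passed in as black boxes.

def selection_hyper_heuristic(initial, llhs, evaluate, iterations=1000):
    current = initial
    current_cost = evaluate(current)
    for _ in range(iterations):
        llh = random.choice(llhs)         # (i) LLH selection method
        candidate = llh(current)          # low level produces a new solution
        cost = evaluate(candidate)
        if cost <= current_cost:          # (ii) move acceptance method
            current, current_cost = candidate, cost
    return current, current_cost
```

Replacing `random.choice` or the `cost <= current_cost` test with other strategies yields the many pairwise combinations discussed in the next paragraph.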
There are many different pairwise combinations of LLH selection and move acceptance methods in the literature [7]. However, in most existing hyper-heuristic models, the high-level heuristic is designed manually [56]. In such manual design, the concern is to determine which combination of LLH selection and move acceptance methods is best for a particular problem. One of the most straightforward methods is trial-and-error over the combinations for each problem. However, this is extremely time-consuming, as the number of possible combinations of these methods is large. In addition, the optimal configuration of these methods varies during different stages of the optimization process. As a result, an effective way to automatically design a hyper-heuristic model is necessary.
This research focuses on automating the design process of hyper-heuristic models. Specifically, Reinforcement Learning (RL) [62] is investigated to intelligently select the appropriate LLH selection and move acceptance methods for different stages of the optimization process. RL is a learning algorithm that selects a suitable action based on experience gained by interacting with the environment. The learner receives a reward or penalty after performing each selected action. As such, it learns which action to perform by evaluating the action choices through cumulative rewards. In the context of automatically designing a hyper-heuristic model, the combinations of the high-level heuristic components (i.e. LLH selection and move acceptance methods) denote the action choices. The RL algorithm rewards or penalizes each combination of the components based on its performance during the optimization process. A few RL algorithms have been extensively applied as feedback mechanisms to tackle decision-making problems, e.g. Dynamic Programming (DP) [15], SARSA [63], and Q-learning [67].
In this article, Q-learning is proposed to automatically design the hyper-heuristic model during different stages of the optimization process. The resulting Q-learning based hyper-heuristic model is denoted as QHH. In QHH, different criteria for selecting the LLH and accepting a proposed solution are used at different intervals. As an example, a criterion at time t may lead to selecting an LLH that is fast in creating a new solution, while at time t’, a criterion may lead to choosing an LLH that results in better performance. To demonstrate the effectiveness of QHH, benchmark instances from six problem domains in the Hyper-heuristic Flexible (HyFlex) framework [50] are used, and the results indicate the competitiveness of QHH.
The remainder of this article contains a description of related work in Section 2, which covers a review of hyper-heuristics, reinforcement learning for hyper-heuristics, and the automatic design of hyper-heuristics. Q-learning and its applications are discussed in Section 3. Section 4 presents the proposed QHH model. The results and findings, including a performance comparison, are described in Section 5. Finally, concluding remarks are presented in Section 6.
Section snippets
Related work
In this section, a review on hyper-heuristics is presented in Section 2.1. Section 2.2 highlights the applications of RL to hyper-heuristics, while Section 2.3 describes a number of methods to automatically design hyper-heuristics.
Q-learning
Q-learning is one of the commonly used RL algorithms. In Q-learning, each state-action pair is given a total cumulative reward value (denoted as a Q-value), which is measured by a Q-function (Eq. (1)). Suppose S = {s1, s2, s3, …, sn} denotes a set of possible states, A = {a1, a2, a3, …, am} denotes a set of selectable actions, rt+1 denotes the immediate reinforcement signal, α ∈ [0, 1] denotes the learning rate, γ ∈ [0, 1] denotes the discount factor, and Q(st, at) denotes the Q-value at time t, which is
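Eq. (1) itself is not reproduced in this snippet. Assuming it is the standard Watkins Q-learning update, which uses exactly the symbols defined above, a minimal tabular sketch looks like this:

```python
# Sketch of the standard tabular Q-learning update (assumed to correspond to
# Eq. (1)): Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)),
# with Q-values held in a dictionary keyed by (state, action) pairs.

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update for taking action a in state s,
    receiving reward r and landing in state s_next; returns the new Q-value."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]
```

Unvisited pairs default to a Q-value of 0.0 here; an implementation could equally initialise them optimistically to encourage exploration.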
The proposed method
In this section, the proposed QHH model, i.e., a Q-learning-based method to automatically design hyper-heuristics, is presented. To design a hyper-heuristic, a high-level heuristic which contains an LLH selection–move acceptance pair needs to be first determined. By adopting Q-learning, QHH is able to intelligently select appropriate LLH selection–move acceptance pairs during different stages of the optimization process.
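As a sketch of the idea (the pair names below are hypothetical placeholders, not the paper's Table 3 notation), the QHH action space consists of LLH selection–move acceptance pairs, and Q-learning can choose among them, e.g. with an epsilon-greedy policy:

```python
import random

# Illustrative sketch, not the paper's exact algorithm: each action is a
# (LLH selection method, move acceptance method) pair, and an epsilon-greedy
# policy over the learned Q-values decides which pair to deploy next.

def choose_pair(Q, state, pairs, epsilon=0.2):
    """Epsilon-greedy choice among LLH selection-move acceptance pairs."""
    if random.random() < epsilon:
        return random.choice(pairs)        # explore: random pair
    return max(pairs, key=lambda p: Q.get((state, p), 0.0))  # exploit
```

After the chosen pair runs for an interval, its observed performance would be fed back as the reward in the Q-learning update, so that well-performing pairs are selected more often in later stages.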
Before going into the details of QHH, Table 3 shows the notations used in
Experimental studies
This section presents an empirical evaluation of the proposed QHH model. All experiments have been conducted using a computer with multiple Intel i7-3930K 3.20 GHz processors and 15.6 GB of memory. At any particular time, each test of the experiment has been handled by one processor only. QHH is implemented through HyFlex [50] with NetBeans IDE 5.5 as the development tool. There are three QHH parameters, i.e. NoE, γ, and α. Note that α is dynamically controlled (see Section 3.3) while NoE
Conclusions
A hyper-heuristic is an automated methodology for selecting or generating heuristics. The design of the high-level heuristic of a hyper-heuristic model is crucial, and is problem-dependent owing to the relevant landscape structures of different problems. This paper presents a method to automatically design a hyper-heuristic model by intelligently selecting appropriate combinations of the high-level heuristic components (i.e. LLH selection-move acceptance pairs) during different stages of the
Acknowledgement
The authors gratefully acknowledge the support of the Research University Grant (Grant No: 1001/PKOMP/814274) of Universiti Sains Malaysia for this research. Also, the first author acknowledges the Ministry of Higher Education of Malaysia for the MyPhD scholarship to study for the Ph.D. degree at the Universiti Sains Malaysia (USM).
References (71)
- et al., Optimizing properties of nanoclay–nitrile rubber (NBR) composites using face centred central composite design, Mater. Des. (2012)
- et al., A comparative analysis of selection schemes used in genetic algorithms
- et al., Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics, Automatica (2014)
- et al., A new tabu search-based hyper-heuristic algorithm for solving construction leveling problems with limited resource availabilities, Autom. Constr. (2013)
- et al., Application of face centred central composite design to optimise compression force and tablet diameter for the formulation of mechanically strong and fast disintegrating orodispersible tablets, Int. J. Pharm. (2012)
- et al., Population-based Monte Carlo tree search hyper-heuristic for combinatorial optimization problems, Inf. Sci. (2015)
- et al., A new reinforcement learning-based memetic particle swarm optimizer, Appl. Soft Comput. (2016)
- Non-zero-sum Nash Q-learning for unknown deterministic continuous-time linear systems, Automatica (2015)
- et al., Reinforcement learning algorithms with function approximation: recent advances and applications, Inf. Sci. (2014)
- et al., A tabu search hyper-heuristic strategy for t-way test suite generation, Appl. Soft Comput. (2016)
- Designing reusable metaheuristic methods: a semi-automated approach
- Fair-share ILS: a simple state-of-the-art iterated local search hyperheuristic
- Machine learning for improving heuristic optimisation
- A learning algorithm for Markov decision processes with adaptive state aggregation
- Iterated local search vs hyper-heuristics: towards general-purpose search algorithms
- Hyper-heuristics: a survey of the state of the art, J. Oper. Res. Soc.
- A classification of hyper-heuristic approaches, in: Handbook of Metaheuristics
- Hyperheuristics: recent developments, in: Adaptive and Multilevel Metaheuristics
- An ant colony optimization-based hyper-heuristic with genetic programming approach for a hybrid flow shop scheduling problem
- An artificial bee colony algorithm with a modified choice function for the traveling salesman problem
- A hyperheuristic approach to scheduling a sales summit, in: Practice and Theory of Automated Timetabling III
- Two frameworks for cross-domain heuristic and parameter selection using harmony search, in: Harmony Search Algorithm
- Dynamic Programming: Models and Applications
- High Performance ATP Systems by Combining Several AI Methods
- Evaluation of a family of reinforcement learning cross-domain optimization heuristics, in: Learning and Intelligent Optimization
- A modified choice function hyper-heuristic controlling unary and binary operators
- An improved choice function heuristic selection for cross domain heuristic search
- Reinforcement learning for online control of evolutionary algorithms
- A cross-domain multi-armed bandit hyper-heuristic
- A multi-armed bandit hyper-heuristic
- Deep recurrent Q-learning for partially observable MDPs
- A variable neighborhood search-based hyperheuristic for cross-domain optimization problems in the CHeSC 2011 competition
- Novel feature selection and kernel-based value approximation method for reinforcement learning
Cited by (87)
- Reinforcement learning-assisted evolutionary algorithm: a survey and research opportunities, Swarm and Evolutionary Computation (2024)
- Collaborative Q-learning hyper-heuristic evolutionary algorithm for the production and transportation integrated scheduling of silicon electrodes, Swarm and Evolutionary Computation (2024)
- Hyper-heuristics: a survey and taxonomy, Computers and Industrial Engineering (2024)
- Multi-armed bandit-based hyper-heuristics for combinatorial optimization problems, European Journal of Operational Research (2024)
- A Q-learning-based hyper-heuristic evolutionary algorithm for the distributed flexible job-shop scheduling problem with crane transportation, Expert Systems with Applications (2023)
- A deep reinforcement learning hyper-heuristic with feature fusion for online packing problems, Expert Systems with Applications (2023)