
Information Sciences

Volumes 436–437, April 2018, Pages 89-107

Automatic design of hyper-heuristic based on reinforcement learning

https://doi.org/10.1016/j.ins.2018.01.005

Abstract

Hyper-heuristics are a class of methodologies that automate the process of selecting or generating a set of heuristics to solve various optimization problems. A traditional hyper-heuristic model achieves this through a high-level heuristic that consists of two key components, namely a heuristic selection method and a move acceptance method. The effectiveness of the high-level heuristic is highly problem dependent owing to the landscape properties of different problems. Most current hyper-heuristic models formulate a high-level heuristic by manually matching different combinations of components. This article proposes a method to automatically design the high-level heuristic of a hyper-heuristic model by utilizing a reinforcement learning technique. More specifically, Q-learning is applied to guide the hyper-heuristic model in selecting the proper components during different stages of the optimization process. The proposed method is evaluated comprehensively using benchmark instances from six problem domains in the Hyper-heuristic Flexible (HyFlex) framework. The experimental results show that the proposed method is comparable with most of the top-performing hyper-heuristic models in the current literature.

Introduction

An optimization problem involves finding the feasible solutions among the finite set of solutions that exist in the search space, and then identifying the optimal solution among them. There are exact optimization models that guarantee global optimality. However, this guarantee is limited to small problems; for complex problems, an exact optimization model may take a long time to reach optimality [64]. In this case, using a heuristic to produce a solution that is good enough within a reasonable timeframe is a viable option. A heuristic applies a practical methodology that is not guaranteed to converge to a global optimum, but aims to reach a sufficiently good solution. Given the large variety of problem-specific heuristics available, a key question concerning how to select a suitable heuristic for a particular problem has been posed in the literature in recent years. This has led to studies on the hyper-heuristic methodology.

A hyper-heuristic is a high-level automated methodology for selecting or generating a set of heuristics [7]. The term “hyper-heuristic” was coined by Denzinger et al. [16]. Fig. 1 shows a classification of hyper-heuristic approaches. There are two main hyper-heuristic categories, i.e. selection hyper-heuristics and generation hyper-heuristics [8]. These two categories can be defined as ‘heuristics to select heuristics’ and ‘heuristics to generate heuristics’, respectively [7]. The heuristics to be selected or generated in a hyper-heuristic model are known as the low-level heuristics (LLHs). Both selection and generation hyper-heuristics can be further divided into two categories based on the nature of the LLHs [8], namely constructive and perturbative hyper-heuristics. A constructive hyper-heuristic incrementally builds a complete solution from scratch. On the other hand, a perturbative hyper-heuristic iteratively improves an existing solution by performing its perturbative mechanisms. According to Ochoa et al. [50], perturbative LLHs can be further categorised into ruin-recreate heuristics, mutation heuristics, crossover heuristics, and hill-climbing heuristics. All these LLHs are problem specific, e.g. for the Traveling Salesman Problem (TSP), the perturbative LLHs include swap mutation, insert mutation, order crossover, partially mapped crossover, 2-opt local search, and 3-opt local search.
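For illustration only, the following sketch shows how two such perturbative LLHs for the TSP might be written; the tour representation (a list of city indices) and the function names are assumptions made for this example, not part of any specific hyper-heuristic framework.

    import random

    def swap_mutation(tour):
        # Mutation LLH: exchange two randomly chosen cities in the tour.
        new_tour = list(tour)
        i, j = random.sample(range(len(new_tour)), 2)
        new_tour[i], new_tour[j] = new_tour[j], new_tour[i]
        return new_tour

    def two_opt_move(tour, i, j):
        # Hill-climbing style LLH: a single 2-opt move that reverses the
        # segment between positions i and j (i < j); iterating such moves
        # until no improvement remains gives 2-opt local search.
        return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]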

A traditional selection hyper-heuristic model consists of two levels, as shown in Fig. 2 [7]. The low level contains a representation of the problem, evaluation function(s), and a set of problem-specific LLHs. The high level manages which LLH to use to produce new solution(s), and then decides whether to accept the resulting solution(s). Therefore, the high-level heuristic performs two separate tasks, i.e. (i) LLH selection and (ii) move acceptance [51]. The LLH selection method is a strategy to select an appropriate perturbative LLH to modify the current solution, and the move acceptance method decides whether to accept the newly generated solution. Both the LLH selection and move acceptance methods have a dramatic impact on the performance of the hyper-heuristic model. Their effects are problem dependent, as different problem domains, or even different instances in a single domain, have different fitness landscape properties [56]. Choosing suitable LLH selection and move acceptance methods for a particular problem is therefore a non-trivial task in designing a robust hyper-heuristic model.
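To make these two high-level tasks concrete, the sketch below outlines the generic loop of a single-point selection hyper-heuristic under some simplifying assumptions; the parameter names (select_llh, accept, evaluate, budget) are illustrative and do not correspond to a particular framework.

    def selection_hyper_heuristic(initial_solution, llhs, select_llh, accept, evaluate, budget):
        # Generic single-point selection hyper-heuristic loop (illustrative sketch).
        current = initial_solution
        current_cost = evaluate(current)
        for _ in range(budget):
            llh = select_llh(llhs)                     # high-level task (i): LLH selection
            candidate = llh(current)                   # apply the chosen low-level heuristic
            candidate_cost = evaluate(candidate)
            if accept(current_cost, candidate_cost):   # high-level task (ii): move acceptance
                current, current_cost = candidate, candidate_cost
        return current

In a minimization setting, for instance, a simple improving-only acceptance criterion would be accept = lambda old, new: new < old.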

There are many different pairwise combinations of LLH selection and move acceptance methods in the literature [7]. However, in most of the existing hyper-heuristic models, the high-level heuristic is designed manually [56]. In such manual design, the concern is to determine which combination of LLH selection and move acceptance methods is best for a particular problem. One of the most straightforward approaches is trial and error to find the best combination for each problem. However, this is extremely time consuming because the number of possible combinations of these methods is large. In addition, the optimal configuration of these methods varies during different stages of the optimization process. As a result, an effective method that can automatically design a hyper-heuristic model is necessary.

This research focuses on automating the design process of hyper-heuristic models. Specifically, Reinforcement Learning (RL) [62] is investigated to intelligently select the appropriate LLH selection and move acceptance methods for different stages of the optimization process. RL is a learning algorithm that selects a suitable action based on experience gained by interacting with the environment. The learner receives a reward or penalty after performing each selected action. As such, it learns which action to perform by evaluating the action choices through cumulative rewards. In the context of automatically designing a hyper-heuristic model, the combinations of the high-level heuristic components (i.e. LLH selection and move acceptance methods) denote the action choices. The RL algorithm rewards or penalizes each combination of the components based on its performance during the optimization process. A few RL algorithms have been extensively applied as feedback mechanisms to tackle decision-making problems, e.g. Dynamic Programming (DP) [15], SARSA [63], and Q-learning [67].

Q-learning is proposed in this article to automatically design the hyper-heuristic model during different stages of the optimization process. The resulting Q-learning-based hyper-heuristic model is denoted as QHH. In QHH, different criteria for selecting the LLH and accepting a proposed solution are used at different intervals. As an example, a criterion at time t may lead to selecting an LLH that creates a new solution quickly, while at time t′, a criterion may lead to choosing an LLH that results in better performance. To demonstrate the effectiveness of QHH, benchmark instances from six problem domains in the Hyper-heuristic Flexible (HyFlex) framework [50] are used, and the results indicate the competitiveness of QHH.

The remainder of this article is organized as follows. Section 2 describes related work, covering a review of hyper-heuristics, reinforcement learning for hyper-heuristics, and the automatic design of hyper-heuristics. Q-learning and its applications are discussed in Section 3. Section 4 presents the proposed QHH model. The results and findings, including performance comparisons, are described in Section 5. Finally, concluding remarks are presented in Section 6.

Section snippets

Related work

In this section, a review of hyper-heuristics is presented in Section 2.1. Section 2.2 highlights the applications of RL to hyper-heuristics, while Section 2.3 describes a number of methods for automatically designing hyper-heuristics.

Q-learning

Q-learning is one of the most commonly used RL algorithms. In Q-learning, each state–action pair is assigned a total cumulative reward value (denoted as a Q-value), which is measured by a Q-function (Eq. (1)). Suppose S = [s1, s2, s3, …, sn] denotes the set of possible states, A = [a1, a2, a3, …, an] denotes the set of selectable actions, r_{t+1} denotes the immediate reinforcement signal, α ∈ [0, 1] denotes the learning rate, γ ∈ [0, 1] denotes the discount factor, and Q(s_t, a_t) denotes the Q-value at time t, which is updated as

    Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]        (1)
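As a concrete illustration of Eq. (1), a minimal tabular update might look as follows; the dictionary-based Q-table and the parameter defaults are assumptions made for this sketch.

    from collections import defaultdict

    Q = defaultdict(float)  # Q-table: (state, action) -> Q-value, initialised to 0

    def q_update(state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
        # One tabular Q-learning update following Eq. (1):
        # Q(s_t, a_t) <- Q(s_t, a_t) + alpha * (r_{t+1} + gamma * max_a Q(s_{t+1}, a) - Q(s_t, a_t))
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])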

The proposed method

In this section, the proposed QHH model, i.e., a Q-learning-based method to automatically design hyper-heuristics, is presented. To design a hyper-heuristic, a high-level heuristic that contains an LLH selection–move acceptance pair first needs to be determined. By adopting Q-learning, QHH is able to intelligently select appropriate LLH selection–move acceptance pairs during different stages of the optimization process.
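To convey the idea before the details are presented, the sketch below shows one way a Q-learning-guided hyper-heuristic could switch between LLH selection–move acceptance pairs across segments of the search. It is a simplified, single-state rendering with epsilon-greedy action choice and an improvement-based reward; the names and parameters are assumptions for this example rather than the exact QHH algorithm.

    import random

    def q_guided_hyper_heuristic(llhs, selection_methods, acceptance_methods, evaluate,
                                 initial_solution, segments, iters_per_segment,
                                 epsilon=0.1, alpha=0.1, gamma=0.9):
        # Actions are (LLH selection method, move acceptance method) pairs.
        actions = [(s, a) for s in selection_methods for a in acceptance_methods]
        Q = {pair: 0.0 for pair in actions}
        current, current_cost = initial_solution, evaluate(initial_solution)
        for _ in range(segments):
            # Epsilon-greedy choice of the high-level heuristic pair for this segment.
            if random.random() < epsilon:
                pair = random.choice(actions)
            else:
                pair = max(Q, key=Q.get)
            select_llh, accept = pair
            cost_before = current_cost
            for _ in range(iters_per_segment):
                candidate = select_llh(llhs)(current)          # pick and apply an LLH
                candidate_cost = evaluate(candidate)
                if accept(current_cost, candidate_cost):       # move acceptance
                    current, current_cost = candidate, candidate_cost
            # Reward the pair by the improvement achieved in this segment (single-state update).
            reward = cost_before - current_cost
            Q[pair] += alpha * (reward + gamma * max(Q.values()) - Q[pair])
        return current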

Before going into the details of QHH, Table 3 shows the notations used in its description.

Experimental studies

This section presents an empirical evaluation of the proposed QHH model. All experiments were conducted on a computer with multiple Intel i7-3930K 3.20 GHz processors and 15.6 GB of memory. At any particular time, each test of the experiment was handled by one processor only. QHH is implemented through HyFlex [50] with the NetBeans IDE 5.5 as the development tool. There are three QHH parameters, i.e. NoE, γ, and α. Note that α is dynamically controlled (see Section 3.3) while NoE

Conclusions

A hyper-heuristic is an automated methodology for selecting or generating heuristics. The design of the high-level heuristic of a hyper-heuristic model is crucial, and is problem dependent owing to the landscape structures of different problems. This paper presents a method to automatically design a hyper-heuristic model by intelligently selecting appropriate combinations of the high-level heuristic components (i.e. LLH selection–move acceptance pairs) during different stages of the optimization process.

Acknowledgement

The authors gratefully acknowledge the support of the Research University Grant (Grant No: 1001/PKOMP/814274) of Universiti Sains Malaysia for this research. The first author also acknowledges the Ministry of Higher Education of Malaysia for the MyPhD scholarship supporting the Ph.D. study at Universiti Sains Malaysia (USM).

References (71)

  • S. Adriaensen et al.

    Designing reusable metaheuristic methods: a semi-automated approach

  • S. Adriaensen et al.

    Fair-share ILS: a simple state-of-the-art iterated local search hyperheuristic

  • S. Asta

    Machine learning for improving heuristic optimisation

    (2015)
  • J. Baras et al.

    A learning algorithm for Markov decision processes with adaptive state aggregation

  • E.K. Burke et al.

    Iterated local search vs hyper-heuristics: Towards general-purpose search algorithms

  • E.K. Burke et al.

    Hyper-heuristics: A survey of the state of the art

    J. Oper. Res. Soc.

    (2013)
  • E.K. Burke et al.

    A classification of hyper-heuristic approaches

    Handbook of Metaheuristics

    (2010)
  • L. Busoniu et al.
    (2010)
  • K. Chakhlevitch et al.

    Hyperheuristics: Recent developments

    Adaptive and Multilevel Metaheuristics

    (2008)
  • L. Chen et al.

    An ant colony optimization-based hyper-heuristic with genetic programming approach for a hybrid flow shop scheduling problem

  • S.S. Choong et al.

    An artificial bee colony algorithm with a modified choice function for the traveling salesman problem

  • P. Cowling et al.

    A hyperheuristic approach to scheduling a sales summit

    Practice and Theory of Automated Timetabling III

    (2000)
  • P. Dempster et al.

    Two frameworks for cross-domain heuristic and parameter selection using harmony search

    Harmony Search Algorithm

    (2016)
  • E.V. Denardo

    Dynamic Programming: Models and Applications

    (2012)
  • J. Denzinger et al.

    High Performance ATP Systems By Combining Several AI Methods

    (1996)
  • L. Di Gaspero et al.

    Evaluation of a family of reinforcement learning cross-domain optimization heuristics

    Learning and Intelligent Optimization

    (2012)
  • J.H. Drake et al.

    A modified choice function Hyper-heuristic controlling unary and binary operators

  • J.H. Drake et al.

    An improved choice function heuristic selection for cross domain heuristic search

  • A.E. Eiben et al.

    Reinforcement learning for online control of evolutionary algorithms

  • A.S. Ferreira

    A cross-domain multi-armed bandit hyper-heuristic

    (2016)
  • A.S. Ferreira et al.

    A multi-armed bandit hyper-heuristic

  • C. Ferreira
    (2006)
  • M. Hausknecht et al.

    Deep Recurrent Q-learning for partially observable MDPs

  • P. Hsiao et al.

    A variable neighborhood search-based hyperheuristic for cross-domain optimization problems in CHeSC 2011 competition

  • S.J. Hunor et al.

    Novel feature selection and kernel-based value approximation method for reinforcement learning
