Generalized decomposition and cross entropy methods for many-objective optimization
Introduction
Multi-objective problems arise naturally in many disciplines, for example in control systems [1], finance [2] and biology [3]. A multi-objective problem (MOP) is defined as:

$$\min_{\mathbf{x}\in\Omega}\ \mathbf{F}(\mathbf{x}) = \left(f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_k(\mathbf{x})\right) \quad (1)$$

where k is the number of objective functions and $\mathbf{x}$ is the vector of decision variables defined in a feasible domain $\Omega$. In the event that there is conflict between the objectives, such that improved performance in one objective can only be obtained at the expense of reduced performance in another, Eq. (1) admits no single optimal solution; rather, a family of Pareto optimal solutions exists, representing different performance trade-offs for the problem at hand. Given that we would like to reveal this set of trade-offs to the decision maker (DM), the task of the optimizer is to find a set of solutions that represents this Pareto front (PF) in objective-space. This type of optimization is generally referred to as a posteriori, since the DM applies his or her preferences for a particular trade-off between objectives after the full set of trade-offs has been revealed [4, p. 63].
MOPs with 2 or 3 objectives have been heavily studied in the literature and effective optimizers are available for these types of problems – for example [5]. However, it is now known that the Pareto-based algorithms designed for these types of problems experience a failure mode on problems with 4 or more objectives [6]. These types of problems are typically referred to as many-objective problems (MAPs). For brevity, hereafter we refer to multi- and many-objective problems simply as MAPs.
Evolutionary algorithms (EAs) have long been regarded as a suitable choice of method for the a posteriori optimization of MAPs [7]. EAs maintain a family of solutions during the optimization process and therefore have the potential to maintain a representative set of trade-off solutions simultaneously, with the potential to exploit the synergies of a parallel search across all possible trade-offs. The algorithms designed with this purpose in mind are known as multi-objective evolutionary algorithms (MOEAs). Another important reason for EA applicability is that they impose almost no constraints on the problem structure; for example, continuity and differentiability are not required for EA operation. Due to these factors, MAP research is vibrant in the EA community, as attested by the number of EAs available for MAPs; see [8]. However, all MOEAs require the performance of a solution to be represented as a scalar fitness value, upon which the MOEAs can base their decision as to the direction of search. This is a very well-known problem in MAPs and has been investigated by many researchers over the past three decades – seminal examples include [7], [9], [10]. There are two major classes of approaches for resolving this issue: Pareto-based and decomposition-based methods.
Pareto-based methods use the Pareto-dominance relations [4] to induce a partial ordering in the objective space. These relations were initially introduced by Edgeworth [11] and later expanded by Pareto [12]. For example, for two vectors $\mathbf{z}^1, \mathbf{z}^2 \in \mathbb{R}^k$, the vector $\mathbf{z}^1$ dominates $\mathbf{z}^2$ if all the elements in $\mathbf{z}^1$ are smaller than or equal to the corresponding elements in $\mathbf{z}^2$ and at least one element in $\mathbf{z}^1$ is strictly smaller than its corresponding element in $\mathbf{z}^2$. This partial ordering, induced by the relation, is denoted as $\mathbf{z}^1 \preceq \mathbf{z}^2$, and, in the context of a minimization problem, this expression is read as: the vector $\mathbf{z}^1$ dominates $\mathbf{z}^2$. For a more complete treatment of Pareto-dominance relations the reader is referred to [4]. However, such relations are of limited utility when the number of dimensions is increased [6]. This is primarily because the number of non-dominated solutions increases as the dimensionality of the problem increases, and for dimensions greater than around ten, almost all solutions will tend to be non-dominated [13]. Hence this type of partial ordering appears to be of limited use in high dimensions since, if all the generated solutions are non-dominated, the EA has no objective measure on which to base its selection process.
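The dominance relation described above reduces to a simple element-wise comparison. A minimal sketch in Python (for minimization; the function name `dominates` is illustrative):

```python
import numpy as np

def dominates(z1, z2):
    """Return True if z1 Pareto-dominates z2 under minimization:
    z1 <= z2 in every objective and z1 < z2 in at least one."""
    z1, z2 = np.asarray(z1), np.asarray(z2)
    return bool(np.all(z1 <= z2) and np.any(z1 < z2))
```

Note that two distinct vectors can be mutually non-dominated (e.g. `[3, 1]` and `[1, 3]`), which is precisely why the ordering is only partial.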
Decomposition-based methods employ a scalarizing function to aggregate all the objectives into a single objective function. Such methods have been used predominantly in non-linear mathematical programming, where the main algorithm is based on some variant of gradient search [4], [14]. However multi-objective evolutionary algorithms have also employed decomposition, for example [15], [16], [17]. A central issue in decomposition-based methods is how to select a set of weighting vectors that will provide a well distributed set of Pareto optimal points. A popular assumption is that an even distribution of weighting vectors will result in well distributed Pareto optimal points [10]. However, with the help of a novel concept which we call generalized decomposition [18], we will demonstrate that this assumption is flawed and provide an exact solution to this issue, subject to having some prior information about the problem.
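As a concrete illustration of a scalarizing function, the weighted Chebyshev function is a common choice in decomposition-based methods such as MOEA/D; the sketch below is a generic example and not the specific formulation used by generalized decomposition:

```python
import numpy as np

def chebyshev(f, w, z_star):
    """Weighted Chebyshev scalarization: max_i w_i * |f_i - z*_i|,
    where z* is an ideal (reference) point.  Minimizing this over the
    decision variables drives f(x) toward the Pareto front along a
    direction determined by the weighting vector w."""
    return float(np.max(np.asarray(w) * np.abs(np.asarray(f) - np.asarray(z_star))))
```

Each weighting vector defines one single-objective subproblem; the open question discussed above is which set of vectors yields a well-distributed set of Pareto optimal points.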
This problem with decomposition methods has motivated researchers to employ adaptive approaches for the selection of weighting vectors in decomposition-based algorithms. An interesting adaptive method to select the set of weighting vectors is presented in [19], [20]. The main idea is to identify the Pareto front geometry and then distribute a set of points on that surface in such a way as to maximize the hypervolume indicator [21]. Subsequently, the points found in the previous step are used to identify weighting vectors that, upon minimization of the resulting subproblems, would result in similar points on the Pareto front. The idea seems hopeful; however, there are three major difficulties with this approach. First, the authors assume that the Pareto front can be parameterized as

$$\sum_{i=1}^{k} f_i^{\,p} = 1, \quad p > 0, \quad (2)$$

where the fact that Eq. (2) equals one means that the objective functions are normalized to the range $[0, 1]$. However, the problem of solving for the parameter in Eq. (2) is nonconvex. Nevertheless, in [19], [20] this issue was not addressed and Newton's method was used. Newton's method, however, can only perform local search and thus will be unable to identify the correct parameter in general. The effects of this difficulty are seen in [20], where the parameter estimated by Newton's method for a generated test front differs markedly from the true value used to construct that front. Therefore, the first part of the suggested method can mislead the entire procedure in [19], [20]. The second problem is that the weighting vectors that correspond to points on the identified Pareto front are formulated in a similar fashion to Eq. (2); hence the issue of nonconvexity of the problem formulation emerges again, and the resulting weighting vectors will not produce subproblems that converge to the reference points.
Lastly, the hypervolume indicator [21], which is used to ascertain the quality of the reference points on the PF, has exponential complexity in the number of objectives [22], [23]; this limits the method to approximately 4-objective problems, since the hypervolume must be calculated several times on every iteration of the algorithm [20]. Another interesting method is due to Gu et al. [24] and others. Although these methods appear promising, there is evidence that adaptive schemes for the selection of the weighting vectors convert the optimization problem into a time-varying one, which can have a high impact on the convergence rate of the algorithm [25].
Despite the successes of MOEAs, particularly on problems with 2 or 3 objectives, their stochastic nature does present certain difficulties. For example, it is very hard to analyse the behavior of MOEAs analytically, thus their performance on a problem cannot be guaranteed prior to application. This is why EAs are usually evaluated experimentally using test problem sets [26], [27], [28]. More recently, a new family of algorithms has emerged, namely estimation of distribution algorithms (EDAs). EDAs stand in the middle ground between Monte-Carlo simulation and EAs. In EDAs, a probabilistic model is built from elite individuals and subsequently sampled to produce a new population of better individuals. From the EA point of view, EDAs can be traced back to recombination operators based on density estimators that use well-performing individuals in the population as a sample [29]. A positive aspect of EDAs is that it is straightforward to fuse prior information into the optimization procedure, thus reducing the time to convergence if such information is available. Also, the amount of heuristics, compared with other EAs, is reduced, easing the task of mathematical analysis of these algorithms. This is an important aspect which has been overlooked, due to inherent difficulties, in most heuristics for optimization; studies of this kind are usually applied to algorithms that are not used in practice [30], [31]. However, EDAs are not a panacea, since they depend heavily on the quality and complexity of the underlying probabilistic model [32]. For instance, a simple EDA based on low-order statistics, i.e. an EDA that does not account for variable dependencies, can easily be misled if such dependencies exist in the underlying problem.
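The EDA loop described above – fit a probabilistic model to elite individuals, then sample it – can be sketched with the simplest possible model, an independent (univariate) Gaussian per variable. All names and parameter values here are illustrative only:

```python
import numpy as np

def simple_eda(objective, dim, pop_size=100, elite_frac=0.2, iters=50, seed=0):
    """Minimal univariate-Gaussian EDA for minimization: fit a mean and
    standard deviation per variable to the elite sample each generation,
    then resample.  Because it ignores variable dependencies, it can be
    misled when such dependencies exist in the underlying problem."""
    rng = np.random.default_rng(seed)
    # Illustrative starting distribution, deliberately off the optimum.
    mu, sigma = np.full(dim, 3.0), np.full(dim, 2.0)
    n_elite = max(2, int(pop_size * elite_frac))
    for _ in range(iters):
        pop = rng.normal(mu, sigma, size=(pop_size, dim))
        order = np.argsort([objective(x) for x in pop])
        elite = pop[order[:n_elite]]
        mu = elite.mean(axis=0)
        sigma = elite.std(axis=0) + 1e-12  # avoid a degenerate model
    return mu
```

On a separable objective such as the sphere function this model suffices; a correlated objective would call for a model with higher-order statistics, as discussed above.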
To overcome such difficulties researchers proposed ever more elaborate models [32], which of course increase the complexity of the algorithm and in some instances the identification of the optimal model is of comparable complexity to that of the optimization problem necessitating the use of heuristics [33]. Acknowledging this issue has led some researchers to suggest hybridization of EDAs based on simple probabilistic models with some form of clustering [34]. This course is further supported by more recent studies [35].
For these reasons we have selected an EDA, the so-called cross entropy method (CE), as the main optimization algorithm in our generalized decomposition-based framework. CE was introduced by Rubinstein [36], initially as a rare-event estimation technique and subsequently as an algorithm for combinatorial and continuous optimization problems. The most attractive feature of CE is that, for a certain family of instrumental densities, the updating rules can be calculated analytically, and thus are extremely efficient and fast. Furthermore, the sound theoretical background of CE enables analytical studies of this method, which can provide guidelines about its applicability to particular problems.
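For a Gaussian instrumental density, the analytic CE updating rules amount to taking the sample mean and standard deviation of the elite (best-quantile) sample, typically blended with the previous parameters by a smoothing factor. The following minimal single-objective sketch is illustrative only and is not the MACE-gD algorithm itself; all parameter values are assumptions:

```python
import numpy as np

def cross_entropy_min(objective, dim, pop_size=100, rho=0.1,
                      alpha=0.7, iters=60, seed=0):
    """Continuous cross-entropy method with a Gaussian instrumental
    density.  Each iteration samples the density, keeps the best
    rho-quantile of the population, and updates mean/std analytically
    from that elite set, smoothed by alpha."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.full(dim, 2.0)
    n_elite = max(2, int(pop_size * rho))
    for _ in range(iters):
        pop = rng.normal(mu, sigma, size=(pop_size, dim))
        order = np.argsort([objective(x) for x in pop])
        elite = pop[order[:n_elite]]
        # Analytic CE updates for the Gaussian family, with smoothing.
        mu = alpha * elite.mean(axis=0) + (1 - alpha) * mu
        sigma = alpha * elite.std(axis=0) + (1 - alpha) * sigma
    return mu
```

The smoothing factor slows the contraction of the sampling density, which helps prevent premature convergence to a suboptimal region.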
The remainder of this paper is structured as follows. In Section 2 generalized decomposition is described along with the benefits that this method can bring to currently existing MOEAs. Following this, in Section 3 the EDA employed in our framework, the CE-method, is presented along with its form for continuous optimization problems. A many-objective optimization framework based on generalized decomposition and CE is described in Section 4. The algorithms in our comparative studies in Section 6 are described in Section 5. In Section 7 we illustrate how generalized decomposition can be used for preference articulation. Lastly in Section 8 we summarize and conclude this work.
Generalized decomposition
Generalized decomposition (gD) was first introduced in [18], as a way to optimally select the weighting vectors in decomposition-based algorithms, subject to the Pareto front geometry being known a priori. In this work we show that, even if this requirement is not fulfilled, the performance of gD can still be orders of magnitude better with regard to the quality of distribution of Pareto optimal points as measured by the Riesz kernel [37], when compared with two highly regarded methods.
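The Riesz kernel measure of distribution quality mentioned above can be computed as the Riesz s-energy of a point set, where lower energy corresponds to a more uniform spread. A small sketch (the choice of s is illustrative):

```python
import numpy as np

def riesz_energy(points, s=1.0):
    """Riesz s-energy of a point set: the sum over all distinct pairs
    of 1 / ||p_i - p_j||^s.  Evenly spread points have lower energy
    than clustered ones, so this serves as a uniformity measure."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    iu = np.triu_indices(len(pts), k=1)  # distinct pairs only
    return float((1.0 / dist[iu] ** s).sum())
```

For example, three evenly spaced points on a line have lower energy than three points of which two are clustered together.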
Cross entropy method
In this section we introduce the main ideas of the cross entropy method. Furthermore, in Section 3.2 we present the continuous version of CE, as it is employed in this work.
Generalized decomposition-based many objective cross-entropy
The proposed algorithm is based on the CE method (see Section 3) and the newly introduced concept of generalized decomposition (described in Section 2), and is known as many-objective cross entropy based on generalized decomposition (MACE-gD). The general idea is that we can generate a set of weighting vectors near regions that are of interest, thus avoiding a waste of resources in a search for Pareto optimal solutions away from such regions. The main algorithm in MACE-gD is the CE method for
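As a rough illustration of concentrating weighting vectors near a region of interest (this is not the gD procedure itself, which obtains weights by solving a convex problem), one could sample simplex vectors from a Dirichlet distribution whose mean is a normalized preference direction; the function name and concentration value below are hypothetical:

```python
import numpy as np

def weights_near(preference, n=20, concentration=50.0, seed=0):
    """Illustrative only: sample n weighting vectors on the simplex,
    clustered around a preferred trade-off direction, by drawing from
    a Dirichlet whose mean is the normalized preference vector.  A
    larger concentration yields a tighter cluster."""
    rng = np.random.default_rng(seed)
    p = np.asarray(preference, dtype=float)
    p = p / p.sum()  # project the preference onto the simplex
    return rng.dirichlet(concentration * p, size=n)
```

Each sampled vector sums to one, so it can be used directly as the weighting vector of a decomposition subproblem.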
Benchmark algorithms
The aims of the empirical testing of MACE-gD that follows are twofold: (1) to compare the algorithm to the existing best-in-class methods for (a) decomposition-based optimization and (b) multi-objective EDAs; (2) to compare the impact of generalized decomposition to the popular even distribution scheme for weight vectors. To satisfy aim (1), we compare MACE-gD against MOEA/D and also the regularity model-based estimation of distribution algorithm (RM-MEDA) [56]. To satisfy aim (2) we introduce
Performance indicator
In Section 2, it was argued that the three objectives that MOEAs have to achieve – namely convergence, diversity and PF coverage – can be reduced to only one, convergence, in the generalized decomposition framework. The most important metric of interest, therefore, becomes some measure of convergence to the PF. Hence the generational distance (GD) indicator has been chosen as the main performance metric for our comparative study.
- Generational Distance (GD), introduced in [62], is defined
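In its commonly used form, GD is the p-norm of the nearest-neighbour distances from the obtained set to a sample of the true PF, divided by the number of obtained points (p = 2 is typical); variants that average the distances instead also appear in the literature. A sketch under that assumption:

```python
import numpy as np

def generational_distance(approx, reference, p=2):
    """GD = (sum_i d_i^p)^(1/p) / |A|, where d_i is the Euclidean
    distance from the i-th obtained point to its nearest point in the
    reference Pareto-front sample.  Smaller is better; 0 means every
    obtained point lies on the reference set."""
    A = np.asarray(approx, dtype=float)
    R = np.asarray(reference, dtype=float)
    d = np.sqrt(((A[:, None, :] - R[None, :, :]) ** 2).sum(-1)).min(axis=1)
    return float((d ** p).sum() ** (1.0 / p) / len(A))
```

Note that GD measures convergence only: a single well-converged point scores perfectly, which is acceptable here precisely because, under generalized decomposition, diversity is handled by the choice of weighting vectors.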
Preference articulation
Apart from convergence in MOEA algorithms, which is a relatively well defined concept, there can be no consensus on the meaning of a well distributed Pareto set. Beyond the theoretical difficulties, a proper definition of a well distributed PF cannot be given, mainly because it is contingent on the preferences of the decision maker (DM). Of what use would a Pareto optimal set be if the solutions that are of interest to the DM are sparsely sampled, if at all?
Generalized decomposition can be
Conclusion
A new concept was introduced and used in the solution of many-objective optimization problems (MAPs), namely generalized decomposition (gD). With the aid of gD, weighting vectors can be selected optimally to satisfy specific requirements in the distribution of the Pareto optimal solutions along the PF. This approach allows decomposition-based MOEAs to focus on only one performance objective, that of convergence to the PF. This can be a significant advantage over other MOEAs that have to tackle
Acknowledgments
The authors would like to thank Jacob Mattingley for providing access to his tool CVXGEN [66]. In this work CVXGEN is employed to solve Eq. (7). The authors also gratefully acknowledge Ricardo H.C. Takahashi for useful discussions and for his invaluable perspective with respect to the present work, during his visit to the University of Sheffield, while supported by a Marie Curie International Research Staff Exchange Scheme Fellowship within the 7th European Community Framework Programme.
References (66)
- et al., Evolutionary algorithms in control systems engineering: a survey, Control Eng. Pract. (2002)
- et al., Drift analysis and average time complexity of evolutionary algorithms, Artif. Intell. (2001)
- et al., Energy functionals, numerical integration and asymptotic equidistribution on the sphere, J. Complex. (2003)
- et al., Distributed aero-engine control systems architecture selection using multi-objective optimisation, Control Eng. Pract. (1999)
- et al., A hybrid multi-objective immune algorithm for a flow shop scheduling problem with bi-objectives: weighted mean completion time and weighted mean tardiness, Inform. Sci. (2007)
- et al., An effective hybrid particle swarm optimization algorithm for multi-objective flexible job-shop scheduling problem, Comput. Ind. Eng. (2009)
- M. Tapia, C. Coello, Applications of multi-objective evolutionary algorithms in economics and finance: a survey, in: ...
- N. Krasnogor, W. Hart, J. Smith, D. Pelta, Protein structure prediction with evolutionary algorithms, in: Proceedings ...
- (1999)
- E. Zitzler, M. Laumanns, L. Thiele, et al., SPEA2: improving the strength Pareto evolutionary algorithm, in: EUROGEN, ...
- Evolutionary many-objective optimisation: an exploratory analysis
- Genetic algorithms and machine learning, Mach. Learn.
- An overview of population-based algorithms for multi-objective optimisation, Int. J. Syst. Sci.
- MOEA/D: a multiobjective evolutionary algorithm based on decomposition, IEEE Trans. Evol. Comput.
- A new constrained ellipsoidal algorithm for nonlinear optimization with equality constraints, IEEE Trans. Magn.
- On the performance of multiple-objective genetic local search on the 0/1 knapsack problem – a comparative experiment, IEEE Trans. Evol. Comput.
- Multiple single objective Pareto sampling
- Generalized decomposition
- Asymmetric Pareto-adaptive scheme for multiobjective optimization
- Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach, IEEE Trans. Evol. Comput.
- A faster algorithm for calculating hypervolume, IEEE Trans. Evol. Comput.
- A multiobjective evolutionary algorithm using dynamic weight method, Int. J. Innov. Comput. Inform. Control
- A review of multiobjective test problems and a scalable test problem toolkit, IEEE Trans. Evol. Comput.
- Comparison of multiobjective evolutionary algorithms: empirical results, Evol. Comput.