An analysis of Gray versus binary encoding in genetic search

doi:10.1016/S0020-0255(03)00178-6

Information Sciences

Volume 156, Issues 3–4, 15 November 2003, Pages 253-269

https://doi.org/10.1016/S0020-0255(03)00178-6

Abstract

This paper employs a Markov model to study the relative performance of binary and Gray coding in genetic algorithms. The results indicate that while there is not much difference between the two for all possible functions, Gray coding does not necessarily improve performance for functions which have fewer local optima in the Gray representation than in binary.

Introduction

Any parameter optimization technique, including genetic algorithms (GAs) [9], requires some method of representing the parameters. Without loss of generality, we discuss here integer parameters. One approach used in GAs is to code an integer parameter directly in its base-2 representation, using explicit bits (the genotype level), and then apply a standard binary-to-integer mapping to decode the parameter value (the phenotype level). Alternatively, one may represent the parameter using the readily available integer data type (thus merging the genotype and phenotype levels). Among possible bit-string representations, the Gray code is known to alleviate the “Hamming cliff” problem. An example of a Hamming cliff is the transition from 7 to 8 in binary coding, where all the bits (in four-bit coding) change (from 0111 to 1000) corresponding to a change of one in the phenotype. The distance between two chromosomes, at the genotype level, is measured by Hamming distance, which is simply the number of bits that differ. In the Gray code, the Hamming distance is always one for any two strings (chromosomes) that are adjacent (differing by one) at the phenotype level. That is not the case in the standard binary code where a single bit flip at the most significant position dramatically changes the value (phenotype). There are many Gray codes [8]; in this paper we use the binary reflected Gray code and refer to it as simply the Gray code. The algorithms for Binary-to-Gray and Gray-to-Binary conversions are given below (a binary string b₁,…,b_L and a Gray string g₁,…,g_L are considered):

procedure Binary-to-Gray
begin
  g₁ = b₁;
  for i = 2 to L do
    g_i = b_{i−1} XOR b_i;
end

procedure Gray-to-Binary
begin
  b₁ = bitvalue = g₁;
  for i = 2 to L do
  begin
    if g_i = 1 then bitvalue = COMPLEMENT(bitvalue);
    b_i = bitvalue;
  end
end
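
As an illustrative sketch (not taken from the paper), the two procedures can be transcribed into Python. Bit lists are stored most significant bit first, matching b₁,…,b_L; the helper names `int_to_bits` and `hamming` are assumptions introduced here for the example.

```python
def binary_to_gray(b):
    """Convert a bit list b (most significant bit first) to binary reflected Gray code."""
    g = [b[0]]                      # g_1 = b_1
    for i in range(1, len(b)):
        g.append(b[i - 1] ^ b[i])   # g_i = b_{i-1} XOR b_i
    return g


def gray_to_binary(g):
    """Convert a Gray-coded bit list g (most significant bit first) back to binary."""
    bitvalue = g[0]                 # b_1 = g_1
    b = [bitvalue]
    for i in range(1, len(g)):
        if g[i] == 1:
            bitvalue = 1 - bitvalue  # COMPLEMENT(bitvalue)
        b.append(bitvalue)
    return b


def int_to_bits(x, length):
    """Hypothetical helper: standard base-2 bits of x, most significant bit first."""
    return [(x >> (length - 1 - i)) & 1 for i in range(length)]


def hamming(u, v):
    """Hypothetical helper: number of positions at which two bit lists differ."""
    return sum(a != b for a, b in zip(u, v))


L = 4
# Hamming cliff in binary: 7 = 0111 and 8 = 1000 differ in every bit.
cliff = hamming(int_to_bits(7, L), int_to_bits(8, L))
# In Gray code, integers that differ by one are always one bit flip apart.
gray_steps = [
    hamming(binary_to_gray(int_to_bits(x, L)), binary_to_gray(int_to_bits(x + 1, L)))
    for x in range(2**L - 1)
]
```

With L = 4, `cliff` is 4, while every entry of `gray_steps` is 1.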

Use of Gray coding has been shown to produce improved GA performance in some cases [2], [4], [13], [19]. This has led some researchers (e.g., [7]) to abandon binary coding in favor of Gray. Some others (e.g., [12]), however, did not find Gray helpful. Most of the previous research into the binary-versus-Gray issue in GAs has been based on (non-exhaustive) empirical studies. In this paper, we undertake a Markov chain theory-based exhaustive numerical approach to investigate the relative performance of the two representations. Our model indicates that for all possible functions there is a very small difference between the two. In [17], [22], it was argued that Gray encoding would outperform binary encoding on the special class of functions for which the number of local minima in the binary Hamming space is greater than the corresponding number in the Gray Hamming space. Our results show that even though it is often the case, it is not universally true.

We also analyze the comparative performance of Gray and binary encoding for a simpler search algorithm of the same genre, namely stochastic hillclimbing. Our Markov model for stochastic hillclimbing shows essentially the same Gray-versus-binary behavior.

The remainder of this paper is organized as follows. In Section 2 we explain integer, binary and Gray neighborhoods and illustrate with a specific example how a function may possess different local optima in different neighborhoods. Section 3 describes the Markov model, and Section 4 explains the metric used for comparison: the expected first passage time to optimality. Results of GA performance with the two encodings are presented in Section 5. Section 6 argues that alongside the choice of representations, the right choice of genetic operators is also important for any practical application. In Section 7 a Markov model is developed for stochastic hillclimbing and relative performance results are presented for Gray and binary. Section 8 provides a summary and some concluding remarks.

Section snippets

Local optima

In any binary representation, the neighbors of a given string are those with Hamming distance one. In the integer representation, the neighbors are those integers immediately greater and smaller. Thus, an L-bit string has exactly L neighbors in any binary representation and two such neighbors in the integer representation. A local optimum in a discrete search space is a point whose fitness is better than those of all of its neighbors.
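
The neighborhood definitions above can be made concrete with a small sketch. It rests on assumptions not stated in this snippet: the problem is treated as minimization, a local optimum must be strictly better than every neighbor, and the integer neighborhood does not wrap around at the ends of the range. All function names are illustrative.

```python
def gray_encode(x):
    """Binary reflected Gray code of integer x, returned as an integer bit pattern."""
    return x ^ (x >> 1)


def gray_decode(g):
    """Inverse of gray_encode."""
    x = 0
    while g:
        x ^= g
        g >>= 1
    return x


def neighbors(x, L, rep):
    """Phenotype values reachable from x in one move under the given representation."""
    if rep == "integer":
        # Assumption: no wraparound, so the endpoints have a single neighbor.
        return [y for y in (x - 1, x + 1) if 0 <= y < 2**L]
    if rep == "binary":
        return [x ^ (1 << i) for i in range(L)]
    if rep == "gray":
        g = gray_encode(x)
        return [gray_decode(g ^ (1 << i)) for i in range(L)]
    raise ValueError(f"unknown representation: {rep}")


def local_optima(f, L, rep):
    """Points whose value is strictly lower than that of every neighbor (minimization)."""
    return [
        x for x in range(2**L)
        if all(f[x] < f[y] for y in neighbors(x, L, rep))
    ]


# Example function on 4 bits with its global minimum at 8.
L = 4
f = [abs(x - 8) for x in range(2**L)]
optima = {rep: local_optima(f, L, rep) for rep in ("integer", "binary", "gray")}
```

For this example function, the binary neighborhood has two local optima (7 and 8, the two sides of the Hamming cliff), whereas the integer and Gray neighborhoods each have only the global optimum at 8.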

It is possible for a function to have different numbers of

The Markov model

Markov chains have a long history of being used in the analysis of evolutionary algorithms (e.g., [3], [5], [6], [10], [11], [16], [18], [20], [21]). In the Markov model used here each population configuration represents a state. Let N and L represent, respectively, the population size and the string length. The number of occurrences of each of the 2^L strings in a given state is given by state(i) for i∈S where S={0,1,…,2^L−1}. Let s represent the state space of the GA. Then the size of the state

Expected first passage time to convergence

We fill the $\binom{N+2^{L}-1}{N} \times \binom{N+2^{L}-1}{N}$ transition probability matrix with probabilities obtained by using Eq. (1). We compare the performances of binary and Gray encodings using the following metric: the expected first passage time to a state that contains at least one copy (instance) of the global optimum. Clearly, the lower this value, the better.
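
The following sketch shows the size of the state space and one standard way to obtain expected first passage times from a transition matrix. Eq. (1) is not reproduced in this snippet, so the matrix `P` below is a toy example rather than the GA's transition matrix. The linear-system method, which solves m_i = 1 + Σ_j P_ij m_j over the non-target states, is a textbook technique and may differ from the authors' procedure. It assumes the target set is reachable from every state.

```python
from itertools import combinations_with_replacement
from math import comb


def num_states(N, L):
    """Number of populations of size N over 2^L strings (multisets)."""
    return comb(N + 2**L - 1, N)


def enumerate_states(N, L):
    """Each state is a tuple of counts: state[i] copies of string i in the population."""
    states = []
    for pop in combinations_with_replacement(range(2**L), N):
        counts = [0] * 2**L
        for s in pop:
            counts[s] += 1
        states.append(tuple(counts))
    return states


def expected_first_passage(P, targets):
    """Expected number of steps to first reach any state in `targets`, per start state."""
    n = len(P)
    transient = [i for i in range(n) if i not in targets]
    k = len(transient)
    # Augmented system (I - Q) m = 1, where Q restricts P to the non-target states.
    A = [
        [(1.0 if r == c else 0.0) - P[i][j] for c, j in enumerate(transient)] + [1.0]
        for r, i in enumerate(transient)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    m = {i: 0.0 for i in targets}
    for r, i in enumerate(transient):
        m[i] = A[r][k] / A[r][r]
    return m


# Toy 3-state chain (not the GA model): state 2 plays the role of "optimum found".
P = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
m = expected_first_passage(P, targets={2})
```

For N = 2 and L = 3, `num_states` gives 36, which matches the number of tuples returned by `enumerate_states`. In the GA setting, `targets` would be every state whose count for the global optimum is at least one.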

We denote by p_ij^(t) the probability of transition from state i to state j in t steps. Let f_ij^(t) stand for the probability that in a GA starting from state i the

Results

There are infinitely many functions defined over L bits, differing by function evaluations and their permutations. To have a finite case, we restrict function evaluations to the range 1 to 2^L and we permute these 2^L distinct values. Thus, for L=3, we have a total of (2³)!=40,320 different functions, corresponding to as many permutations. For example, L=3 gives 2³=8 function evaluations: 1,2,…,8, and for these 8 evaluations, one possible permutation is {F(0)=1,F(1)=2,…,F(7)=8}.
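
As a quick check of the count above, the function set for L = 3 can be enumerated directly. This is only a sketch; representing a function as a tuple `F` with `F[x]` the value at point `x` is an assumption made for the example.

```python
from itertools import permutations
from math import factorial

L = 3
values = range(1, 2**L + 1)            # the 2^L distinct function values 1..8

# Each permutation is one function: F[x] is the value assigned to point x.
functions = permutations(values)
first = next(functions)                # the example permutation F(0)=1, ..., F(7)=8
count = 1 + sum(1 for _ in functions)  # count the remaining permutations lazily
```

Here `count` equals `factorial(2**L)`, that is 40,320, and `first` is the permutation given in the text.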

Without loss of

Another look at the choice of representation

We considered two different bit-string representations, while assuming exactly the same mutation and crossover operators. However, in practice one may apply different operators, possibly representation- or domain-specific. Thus, in general, the usefulness of a given representation cannot be assessed without taking the operators into account. To prove this point, we show that equivalent operators can always be constructed such that the Gray-coded GA runs exactly the same as the binary-coded GA.
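
The paper's own construction is not shown in this snippet. One way to illustrate the idea, under that caveat, is to build a Gray-space mutation operator by decoding, applying the binary operator, and re-encoding. The names below are hypothetical.

```python
def gray_encode(x):
    """Binary reflected Gray code of integer x, as an integer bit pattern."""
    return x ^ (x >> 1)


def gray_decode(g):
    """Inverse of gray_encode."""
    x = 0
    while g:
        x ^= g
        g >>= 1
    return x


def flip_binary(x, i):
    """Standard bit-flip mutation at bit position i of a binary-coded chromosome."""
    return x ^ (1 << i)


def equivalent_gray_mutation(g, i):
    """Operator on a Gray-coded chromosome that mirrors flip_binary at position i."""
    return gray_encode(flip_binary(gray_decode(g), i))


L = 4
# The Gray-coded search visits exactly the phenotypes the binary-coded search would.
same_phenotype = all(
    gray_decode(equivalent_gray_mutation(gray_encode(x), i)) == flip_binary(x, i)
    for x in range(2**L) for i in range(L)
)
# Because Gray encoding is linear over XOR, the operator reduces to a fixed mask.
same_as_mask = all(
    equivalent_gray_mutation(g, i) == g ^ gray_encode(1 << i)
    for g in range(2**L) for i in range(L)
)
```

In this sketch the equivalent operator flips bit i of the Gray string together with bit i−1 (only bit 0 when i = 0), so it is no longer a single-bit mutation in Gray space. That is why a representation cannot be judged separately from its operators.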

Stochastic hillclimbing

The following version of stochastic hillclimbing [1] is used in this paper (the problem considered is one of minimization):

1. Select a point (the current point, x_c) at random and evaluate it. Let the fitness be f_c.
2. Select an adjacent point, x_a, at random and evaluate it. Let f_a be its fitness.
3. Accept the adjacent point as the current point (that is, x_c←x_a) with probability $\frac{1}{1+e^{(f_{a}-f_{c})/T}}$, where T is a parameter (the temperature) of the algorithm.
4. If a predetermined termination condition is not satisfied,
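
The steps above can be sketched in Python as follows. Step 4 is truncated in this snippet, so the termination rule used here (a fixed budget of `max_steps` iterations, each returning to step 2) is an assumption. The function names and the binary neighborhood in the example are likewise illustrative choices.

```python
import math
import random


def accept_probability(f_a, f_c, T):
    """Probability 1 / (1 + exp((f_a - f_c) / T)) of moving to the adjacent point."""
    z = (f_a - f_c) / T
    if z > 700:          # guard against overflow in exp; the probability is ~0 here
        return 0.0
    return 1.0 / (1.0 + math.exp(z))


def binary_neighbors(x, L):
    """Points at Hamming distance one from x in the standard binary code."""
    return [x ^ (1 << i) for i in range(L)]


def stochastic_hillclimb(f, L, neighbors, T=1.0, max_steps=1000, rng=None):
    """Minimize f over {0, ..., 2^L - 1}; returns the final current point and its value."""
    rng = rng or random.Random()
    x_c = rng.randrange(2**L)            # step 1: random current point
    f_c = f(x_c)
    for _ in range(max_steps):           # assumed termination: fixed step budget
        x_a = rng.choice(neighbors(x_c))  # step 2: random adjacent point
        f_a = f(x_a)
        if rng.random() < accept_probability(f_a, f_c, T):  # step 3
            x_c, f_c = x_a, f_a
    return x_c, f_c


L = 4
x_final, f_final = stochastic_hillclimb(
    f=lambda x: abs(x - 8),
    L=L,
    neighbors=lambda x: binary_neighbors(x, L),
    T=0.5,
    max_steps=200,
    rng=random.Random(0),
)
```

Because this is a minimization problem, a better neighbor (f_a < f_c) is accepted with probability above 1/2 and a worse one with probability below 1/2. A smaller T makes the rule closer to strict descent.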

Conclusions

This paper has shed some light on the Gray-versus-binary debate in GAs. A finite-population GA has been modeled using well-known techniques from Markov chain theory and the relative performance of Gray-coded and binary-coded GAs studied using the expected first passage time to optimality as the figure of merit. Over all possible functions there is only a small difference between the two representations, and fewer local optima do not necessarily make the task easier for Gray coding. The present

Acknowledgements

The first author was supported by a UMSL Research Award, 2002.

References (23)

D.H. Ackley, A Connectionist Machine for Genetic Hillclimbing (1997)
R.A. Caruana, J.D. Schaffer, Representation and hidden bias: Gray vs. binary coding for genetic algorithms, in:...
U.K. Chakraborty et al., Analysis of selection algorithms: A Markov chain approach, Evolutionary Computation (1996)
U.K. Chakraborty, D.G. Dastidar, Chromosomal encoding in genetic adaptive search, in: Proceedings of International...
T.E. Davis et al., A Markov chain framework for the simple genetic algorithm, Evolutionary Computation (1993)
K.A. De Jong, An analysis of the behavior of a class of genetic adaptive systems, Ph.D. Thesis, University of Michigan,...
L.J. Eshelman, The CHC adaptive search algorithm, Foundations of Genetic Algorithms––I (1991)
E.N. Gilbert, Gray Codes and Paths on the n-Cube, Bell System Technical Journal (1958)
D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning (1989)
D.E. Goldberg et al., Finite Markov chain analysis of genetic algorithms
A.E. Eiben et al., Global convergence of genetic algorithms: a Markov chain analysis

