Elsevier

Information Sciences

Volume 156, Issues 3–4, 15 November 2003, Pages 253-269

An analysis of Gray versus binary encoding in genetic search

https://doi.org/10.1016/S0020-0255(03)00178-6

Abstract

This paper employs a Markov model to study the relative performance of binary and Gray coding in genetic algorithms. The results indicate that, over all possible functions, there is little difference between the two, and that Gray coding does not necessarily improve performance even for functions that have fewer local optima in the Gray representation than in binary.

Introduction

Any parameter optimization technique, including genetic algorithms (GAs) [9], requires some method of representing the parameters. Without loss of generality, we discuss here integer parameters. One approach used in GAs is to code an integer parameter directly in its base-2 representation, using explicit bits (the genotype level), and then apply a standard binary-to-integer mapping to decode the parameter value (the phenotype level). Alternatively, one may represent the parameter using the readily available integer data type (thus merging the genotype and phenotype levels).

Among possible bit-string representations, the Gray code is known to alleviate the “Hamming cliff” problem. An example of a Hamming cliff is the transition from 7 to 8 in four-bit binary coding, where all the bits change (from 0111 to 1000) although the phenotype changes by only one. The distance between two chromosomes at the genotype level is measured by the Hamming distance, which is simply the number of bits that differ. In the Gray code, the Hamming distance is always one for any two strings (chromosomes) that are adjacent (differing by one) at the phenotype level. That is not the case in the standard binary code, where a single bit flip at the most significant position dramatically changes the value (phenotype). There are many Gray codes [8]; in this paper we use the binary reflected Gray code and refer to it simply as the Gray code. The algorithms for Binary-to-Gray and Gray-to-Binary conversion are given below for a binary string b_1,…,b_L and a Gray string g_1,…,g_L:

  • procedure Binary-to-Gray

  • begin

    • g_1 = b_1;

    • for i = 2 to L do

      • g_i = b_{i−1} XOR b_i;

  • end

  • procedure Gray-to-Binary

  • begin

    • b_1 = bitvalue = g_1;

    • for i = 2 to L do

    • begin

      • if g_i = 1 then bitvalue = COMPLEMENT(bitvalue);

      • b_i = bitvalue;

    • end

  • end
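The two procedures above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: bit strings are assumed to be non-empty lists of 0/1 with the most significant bit first, and the helper names `int_to_bits` and `hamming` are introduced here only for the example.

```python
def binary_to_gray(bits):
    """Convert b_1..b_L (most significant bit first) to g_1..g_L."""
    gray = [bits[0]]                       # g_1 = b_1
    for i in range(1, len(bits)):
        gray.append(bits[i - 1] ^ bits[i])  # g_i = b_{i-1} XOR b_i
    return gray


def gray_to_binary(gray):
    """Convert g_1..g_L (most significant bit first) back to b_1..b_L."""
    bitvalue = gray[0]                     # b_1 = bitvalue = g_1
    bits = [bitvalue]
    for g in gray[1:]:
        if g == 1:
            bitvalue ^= 1                  # complement the running bit
        bits.append(bitvalue)
    return bits


def int_to_bits(value, length):
    """Hypothetical helper: standard binary coding, MSB first."""
    return [(value >> (length - 1 - k)) & 1 for k in range(length)]


def hamming(a, b):
    """Hypothetical helper: number of positions in which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


seven, eight = int_to_bits(7, 4), int_to_bits(8, 4)
print(hamming(seven, eight))                                   # 4
print(hamming(binary_to_gray(seven), binary_to_gray(eight)))   # 1
```

The two printed distances reproduce the 7-to-8 Hamming cliff from the text: four bits differ in binary coding, but only one in Gray coding.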


Use of Gray coding has been shown to produce improved GA performance in some cases [2], [4], [13], [19]. This has led some researchers (e.g., [7]) to abandon binary coding in favor of Gray. Others (e.g., [12]), however, did not find Gray helpful. Most of the previous research into the binary-versus-Gray issue in GAs has been based on (non-exhaustive) empirical studies. In this paper, we undertake an exhaustive numerical approach, based on Markov chain theory, to investigate the relative performance of the two representations. Our model indicates that over all possible functions there is only a very small difference between the two. In [17], [22], it was argued that Gray encoding would outperform binary encoding on the special class of functions for which the number of local minima in the binary Hamming space is greater than the corresponding number in the Gray Hamming space. Our results show that although this is often the case, it is not universally true.

We also analyze the comparative performance of Gray and binary encoding for a simpler search algorithm of the same genre, namely stochastic hillclimbing. Our Markov model for stochastic hillclimbing shows essentially the same Gray-versus-binary behavior.

The remainder of this paper is organized as follows. In Section 2 we explain integer, binary and Gray neighborhoods and illustrate with a specific example how a function may possess different local optima in different neighborhoods. Section 3 describes the Markov model, and Section 4 explains the metric used for comparison: the expected first passage time to optimality. Results of GA performance with the two encodings are presented in Section 5. Section 6 argues that alongside the choice of representations, the right choice of genetic operators is also important for any practical application. In Section 7 a Markov model is developed for stochastic hillclimbing and relative performance results are presented for Gray and binary. Section 8 provides a summary and some concluding remarks.

Section snippets

Local optima

In any binary representation, the neighbors of a given string are those with Hamming distance one. In the integer representation, the neighbors are those integers immediately greater and smaller. Thus, an L-bit string has exactly L neighbors in any binary representation and two such neighbors in the integer representation. A local optimum in a discrete search space is a point whose fitness is better than those of all of its neighbors.
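The neighborhood definitions above can be made concrete with a short Python sketch that counts local optima under each neighborhood. It assumes minimization, strings encoded as integers in 0..2^L−1, and a toy function f(x) = |x − 4| chosen here purely for illustration; the function and all helper names are assumptions, not taken from the paper.

```python
def to_gray(x):
    """Binary reflected Gray code of the integer x."""
    return x ^ (x >> 1)


def from_gray(g):
    """Inverse of to_gray."""
    x = 0
    while g:
        x ^= g
        g >>= 1
    return x


def binary_neighbors(x, L):
    """Points whose standard binary code is at Hamming distance one from x."""
    return [x ^ (1 << k) for k in range(L)]


def gray_neighbors(x, L):
    """Points whose Gray code is at Hamming distance one from that of x."""
    g = to_gray(x)
    return [from_gray(g ^ (1 << k)) for k in range(L)]


def integer_neighbors(x, L):
    """Immediately smaller and greater integers, where they exist."""
    return [y for y in (x - 1, x + 1) if 0 <= y < 2 ** L]


def count_local_minima(f, L, neighbors):
    """Count points strictly better (lower) than all of their neighbors."""
    return sum(
        all(f[x] < f[y] for y in neighbors(x, L)) for x in range(2 ** L)
    )


L = 3
f = [abs(x - 4) for x in range(2 ** L)]   # toy function, illustration only

print(count_local_minima(f, L, integer_neighbors))  # 1
print(count_local_minima(f, L, binary_neighbors))   # 2
print(count_local_minima(f, L, gray_neighbors))     # 1
```

For this toy function the point 3 (binary 011) is a local minimum only in the binary neighborhood, because its better neighbor 4 (binary 100) is three bit flips away.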

It is possible for a function to have different numbers of

The Markov model

Markov chains have a long history of being used in the analysis of evolutionary algorithms (e.g., [3], [5], [6], [10], [11], [16], [18], [20], [21]). In the Markov model used here, each population configuration represents a state. Let N and L represent, respectively, the population size and the string length. The number of occurrences of each of the $2^L$ strings in a given state is given by state(i) for $i \in S$, where $S=\{0,1,\ldots,2^L-1\}$. Let s represent the state space of the GA. Then the size of the state

Expected first passage time to convergence

We fill the $\binom{N+2^L-1}{N} \times \binom{N+2^L-1}{N}$ transition probability matrix with probabilities obtained by using Eq. (1). We compare the performances of binary and Gray encodings using the following metric: the expected first passage time to a state that contains at least one copy (instance) of the global optimum. Clearly, the lower this value, the better.
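As a rough illustration of the quantities involved, the sketch below computes the number of states, assuming it is the number of size-N multisets over 2^L strings, and the expected first passage time into a set of target states for a small hand-made chain. The three-state matrix `P` is a hypothetical toy, not a GA transition matrix from Eq. (1), and the linear-system formulation m_i = 1 + Σ_{j∉targets} P[i][j]·m_j is the standard one for Markov chains rather than a procedure quoted from the paper.

```python
from fractions import Fraction
from math import comb


def num_states(N, L):
    """Number of populations of size N over 2^L strings (multisets)."""
    return comb(N + 2 ** L - 1, N)


def expected_first_passage(P, targets):
    """Expected number of steps to first reach any state in `targets`.

    Solves (I - Q) m = 1 by Gauss-Jordan elimination, where Q is P restricted
    to the non-target states. Assumes the targets are reachable from every
    non-target state, so the system is non-singular.
    """
    others = [i for i in range(len(P)) if i not in targets]
    n = len(others)
    # Augmented matrix [I - Q | 1] in exact rational arithmetic.
    A = [
        [Fraction(int(r == c)) - Fraction(P[others[r]][others[c]])
         for c in range(n)] + [Fraction(1)]
        for r in range(n)
    ]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return {others[r]: A[r][n] for r in range(n)}


print(num_states(2, 3))   # 36 states for N=2, L=3

half, quarter = Fraction(1, 2), Fraction(1, 4)
P = [
    [half, half, 0],         # toy chain; state 2 plays the role of
    [quarter, quarter, half],  # "population contains the global optimum"
    [0, 0, 1],
]
print(expected_first_passage(P, targets={2}))
# {0: Fraction(5, 1), 1: Fraction(3, 1)}
```

Exact fractions are used only to keep the toy result free of rounding; for matrices of realistic size a floating-point linear solver would be the more practical choice.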

We denote by $p_{ij}(t)$ the probability of transition from state i to state j in t steps. Let $f_{ij}(t)$ stand for the probability that in a GA starting from state i the

Results

There are infinitely many functions defined over L bits, differing by function evaluations and their permutations. To have a finite case, we restrict function evaluations to the range 1 to $2^L$ and we permute these $2^L$ distinct values. Thus, for L=3, we have a total of $(2^3)! = 40{,}320$ different functions, corresponding to as many permutations. For example, L=3 gives $2^3=8$ function evaluations: 1,2,…,8, and for these 8 evaluations, one possible permutation is {F(0)=1,F(1)=2,…,F(7)=8}.
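The counting argument above can be checked with a few lines of Python. This is only a sketch of the enumeration, with each function represented, by assumption, as a tuple F in which F[x] is the value assigned to string x; the names `num_functions` and `all_functions` are introduced here for illustration.

```python
from itertools import permutations
from math import factorial


def num_functions(L):
    """Number of ways to assign the distinct values 1..2^L to 2^L strings."""
    return factorial(2 ** L)


def all_functions(L):
    """Lazily yield every function as a tuple F, where F[x] is the value at x."""
    return permutations(range(1, 2 ** L + 1))


print(num_functions(3))        # 40320
print(next(all_functions(3)))  # (1, 2, 3, 4, 5, 6, 7, 8)
```

The first tuple produced is the permutation quoted in the text, F(0)=1, F(1)=2, …, F(7)=8.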

Without loss of

Another look at the choice of representation

We considered two different bit-string representations, while assuming exactly the same mutation and crossover operators. However, in practice one may apply different operators, possibly representation- or domain-specific. Thus, in general, the usefulness of a given representation cannot be assessed without taking the operators into account. To prove this point, we show that equivalent operators can always be constructed such that the Gray-coded GA runs exactly the same as the binary-coded GA.
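One way to see why such equivalent operators exist is to conjugate a binary-coded operator with the Gray decoding and encoding maps: decode the Gray genotype, apply the binary operator, and re-encode. The sketch below does this for a single bit-flip mutation and a one-point crossover. It is an illustrative construction under that assumption, not necessarily the one used in the paper, and all function names are hypothetical.

```python
def to_gray(x):
    """Binary reflected Gray code of the integer x."""
    return x ^ (x >> 1)


def from_gray(g):
    """Inverse of to_gray."""
    x = 0
    while g:
        x ^= g
        g >>= 1
    return x


def binary_flip(b, k):
    """Mutation on a binary-coded genotype: flip bit k (0 = least significant)."""
    return b ^ (1 << k)


def one_point_crossover(a, b, cut):
    """Swap the `cut` least significant bits of two binary-coded genotypes."""
    mask = (1 << cut) - 1
    return (a & ~mask) | (b & mask), (b & ~mask) | (a & mask)


def gray_equivalent_flip(g, k):
    """Operator on a Gray genotype that mirrors binary_flip on the phenotype."""
    return to_gray(binary_flip(from_gray(g), k))


def gray_equivalent_crossover(ga, gb, cut):
    """Operator on Gray genotypes that mirrors one_point_crossover."""
    c1, c2 = one_point_crossover(from_gray(ga), from_gray(gb), cut)
    return to_gray(c1), to_gray(c2)


x, k = 5, 2
print(binary_flip(x, k))                                 # 1
print(from_gray(gray_equivalent_flip(to_gray(x), k)))    # 1
```

Because both coded GAs then visit the same phenotypes step for step, any performance difference between them comes from the operators, not from the representation alone.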

Stochastic hillclimbing

The following version of stochastic hillclimbing [1] is used in this paper (the problem considered is one of minimization):

  • 1. Select a point (the current point, $x_c$) at random and evaluate it. Let the fitness be $f_c$.

  • 2. Select an adjacent point, $x_a$, at random and evaluate it. Let $f_a$ be its fitness.

  • 3. Accept the adjacent point as the current point (that is, $x_c \leftarrow x_a$) with probability $\frac{1}{1+e^{(f_a-f_c)/T}}$, where T is a parameter (the temperature) of the algorithm.

  • 4. If a predetermined termination condition is not satisfied,
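The steps listed above can be sketched in Python as follows. Step 4 is cut off in the text, so the sketch assumes the termination condition is a fixed number of iterations and that the loop returns to step 2. The binary neighborhood, the toy objective, and all names are assumptions made for illustration.

```python
import math
import random


def binary_neighbors(x, L):
    """Points at Hamming distance one from x in standard binary coding."""
    return [x ^ (1 << k) for k in range(L)]


def acceptance_probability(fa, fc, T):
    """Probability 1 / (1 + e^((fa - fc) / T)) of moving to the adjacent point."""
    return 1.0 / (1.0 + math.exp((fa - fc) / T))


def stochastic_hillclimb(f, L, neighbors, T, max_steps, seed=None):
    """Minimize f over 0..2^L-1; returns the final current point and fitness."""
    rng = random.Random(seed)
    xc = rng.randrange(2 ** L)               # step 1: random starting point
    fc = f(xc)
    for _ in range(max_steps):               # step 4 (assumed): fixed budget
        xa = rng.choice(neighbors(xc, L))    # step 2: random adjacent point
        fa = f(xa)
        if rng.random() < acceptance_probability(fa, fc, T):   # step 3
            xc, fc = xa, fa
    return xc, fc


print(acceptance_probability(1, 1, 1.0))   # 0.5
x, fx = stochastic_hillclimb(lambda v: abs(v - 4), 3, binary_neighbors,
                             T=0.5, max_steps=200, seed=0)
```

An equally fit neighbor is accepted with probability 0.5, a better one (lower fitness, since this is minimization) with probability above 0.5, and a worse one with probability below 0.5. Very large values of (fa − fc)/T would overflow `math.exp`, which this sketch does not guard against.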

Conclusions

This paper has shed some light on the Gray-versus-binary debate in GAs. A finite-population GA has been modeled using well-known techniques from Markov chain theory and the relative performance of Gray-coded and binary-coded GAs studied using the expected first passage time to optimality as the figure of merit. Over all possible functions there is only a small difference between the two representations, and fewer local optima do not necessarily make the task easier for Gray coding. The present

Acknowledgements

The first author was supported by a UMSL Research Award, 2002.

References (23)

  • D.H. Ackley, A Connectionist Machine for Genetic Hillclimbing (1997)
  • R.A. Caruana, J.D. Schaffer, Representation and hidden bias: Gray vs. binary coding for genetic algorithms, in:...
  • U.K. Chakraborty et al., Analysis of selection algorithms: A Markov chain approach, Evolutionary Computation (1996)
  • U.K. Chakraborty, D.G. Dastidar, Chromosomal encoding in genetic adaptive search, in: Proceedings of International...
  • T.E. Davis et al., A Markov chain framework for the simple genetic algorithm, Evolutionary Computation (1993)
  • K.A. De Jong, An analysis of the behavior of a class of genetic adaptive systems, Ph.D. Thesis, University of Michigan,...
  • L.J. Eshelman, The CHC adaptive search algorithm, Foundations of Genetic Algorithms I (1991)
  • E.N. Gilbert, Gray Codes and Paths on the n-Cube, Bell System Technical Journal (1958)
  • D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning (1989)
  • D.E. Goldberg et al., Finite Markov chain analysis of genetic algorithms
  • A.E. Eiben et al., Global convergence of genetic algorithms: a Markov chain analysis
