Genetic algorithm parameter sets for line labelling

https://doi.org/10.1016/S0167-8655(97)00111-6

Abstract

This paper concerns the use of genetic algorithms for line labelling. We are interested in finding an optimal set of algorithm control parameters for this problem. We give results from using a simple genetic algorithm to solve several line labelling problems and discuss the effects of crossover type, population size, crossover rate, mutation rate and iteration limit on algorithm performance. We conclude that the algorithm is very sensitive to mutation rate, and that there is a threshold population size beyond which success rates are very high but that this threshold increases rapidly with the problem size. We recommend that a mutation rate of 0.02 be used in conjunction with a crossover rate of between 0.6 and 0.9. Iteration limit should initially be high, and should only be lowered when the other parameters have been tuned.

Introduction

Computer simulations of evolution were first suggested in the late 1950s and early 1960s (Fraser, 1957; Toombs, 1967). Holland's genetic algorithm (Holland, 1975) has become the most commonly used technique for simulated evolutionary optimisation. Genetic algorithms have been successfully used for a wide variety of problems (Ambati et al., 1990; Jefferson et al., 1991; Koza, 1992). Despite this, there has never been an adequate theoretical framework which allows researchers to set optimal control parameters for the algorithm, although some qualitative aspects of its behaviour are known (Rudolph, 1994; Grefenstette, 1986; Qi and Palmieri, 1994a, 1994b). Quantitative results have been obtained on a suite of test functions, and may be estimated from a variety of theoretical models. However, experience has shown that setting optimal control parameters is not easy (Grefenstette, 1986; DeJong, 1990), and that the performance of the algorithm is problem-specific; the “standard” suite of test functions (DeJong, 1975) is criticised in (Graves, 1995); and the available models make simplifying assumptions such as infinite population size (Qi and Palmieri, 1994a, 1994b). To make matters worse, the algorithm itself is a grossly oversimplified model of the real evolutionary process: there is no speciation, no complex gene interactions such as dominance and variable penetration, no species interaction, and the “environment” does not change.

In this paper we aim to investigate the use of genetic algorithms for line labelling. The interpretation of line drawings has been an important topic in machine vision since the seminal work of Huffman (1971), Clowes (1971) and Waltz (1975), and has obvious applications in document analysis and in the processing of architects' sketches, engineering drawings and so on. Viewed as a constraint satisfaction problem, line labelling is one in which there are many “related” ambiguous solutions. In this paper we cast line labelling in an optimisation framework and consider the effectiveness of a genetic algorithm for locating ambiguous solutions. In particular, we focus on the choice of genetic algorithm control parameters, their interactions, and their problem-specificity.

Genetic algorithms

A genetic algorithm manipulates a population of candidate solutions to a problem. The candidate solutions are typically binary strings, but any representation may be used. At every generation, some of the candidate solutions are paired and parts of each individual are mixed to form two new solutions; this is crossover: uniform crossover exchanges individual bits whereas multi-point crossover exchanges whole substrings. Additionally, every individual is subject to random change – mutation. The
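The operators described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes candidate solutions are lists of 0/1 bits, and the function names (`two_point_crossover`, `uniform_crossover`, `mutate`, `breed`) are hypothetical. The default rates are taken from the recommendations in the abstract (crossover rate between 0.6 and 0.9, mutation rate 0.02).

```python
import random


def two_point_crossover(a, b, rng):
    """Multi-point crossover with two cut points: swap the substring between them."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def uniform_crossover(a, b, rng):
    """Exchange each bit position independently with probability 0.5."""
    child1, child2 = [], []
    for x, y in zip(a, b):
        if rng.random() < 0.5:
            x, y = y, x
        child1.append(x)
        child2.append(y)
    return child1, child2


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in bits]


def breed(a, b, crossover, rng, crossover_rate=0.8, mutation_rate=0.02):
    """Produce two offspring: cross the parents with probability
    `crossover_rate`, then mutate every offspring."""
    if rng.random() < crossover_rate:
        a, b = crossover(a, b, rng)
    return mutate(a, mutation_rate, rng), mutate(b, mutation_rate, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    parent1, parent2 = [0] * 12, [1] * 12
    print(breed(parent1, parent2, two_point_crossover, rng))
    print(breed(parent1, parent2, uniform_crossover, rng))
```

Passing the crossover operator as an argument makes it easy to compare the two crossover types under otherwise identical settings.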

Line labelling

It was Waltz who first showed how a dictionary of consistent junction labellings could be used in an efficient search for consistent interpretations of polyhedral objects – this led to his seminal discrete relaxation algorithm. Such dictionaries are derived from the geometric constraints on the projection of 3D scenes onto 2D planes (Waltz, 1975; Sugihara, 1978). Hancock and Kittler have built on the work of Faugeras and Berthod (1981) and Hummel and Zucker (1983) by developing a Bayesian

Parameter choice

It is well known that choosing suitable parameter values for genetic algorithms is very difficult (Grefenstette, 1986; DeJong, 1990). Here we adopt an empirical approach in which several different parameter sets will be tried on a set of labelling problems. We would expect the genetic algorithm to proceed by combining locally consistent labellings until a solution is found. The choice of operator parameters (crossover and mutation rates) is essentially a tradeoff between conservatism and

Experimental design

We wish to examine the effects of varying population size, iteration limit, crossover rate, and mutation rate for a variety of problems and two crossover types. The space of possible parameter sets is large even if we restrict ourselves to a few values for each parameter: if 9 values of each parameter are to be tested, 9^4 = 6561 possible combinations are available. To reduce the dimensionality of this space we adopt a Graeco-Latin square design: if we encode the four parameters as 1–9, I–IX, A–I
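The saving offered by a Graeco-Latin square can be illustrated with a short sketch. This is one standard construction, not necessarily the square used by the authors: for an odd number of levels n, the squares (i + j) mod n and (i + 2j) mod n are orthogonal, so taking row, column and the two square entries as the four parameter levels gives n^2 = 81 runs in place of 9^4 = 6561. The function names are hypothetical, and levels are plain indices 0 to 8 rather than actual parameter values.

```python
from itertools import combinations

N_LEVELS = 9  # number of values tested for each of the four parameters


def graeco_latin_runs(n=N_LEVELS):
    """Return n*n runs; each run is a tuple of four level indices in range(n).

    Row i and column j give the first two parameters; the entries of two
    orthogonal Latin squares give the other two. The construction
    (i + j) mod n, (i + 2j) mod n is only valid for odd n.
    """
    if n % 2 == 0:
        raise ValueError("this simple construction needs an odd number of levels")
    return [(i, j, (i + j) % n, (i + 2 * j) % n)
            for i in range(n) for j in range(n)]


def is_pairwise_balanced(runs, n=N_LEVELS):
    """Check that every pair of parameters takes each of the n*n level
    combinations exactly once across the runs."""
    return all(
        len({(run[p], run[q]) for run in runs}) == n * n
        for p, q in combinations(range(4), 2)
    )


if __name__ == "__main__":
    runs = graeco_latin_runs()
    print(len(runs), "runs instead of", N_LEVELS ** 4)
    print("pairwise balanced:", is_pairwise_balanced(runs))
```

Because every pair of parameters is balanced, the main effect of each parameter can be estimated from the 81 runs, although the design cannot separate all higher-order interactions.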

Conclusion

We have demonstrated that for a range of line labelling problems, 2-point crossover often outperforms uniform crossover, and that mutation rates of the order of 0.02 with crossover rates in the range [0.6, 0.9] produce the best results, given a sufficiently large population size and a reasonable iteration limit. For small problems (about 10 lines), a population size of 30 is adequate, but larger problems (20 to 40 lines) require significantly larger populations. A rough calculation indicates

References (23)

  • Clowes, M.B., 1971. On seeing things. Artif. Intell.
  • Hancock, E.R., et al., 1990. Discrete relaxation. Pattern Recognition.
  • Sugihara, K., 1978. Picture language for skeletal polyhedra. Comput. Graphics Image Process.
  • Ambati, J., et al., 1990. Heuristic combinatorial optimisation by simulated Darwinian evolution: A polynomial time algorithm for the travelling salesman problem. Biological Cybernetics.
  • DeJong, K.A., 1975. An analysis of the behaviour of a class of genetic adaptive systems. Ph.D. Thesis. Dept. of...
  • DeJong, K.A., Spears, W.M., 1990. An analysis of the interacting rôles of population size and crossover in genetic...
  • Dobson, A.J., 1983. An Introduction to Statistical Modelling. Chapman and Hall,...
  • Faugeras, O.D., et al., 1981. Improving consistency and reducing ambiguity in stochastic labeling: An optimisation approach. IEEE Trans. Pattern Anal. Machine Intell.
  • Fraser, A.S., 1957. Simulation of genetic systems by automatic digital computers. Australian J. Biological Sci.
  • Graves, C., et al., 1995. Test driving three 1995 genetic algorithms: New test functions and geometric matching. J. Heuristics.
  • Green, M., Francis, B., Payne, C. (Eds.), 1993. The GLIM System Release 4 Manual. Oxford University Press,...