A hybrid genetic algorithm for the design of water distribution networks

https://doi.org/10.1016/j.engappai.2004.10.001

Abstract

Genetic algorithms are currently one of the state-of-the-art techniques for the optimisation of engineering systems including water network design and rehabilitation. They are capable of finding near optimal cost solutions to these problems given certain cost and hydraulic parameters. However, many forms of genetic algorithms rely on random starting points that are often poor solutions and the problem of how to efficiently provide good initial estimates of solution sets automatically is still an ongoing research topic. This paper proposes a novel method, known as CANDA-GA, which uses a heuristic-based, local representative cellular automata approach to provide a good initial population for genetic algorithm runs. CANDA-GA is applied to three networks, one taken from the literature and two taken from industry. The results show that the proposed method consistently outperforms the conventional non-heuristic-based GA approach in terms of producing more economically designed water distribution networks.

Introduction

The problem of designing a water distribution network (WDN) to optimally meet performance and cost criteria is known to be NP-hard and a large variety of computational algorithms have been devised for this task. In recent years, the genetic algorithm (GA) has proved to be one of the most popular algorithms in a variety of domains that include engineering optimisation problems and the design of WDNs. The application of GAs to WDN optimisation can be traced back to the mid-1990s (Dandy et al., 1996; Savic and Walters, 1997) and whilst there are now many more variants of the algorithm than there were then, it remains a vital tool for WDN optimisation. Therefore, it can be said with some confidence that GAs represent a state-of-the-art approach to WDN optimisation. Conventional GAs usually begin the optimisation process by randomly generating a solution set, evaluating each solution's performance on the problem and then selecting the best for entry into the next generation. Selecting random solutions is an intuitive way of generating unbiased solutions when the algorithm has no prior information on the search space.

More recently, a number of researchers (Neppalli et al., 1996; Harik and Goldberg, 2000; Liaw, 2000; Hopper and Turton, 2001; Yang et al., 2002) have found that if prior knowledge exists or can be generated at a low computational cost, seeding GAs with good initial estimates may generate better solutions with faster convergence. The seeding of a GA with good solutions is not a new idea: Grefenstette (1987) discussed methods and demonstrated the value of incorporating problem-specific knowledge into the GA mechanism, including seeding the population. Louis (1997) found that seeding the GA population with known good solutions from case-based reasoning was a feasible approach. They implemented the scheme for the open-shop re-scheduling problem and found that the performance of the seeded GA was consistently better than that of a randomly seeded GA. Oman and Cunningham (2001) experimented with seeding for the travelling salesman problem (TSP) and the job-shop scheduling problem (JSSP), two benchmark tasks for evolutionary algorithms. They seeded the initial population of the GA with known good solutions and found that the results were significantly improved on the TSP but not on the JSSP. Interestingly, they used a varying percentage of seeding, from 25% to 75%, and the result for each was remarkably similar, although the authors do point out that a 100% seed was not very successful on either problem. The authors also found that the best-quoted results on the TSP were discovered when the seeded solutions incorporated some heuristic element, as well as information from the overall problem definition. Therefore, it follows that a heuristic-based approach to seeding a GA should yield performance enhancements on difficult problems and this forms the basis of the proposed approach.

In recent years, a new kind of algorithm called cellular automata (CAs) has emerged and been widely applied to distributed computing and spatially distributed problems, such as the simulation of physical systems, traffic flows (Emmerich and Rank, 1995, 1997) and a variety of other applications (Bandani et al., 2001; Toffoli and Margolus, 1987). In a recent publication (Keedwell and Khu, 2004), we described the use of a CA-inspired approach (Cellular Automaton for Network Design Algorithm, CANDA) for the design of water distribution systems, the results of which showed that CANDA can provide good solutions requiring only a very small number of network simulations. However, the CANDA approach alone is not necessarily amenable to being used in the traditional optimisation domain as it does not use typical performance metrics throughout the algorithm run. Without these metrics, it is difficult to direct the search procedure beyond subtly manipulating the rule set of the algorithm. In addition to this, a problem that WDN designers typically face is the limited time that can be spent on design. A pragmatic design usually involves running the network simulator for a limited number of runs, especially for large networks involving thousands of pipes as decision variables. GAs, whilst efficient, usually require many thousands of network simulations in an optimisation. Hence, the difficulty of the WDN design problem is to balance the number of network simulations against the quality of the design solutions obtained. In light of these facts, we propose a combined CA and GA approach (herein known as CANDA-GA) to address this problem.

In order for the readers to understand the problem faced by WDN designers, a brief description of WDN design is outlined in the next section, followed by short descriptions of GA and CA. The concept of seeding is discussed followed by the proposed CANDA-GA algorithm. The results of applying CANDA-GA on the two-loop network (Alperovits and Shamir, 1977), and two real-world design problems show the advantages of this approach over solely using a GA. This is followed by a general discussion of the results and conclusions.

A water distribution system typically consists of an array of pipes, pumps, valves and other appurtenances. The flows through a water distribution system are governed by complex, non-linear, non-convex and discontinuous hydraulic equations. Water distribution systems can be modelled and simulated through the combined use of the conservation of flow and energy equations. The conservation of energy equations applied to each independent loop of the water distribution system thus constitute a system of non-linear equations.

Assuming water is incompressible, the general expression for the conservation of flow at each node in the network is (Mays and Tung, 1992)

$$Q_{in} - Q_{out} = Q_{external},$$

where $Q_{in}$ and $Q_{out}$ are the pipe flows into and out of the node, respectively, and $Q_{external}$ is the external demand or supply at the node.

The conservation of energy equation is required for each loop in the network as given by

$$h_L - H_{pump} = 0.$$

Head loss can be related to flow using the expression

$$h_L = K Q^n,$$

where $h_L$ is head loss, $K$ the head loss coefficient, $Q$ the flow, and $n$ the exponent.
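The two conservation relations and the head loss expression can be illustrated with a minimal Python sketch. The function names are hypothetical, and two assumptions go beyond the text: the flows and head losses are treated as totals over several pipes, and the default exponent of 1.852 is the value commonly used with the Hazen–Williams formula. A hydraulic solver drives these residuals to zero; the sketch only evaluates them.

```python
def head_loss(K: float, Q: float, n: float = 1.852) -> float:
    """Head loss h_L = K * Q**n, signed so that it follows the flow direction.

    n = 1.852 is an assumed default (the usual Hazen-Williams exponent).
    """
    sign = 1.0 if Q >= 0 else -1.0
    return sign * K * abs(Q) ** n


def node_residual(q_in, q_out, q_external: float) -> float:
    """Conservation of flow at a node: Q_in - Q_out - Q_external.

    q_in and q_out are iterables of pipe flows (assumption: the totals
    are the sums over the connected pipes). Zero means mass is conserved.
    """
    return sum(q_in) - sum(q_out) - q_external


def loop_residual(losses, pump_heads=()) -> float:
    """Conservation of energy around a loop: h_L - H_pump.

    losses are the signed head losses of the pipes in the loop.
    Zero means the loop is in hydraulic balance.
    """
    return sum(losses) - sum(pump_heads)


# Hypothetical node: two inflows, one outflow and an external demand.
print(node_residual([3.0, 2.0], [4.0], 1.0))
# Hypothetical loop: two pipes carrying equal and opposite flows.
print(loop_residual([head_loss(2.0, 3.0, 2.0), head_loss(2.0, -3.0, 2.0)]))
```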

The Darcy–Weisbach equation, the Hazen–Williams and Manning empirical equations may be used for computing the friction head losses in pressure pipes which normally represent the most significant element in the determination of distribution of flow in pipe networks. Computational methods such as those of Hardy Cross and Newton–Raphson, and linear theory methods may be used for analysing flow in pipe networks.

As mentioned previously, a variety of computational algorithms exist for the optimum design and rehabilitation of WDNs. In an optimum design problem, the objective is to design a completely new network given a set of costs, demands and other requirements of the network. In the case of a rehabilitation problem, the objective is to propose alternatives or alterations to the existing network in order to meet new criteria that have arisen through its lifetime. In both cases, the algorithm decision variables are the sizes of pipes at a variety of locations in the network. The field of optimisation has primarily focussed on using new algorithms to improve a system by reducing the monetary outlay required to achieve the requisite properties of the network. The pipe layout, the node connectivity, demands of the system and minimum pressure head requirements are typically assumed to be known. Readers are referred to Rossman (1999) or Walski et al. (2001) for more information on WDN modeling, and Alperovits and Shamir (1977), Quindry et al. (1981) or Goulter (1992) on optimal design of WDNs.

The problems facing the optimal design of WDNs are huge; they belong to a class of problems known as NP-hard problems, where the problem is intractable and it is not practical to perform a full enumeration using any rigorous algorithm. For this reason, there are many examples of algorithms passing from artificial intelligence to the optimisation domain. For instance, a network with 12 pipes and 8 potential pipe diameters has $8^{12}$ possible pipe diameter combinations, which constitute the search space of the problem. Even for this very modest network, an exhaustive search algorithm would need a considerable amount of time to navigate the entire search space of 68,719,476,736 potential solutions. It is clear that more intelligent methods are required to solve these problems, and these recently have taken the form of GAs.
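The size of the search space quoted above can be checked with a few lines of Python. The function names and the simulation rate of 1000 network simulations per second are illustrative assumptions, not figures from the paper.

```python
def search_space_size(n_pipes: int, n_diameters: int) -> int:
    """Number of designs when each pipe independently takes one diameter."""
    return n_diameters ** n_pipes


def exhaustive_days(n_solutions: int, sims_per_second: float) -> float:
    """Days needed to simulate every design at an assumed simulation rate."""
    seconds_per_day = 86_400
    return n_solutions / sims_per_second / seconds_per_day


size = search_space_size(n_pipes=12, n_diameters=8)
print(f"{size:,} candidate designs")  # 68,719,476,736 candidate designs
print(f"{exhaustive_days(size, 1000.0):.0f} days at 1000 simulations per second")
```

Because the pipe count sits in the exponent, each additional pipe multiplies the number of candidate designs by the number of available diameters.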

The history of the GA can be traced back to the mid-1970s and the work of Holland (1975), where GAs were presented as models of evolution. Their popularity has steadily grown since then and there are now a large number of applications even just within engineering. There have been numerous advances in these algorithms which have benefited the field of optimisation, progressing from the early single-objective algorithm to multiple-objective algorithms (Fonseca and Fleming, 1995) that allow network designers a variety of options when designing a WDN. The GA approach uses a population of individual solutions that iterate from one generation to the next as the search progresses. The performance of each of the solutions is evaluated by an “objective function” which relates the solution variables to the problem at hand. Typically, in WDN optimisation problems, the solution decision variables make changes to the network, which is then simulated by a network simulator (such as EPANET; Rossman, 1999). To proceed from one generation to the next, the algorithm uses crossover and mutation operators to generate new solutions and a selection operator to choose which individuals survive into the next generation. By using these mechanisms, the GA is able to quickly traverse the search space whilst avoiding local minima and proceeding to a near-optimal answer. Despite their success, GAs are generally criticised in two main areas of their operation:

  1. They find different answers to problems depending on their starting position in the search space. This is a problem with all stochastic algorithms as they use random starting points and variables during the optimisation, and therefore two optimisation runs with different random seeds are never the same.

  2. They are population based and therefore require a large number of objective function evaluations to solve a problem. A typical GA run will use a population size of 100 and run for 1000 generations. Depending on the algorithm used, this can require up to 100,000 or more objective function evaluations (network simulations). Computational effort on this scale may be impractical in many cases.

The proposed CANDA-GA approach goes some way to addressing these concerns by reducing the need for large numbers of generations and also reducing the variability of the GA with different random seeds.
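The generational loop described above, and the way its cost grows as population size times generations, can be sketched as follows. This is a generic GA under stated assumptions, not the authors' implementation: the function `run_ga`, its default rates, the elitism step and the toy objective (the sum of diameter indices, standing in for a network simulation) are all hypothetical, and the objective is assumed to be non-negative and minimised.

```python
import random


def run_ga(objective, n_genes, n_options, pop_size=20, generations=30,
           crossover_rate=0.9, mutation_rate=0.05, seed=1):
    """Minimise `objective` over genomes of n_genes integers in [0, n_options)."""
    rng = random.Random(seed)  # the random seed fixes the whole run
    population = [[rng.randrange(n_options) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best, best_cost, evaluations = None, float("inf"), 0
    for _ in range(generations):
        costs = [objective(ind) for ind in population]
        evaluations += pop_size  # one evaluation per individual
        for ind, cost in zip(population, costs):
            if cost < best_cost:
                best, best_cost = list(ind), cost
        # Roulette wheel selection: lower cost gives a larger slice.
        weights = [1.0 / (1.0 + c) for c in costs]
        next_gen = [list(best)]  # elitism (an assumption of this sketch)
        while len(next_gen) < pop_size:
            mum, dad = rng.choices(population, weights=weights, k=2)
            child = list(mum)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_genes)  # one-point crossover
                child = mum[:cut] + dad[cut:]
            # Mutation: each gene is resampled with a small probability.
            child = [rng.randrange(n_options) if rng.random() < mutation_rate
                     else gene for gene in child]
            next_gen.append(child)
        population = next_gen
    return best, best_cost, evaluations


# Toy problem: 12 pipes, 8 diameter options, cost grows with diameter index.
best, best_cost, evaluations = run_ga(sum, n_genes=12, n_options=8)
print(best_cost, evaluations)
```

With a population of 100 and 1000 generations, the same loop would perform 100,000 objective evaluations, which is the figure quoted in the second criticism. Changing `seed` changes the trajectory of the run, which is the first.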

CANDA-GA makes use of a computational method known as CA (Von Neumann, 1966). A CA consists of an interconnected set of nodes (often in regular formation) that use a number of rules to update the state of every node according to the states of neighbouring nodes. These rules and states are normally dependent on the problem being solved, as is the size of the neighbourhood on which they operate. The neighbourhood defines how many surrounding nodes are taken into account before updating the state of the node in question. An important feature of the CA is that updates for every node are performed in parallel; therefore, in one iteration every node updates its state depending on those surrounding it in the previous iteration.
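The parallel update can be made concrete with a short Python sketch. The function `ca_step`, the four-node line graph and the "spread" rule are hypothetical illustrations and are not the CANDA rule set. The essential point is that every new state is computed from the states of the previous iteration.

```python
def ca_step(states, neighbours, rule):
    """One synchronous CA update.

    states:     dict mapping node -> current state
    neighbours: dict mapping node -> list of neighbouring nodes
    rule:       function (own_state, neighbour_states) -> new state

    A new dict is built, so every node reads only OLD states.
    """
    return {node: rule(states[node], [states[m] for m in neighbours[node]])
            for node in states}


def spread(own, neighbour_states):
    """Illustrative rule: a node switches on if it or any neighbour is on."""
    return max([own] + neighbour_states)


# Four nodes in a line: 0 - 1 - 2 - 3, with only node 0 switched on.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
states = {0: 1, 1: 0, 2: 0, 3: 0}

states = ca_step(states, neighbours, spread)
print(states)  # {0: 1, 1: 1, 2: 0, 3: 0}
```

Had the nodes been updated one after another in place, the signal would have reached node 3 within a single sweep. With the synchronous update it advances exactly one neighbourhood per iteration.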

Traditional optimisation algorithms are driven by global performance; for instance, in the GA, the objective function determines the optimality of a solution compared to the others in the population. This is often determined in WDN optimisation as a combination of hydraulic and cost parameters; an example fitness function could be

$$\mathrm{fitness} = a(\mathrm{TotalHeadDeficit}) + b(\mathrm{Cost}),$$

where $a$ and $b$ are constant multipliers, TotalHeadDeficit yields a measure of the violation of the hydraulic criteria set for the problem and Cost is the monetary cost associated with the current solution. However, CAs do not have an objective function such as this and are concerned only with the execution of rules at a local level.
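A minimal Python sketch of such a fitness function is given below. The text does not define how TotalHeadDeficit is computed, so the sketch assumes it is the sum of the pressure head shortfalls over all nodes; the function names, multipliers and numerical values are likewise hypothetical.

```python
def total_head_deficit(heads, min_heads):
    """Sum of shortfalls below the minimum head (assumed definition).

    Nodes that meet or exceed their requirement contribute nothing.
    """
    return sum(max(0.0, required - actual)
               for actual, required in zip(heads, min_heads))


def fitness(heads, min_heads, cost, a=1.0, b=1.0):
    """fitness = a * TotalHeadDeficit + b * Cost (lower is better)."""
    return a * total_head_deficit(heads, min_heads) + b * cost


# Two nodes requiring 30 m of head; the second is 5 m short.
value = fitness(heads=[30.0, 25.0], min_heads=[30.0, 30.0],
                cost=1000.0, a=100.0, b=1.0)
print(value)  # 1500.0
```

The ratio of `a` to `b` sets how much extra cost the search will accept to remove one unit of head deficit, so in practice it has to be chosen for each problem.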

From an optimisation point of view, CAs possess three additional key properties in their execution:

  1. Parallelism: Updates of each cell state are completed in parallel, and each of the changes to pipe diameters occurs at once. This factor is vital for optimising large WDNs as will be seen in later sections.

  2. Localist representation: Determines that when a node is updated, its new state is based solely on the old state of the node and of those of its nearest neighbours. Localism is the mechanism by which parallelism can benefit performance in combinatorial problems such as this.

  3. Homogeneity: Determines that each node is updated according to the same rules. This is important for treating each area of the WDN with the same degree of importance as any other. This homogeneity is also present in other algorithms such as GAs due to their lack of problem-specific knowledge.

In a previous publication (Keedwell and Khu, 2003), the CANDA approach was tested on the three WDNs also used in this paper, the two-loop network (Alperovits and Shamir, 1977) and two large real-world networks (Networks A and B) from the United Kingdom. The task for CANDA was to create an optimal design given a least-cost design as the starting point (where every pipe is assigned the smallest diameter). The search spaces involved are large, and each space consists of $8^{12}$, 63520 and 127720 possible pipe diameter combinations, respectively, for the two-loop, Network A and Network B. In experimentation, the CANDA approach did not produce a better ultimate solution when compared with the GA on smaller problems. However, very good approximate solutions were obtained for an extremely small number of network simulations (<1% of the GA network simulations). Therefore CANDA was shown to be a quick approach to estimate a final solution set. To some degree, every optimisation algorithm, with the exception of a full enumeration, trades some optimality for a saving in computational time. The CANDA approach is no different except that it takes this one step further than the GA. The results are not optimal, but the computation required is significantly less than other methods.

As described previously, the initial population influences the path that the GA will take to the near-optimal final solution, and therefore it is an important part of the algorithm. If a search algorithm can discover a set of initial solutions that are better than randomly generated ones, then the GA performance can be expected to increase. The difficulty with this approach is that the technique used to search the space to discover solutions in the first instance must be more efficient than the GA for the seeding to be effective. Therefore, the seeding of a GA has to be completed with a minimal number of model evaluations whilst delivering the greatest benefit to the algorithm. To accomplish this, a hybrid approach is proposed that utilises the initial power of CANDA without sacrificing the ability of the GA to find solutions that match or exceed the exact requirements of the optimisation. The major advantage that CANDA has over other search techniques is that the number of model evaluations incurred is very small and therefore it is ideal for the seeding of an algorithm such as the GA. In this paper, we consider only the seeding of a GA due to its population basis and because GAs currently represent a state-of-the-art method for designing WDNs. There is no practical reason, however, why CANDA could not be used to seed other search methods for this problem.
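The idea of seeding can be sketched in Python as follows. This is an illustration under stated assumptions rather than the CANDA-GA procedure itself: the function `seeded_population` is hypothetical, the seed designs are invented placeholders for solutions that a cheap heuristic might return, and the remaining slots are simply filled with random designs.

```python
import random


def seeded_population(seeds, pop_size, n_genes, n_options, rng):
    """Build an initial GA population from heuristic seeds plus random fill.

    seeds: list of genomes (lists of diameter indices) found by a cheap
           heuristic; extra seeds beyond pop_size are ignored.
    """
    population = [list(seed) for seed in seeds[:pop_size]]
    while len(population) < pop_size:
        population.append([rng.randrange(n_options) for _ in range(n_genes)])
    return population


rng = random.Random(12345)
# Hypothetical seed designs for a 4-pipe network with 8 diameter options.
seeds = [[2, 3, 1, 0], [2, 2, 1, 1]]
population = seeded_population(seeds, pop_size=10, n_genes=4,
                               n_options=8, rng=rng)
print(len(population), population[:2])
```

Keeping some random individuals alongside the seeds is one way to retain diversity. The text above notes that Oman and Cunningham (2001) found seeding percentages between 25% and 75% gave similar results, whereas a 100% seed was not very successful.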

Algorithm overview

A detailed explanation of the operation of the CANDA approach can be seen in Keedwell and Khu (2003). However, a short explanation of the heuristic CANDA method will be given here to aid understanding of the CANDA-GA approach.

The CANDA algorithm works by considering a WDN as a special form of CA; it cannot be definitively considered a CA because the nodes are connected by variable length and diameter links, and the arrangement of them is not regular. Despite this, nodes and links in CANDA

Experimentation

This section describes a set of experiments on three WDN design problems, one taken from the literature and two actual WDNs. The aim of the experiments is to compare a standard GA with the same algorithm in CANDA-GA. The experiments are run in exactly the same fashion for all networks, and are as follows:

  1. The GA random seed is set to be one of five alternatives, either 1, 12, 123, 1234 or 12345.

  2. A GA of population 100 (roulette wheel selection, one point crossover, 0.9 mutation and crossover

Discussion

The cellular automaton approach has been found to be a useful tool in its own right when considering the design of WDNs (Keedwell and Khu, 2004). However, whilst it finds reasonable solutions in exceptional model evaluation time, there will conceivably be scenarios where a result which meets a given specification is required. The CANDA-GA approach combines the best of the two algorithms in that CANDA is used to find globally good solutions in the first instance and then the GA is used to

Conclusions

A novel CA-based approach to the seeding of GAs for WDN design optimisation problems has been described. The approach uses a combination of CA and GA technologies to yield improved solutions for the lifetime of the optimisation up to 100,000 generations. The drawbacks of using this approach are few, and the benefits are such that optimisation runs can either be made shorter to achieve a given goal or discover better results in a fixed timeframe. This principle has been shown to work using

Acknowledgements

This research was funded by a UK EPSRC Grant GR/R73393.

The authors wish to thank Godfrey Walters for provision of the industry networks, and Prasad Tumula for his assistance in implementing them.

References (25)

  • Emmerich, H., Rank, E., 1995. Analyzing traffic flow by a cellular automaton. Proceedings of the EUROSIM 1995, Vienna,...
  • Fonseca, C.M., et al., 1995. An overview of evolutionary algorithms in multiobjective optimisation. Evolutionary Computation.