Elsevier

Neurocomputing

Volume 154, 22 April 2015, Pages 200-207
A geometric semantic genetic programming system for the electoral redistricting problem

https://doi.org/10.1016/j.neucom.2014.12.003

Abstract

Redistricting consists in dividing a geographic space or region of spatial units into smaller subregions or districts. In this paper, a Genetic Programming framework that addresses the electoral redistricting problem is proposed. The method uses new genetic operators, called geometric semantic genetic operators, that employ semantic information directly in the evolutionary search process with the objective of improving its optimization ability. The system is compared to several different redistricting techniques, including evolutionary and non-evolutionary methods. The simulations were run on ten real data sets and, even though the studied problem does not belong to the classes of problems for which geometric semantic operators induce a unimodal fitness landscape, the results we present demonstrate the effectiveness of the proposed technique.

Introduction

The zone design problem (also known as redistricting) is the process of dividing a geographic space or region of spatial units into smaller subregions or districts. Probably the best-known instance of the zone design problem is the electoral redistricting problem. As reported by Bação et al. [1], electoral redistricting consists in partitioning areal units, generally administrative units, into a predetermined number of zones (districts) such that the units in each zone are contiguous, each zone is geographically compact, and the sum of the populations of the areal units in each district is as similar as possible across all districts or lies within a predetermined range [2]. Because of the spatial nature of its constraints, redistricting is usually seen as a type of spatial clustering. Because it is NP-complete, the electoral redistricting problem is considered a complex problem, and heuristic techniques seem to provide the best solutions. In this paper, we propose the use of Genetic Programming (GP) [5] to address the electoral redistricting problem. In particular, we use recently defined genetic operators, called geometric semantic genetic operators [11], that allow us to integrate semantic awareness into the evolutionary process.

One of the strongest points of geometric semantic operators [11] is that, using semantic information, they are able to induce, by construction, a unimodal fitness landscape on all problems consisting in matching sets of input data to known targets (like supervised learning problems, such as regression or classification). Geometric semantic operators have been used so far on many different symbolic regression problems (including several complex real-life applications [8], [19], [20], [21]), generally with excellent results, and the fact that the fitness landscape is unimodal is often used as an argument to justify those results [8], [19], [20], [21]. In this paper, for the first time, we apply geometric semantic operators to a problem that does not belong to that class: the objective of the application studied here is not to match sets of input data to known targets. Thus, there is no reason to believe that the fitness landscape induced by those operators is unimodal in our case. Interestingly, the results we present demonstrate that geometric semantic operators have a beneficial effect on the search process in this case as well. This suggests that geometric semantic operators may be useful not only in single-objective supervised learning problems but also in a larger class of problems, and that the justification for their effectiveness may go beyond the fact that they can induce unimodal fitness landscapes. This issue deserves further investigation.

The paper is organized as follows: Section 2 describes the redistricting problem and its constraints; Section 3 presents the standard GP algorithm and the canonical (syntax-based) genetic operators; Section 4 defines the geometric semantic operators used in this paper. Section 5 describes the representation of the candidate solutions, the fitness function and how the geometric semantic operators have been used in this work. Section 6 presents the experimental settings and the obtained results. Here, a comparison between the proposed framework and several existing redistricting techniques is also presented. Finally, Section 7 concludes the paper and suggests ideas for possible future research.

Redistricting problem: Definition and constraints

In redistricting problems, the aim is to aggregate n geo-spatial regions into c partitions (or districts) subject to some constraints. The most well-known application of the redistricting problem is the electoral redistricting problem. Here, the objective is to create districts usually by grouping smaller administrative units.

The constraints that define a “good” electoral redistricting plan are as follows: 1. All the districts should be equal in population, 2. Each district should be a single
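As a rough illustration of two of the constraints named in this paper (equal population and contiguity of the units in each district), the following minimal sketch checks both on a toy plan. The function names, the adjacency-dictionary layout, and the relative-deviation measure are assumptions made for this example; they are not taken from the paper, and compactness is not covered.

```python
from collections import deque


def population_imbalance(populations, assignment, c):
    """Largest relative deviation of a district total from the ideal (total / c)."""
    totals = [0.0] * c
    for unit, district in enumerate(assignment):
        totals[district] += populations[unit]
    ideal = sum(populations) / c
    return max(abs(t - ideal) / ideal for t in totals)


def is_contiguous(adjacency, assignment, district):
    """True if the units of `district` form one connected piece (BFS over neighbours)."""
    members = {u for u, d in enumerate(assignment) if d == district}
    if not members:
        return False  # an empty district is treated as invalid in this sketch
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v in members and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == members


# Hypothetical toy instance: four units in a row (0-1-2-3), c = 2 districts.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
populations = [100, 120, 90, 110]
plan = [0, 0, 1, 1]       # units 0,1 -> district 0; units 2,3 -> district 1
bad_plan = [0, 1, 0, 1]   # district 0 = {0, 2}, which is not connected

print(population_imbalance(populations, plan, 2))
print(is_contiguous(adjacency, plan, 0), is_contiguous(adjacency, bad_plan, 0))
```

A heuristic search would typically fold measures like these into a fitness value or use them as feasibility checks; how this paper combines them is not shown in the text available here.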

Genetic programming

In this work, the electoral redistricting problem has been addressed using a GP-based system. GP [5] is one of the youngest paradigms in the computational intelligence research area called Evolutionary Computation (EC) and consists in the automated learning of computer programs by means of a process mimicking Darwinian evolution. GP evolves computer programs, traditionally represented as tree structures. Trees represent candidate solutions for the problem at hand and they can be easily

Geometric semantic operators

Recent research in GP has been dedicated to an aspect that was only marginally considered up to some years ago: the definition of methods based on the semantics of the solutions [7], [8], [9], [10]. Although there is no universally accepted definition of semantics in GP, this term often refers to the behavior of a program, once it is executed on a set of data. For this reason, in many references, including here, the term semantics refers to the vector of outputs a program produces on the
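To make the notion of semantics concrete, the sketch below computes the output vector of a program on a set of inputs and applies the geometric semantic crossover and mutation in the form commonly given in the GSGP literature for real-valued programs. Programs are modelled as plain Python callables rather than trees, and the crossover uses a random constant in [0, 1] in place of a random tree; both are simplifying assumptions for this example, and the exact operator definitions used in this paper are not shown in the text available here.

```python
import random


def semantics(program, inputs):
    """Semantics of a program: the vector of its outputs on the given inputs."""
    return [program(x) for x in inputs]


def gs_crossover(p1, p2, rng):
    """Offspring computing r*p1(x) + (1 - r)*p2(x), with r a random constant in [0, 1]."""
    r = rng.random()
    return lambda x: r * p1(x) + (1.0 - r) * p2(x)


def gs_mutation(p, r1, r2, ms):
    """Offspring computing p(x) + ms*(r1(x) - r2(x)) for random programs r1 and r2."""
    return lambda x: p(x) + ms * (r1(x) - r2(x))


# Hypothetical parents and inputs, chosen only for illustration.
rng = random.Random(0)
inputs = [0.0, 1.0, 2.0]
parent_a = lambda x: x + 1.0   # semantics [1.0, 2.0, 3.0]
parent_b = lambda x: 2.0 * x   # semantics [0.0, 2.0, 4.0]

child = gs_crossover(parent_a, parent_b, rng)
mutant = gs_mutation(parent_a, lambda x: 0.5, lambda x: 0.2, ms=0.1)

print(semantics(child, inputs))
print(semantics(mutant, inputs))
```

Because the child is a convex combination of its parents, each component of its semantics lies between the corresponding components of the parents' semantics. This is the geometric property behind the unimodal-landscape argument for problems that match inputs to known targets.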

Methodology

In this section, we describe how a candidate solution is represented in the proposed GP system. Similar to what has been done in previous works [15], [16], in our system an individual is formed by c trees, each of them associated with one of the clusters that are to be formed. The ith individual of the population is denoted as $I_i$ and each of its trees as $T_k^i$. The output of $I_i$ is a vector containing the outputs of all its trees, $I_i(\bar{x}) = [T_1^i(\bar{x}), \ldots, T_c^i(\bar{x})]$, where $\bar{x}$ denotes the vector of
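The multi-tree representation can be sketched as follows: an individual is a list of c callables standing in for its trees, and its output on an input vector is the list of their outputs. The rule that maps this output vector to a district (here, the index of the largest output) and the use of coordinates as the input vector are assumptions made for this example; the text available here does not state how the paper performs either step.

```python
def individual_output(trees, x):
    """Output vector of an individual: one value per tree, [T_1(x), ..., T_c(x)]."""
    return [tree(x) for tree in trees]


def assign_district(trees, x):
    """ASSUMPTION: the unit is assigned to the district whose tree gives the largest output."""
    outputs = individual_output(trees, x)
    return max(range(len(outputs)), key=outputs.__getitem__)


# Hypothetical individual with c = 3 trees; each input x is an assumed (x, y) coordinate pair.
trees = [
    lambda x: -abs(x[0] - 0.0) - abs(x[1] - 0.0),
    lambda x: -abs(x[0] - 5.0) - abs(x[1] - 0.0),
    lambda x: -abs(x[0] - 0.0) - abs(x[1] - 5.0),
]
units = [(0.5, 0.2), (4.8, 0.1), (0.3, 4.9)]

plan = [assign_district(trees, x) for x in units]
print(plan)
```

Keeping one tree per district means the number of districts is fixed by the representation itself, which matches the requirement that the number of zones is predetermined.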

Experimental settings and results

In this section, we experimentally validate the proposed semantic GP-based technique. In particular, we apply the proposed algorithm to 10 instances of the redistricting problem described in Table 1. Data are related to the US census tract data set (year 2000). In more detail, we compare the performance achieved using the proposed system against the following methods: graph partitioning algorithm (Graph) [18], simulated annealing redistricting algorithm (SARA) [17], genetic algorithm (GA) for

Conclusions

In this work a Genetic Programming system has been proposed for addressing the electoral redistricting problem. The proposed system uses recently defined geometric semantic genetic operators to improve the search process. Even though those operators had originally been introduced for supervised learning problems, consisting in matching input data into known targets, where it is possible to show that they always induce a unimodal fitness landscape, some considerations about the characteristics

Acknowledgements

The authors acknowledge projects EnviGP (PTDC/EIA-CCO/103363/2008), MassGP (PTDC/EEI-CTP/2975/2012) and InteleGen (PTDC/DTP-FTO/1747/2012), FCT, Portugal.

Mauro Castelli received the Master's degree (Laurea) in computer science from the University of Milano Bicocca, Milan, Italy, in 2008 (“summa cum laude”), and the Ph.D. degree from the University of Milano Bicocca in 2012. His Ph.D. thesis presented contributions in the field of evolutionary computation and, in particular, genetic programming.

From February 2012 to February 2013, he held a postdoctoral position at INESC-ID, Lisbon, Portugal. Since March 2013, he has been an Invited Assistant Professor with ISEGI, Universidade Nova de Lisboa, Lisbon, Portugal.

His main research interests are in the field of artificial intelligence (in particular, evolutionary computation and genetic programming) and in the application of machine-learning techniques to solve complex real-life problems, especially in the field of biology and medicine.

Roberto Henriques is a Visiting Assistant Professor at ISEGI, Universidade Nova de Lisboa. He is a director of the Degree in Systems and Information Technology and a researcher at the Centre for Studies in Information Management (CEGI). He has a Ph.D. in Information Management from Universidade Nova de Lisboa, a master's degree in Science and Geographic Information Systems from ISEGI-NOVA, and a degree in Biophysics Engineering from the University of Évora. His research focuses on the analysis of geospatial data using techniques of Artificial Intelligence and Data Mining.

Leonardo Vanneschi received the Master's degree (“Laurea”) in computer science from the University of Pisa, Pisa, Italy, in 1996 (“summa cum laude”), and the Ph.D. degree from the University of Lausanne, Lausanne, Switzerland, in 2004. His Ph.D. thesis was honored with the excellence award of the Science Faculty of the University of Lausanne. Since September 2011, he has been an Assistant Professor with ISEGI, Universidade Nova de Lisboa, Lisbon, Portugal. His main research interests include computational intelligence and the study of complex systems, with particular focus on evolutionary computation and genetic programming. Dr. Vanneschi is an Associate Editor of a scientific journal in the area, and a member of the editorial board of two scientific journals and of the steering committee and program committee of various international conferences. As of February 2013, he had published 130 scientific contributions, among which nine have been honored with international awards.
