Pattern Recognition

Volume 40, Issue 8, August 2007, Pages 2211-2225

Designing a classifier by a layered multi-population genetic programming approach

https://doi.org/10.1016/j.patcog.2007.01.003

Abstract

This paper proposes a method called layered genetic programming (LAGEP) to construct a classifier based on multi-population genetic programming (MGP). LAGEP employs a layered architecture to arrange multiple populations; a layer is composed of a number of populations. The result of each population is a discriminant function, and these functions transform the training set to construct a new training set. The successive layer uses the new training set to obtain better discriminant functions. Moreover, because the functions generated by each layer are composed into a single long discriminant function, which is the final result of LAGEP, every layer can evolve with short individuals. For each population, we propose an adaptive mutation rate tuning method that increases the mutation rate based on fitness values and remaining generations. Several experiments are conducted with different settings of LAGEP on several real-world medical problems. Experimental results show that LAGEP achieves accuracy comparable to single-population GP in much less time.

Introduction

Genetic programming (GP) [1], an important evolutionary computation (EC) technique, has developed rapidly in recent years. Researchers have proposed creative ideas to improve the effectiveness and efficiency of GP, such as new fitness functions, new architectures, and new individual expressions.

Traditionally, GP works with a single population. Multi-population GP (MGP) [3], [18], which employs several populations to discover optimal solutions, has been proposed and developed. Many different topologies of MGP have been proposed, such as the circle topology and the random topology. Fig. 1 shows the circle topology, where circles stand for populations [3]. An important characteristic of MGP is migration, which means that individuals can be transmitted from one population to another. The arrows in Fig. 1 indicate the migration direction. Fernández et al. [18] performed several experiments with parallel and distributed GP (PADGP), isolated multi-population GP (IMGP), where "isolated" means that there is no migration between populations, and traditional single-population GP. Their experiments show that PADGP and IMGP usually obtain better performance than traditional single-population GP.
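To make the migration mechanism concrete, the following is a minimal Python sketch of circle-topology migration between populations. The Individual class, the function names, and the replace-worst policy are illustrative assumptions, not details taken from Refs. [3], [18].

```python
from dataclasses import dataclass

# Minimal sketch of circle-topology migration. The Individual class and
# the replace-worst policy are illustrative assumptions, not the exact
# schemes of Refs. [3], [18].

@dataclass
class Individual:
    genome: object      # e.g. an expression tree
    fitness: float      # larger is better in this sketch

def migrate_ring(populations, n_migrants=2):
    """Copy each population's best individuals into the next population
    on the ring, replacing that population's worst individuals."""
    k = len(populations)
    # Pick migrants from every population before modifying any of them,
    # so migration is simultaneous rather than cascading around the ring.
    migrants = [sorted(pop, key=lambda ind: ind.fitness, reverse=True)[:n_migrants]
                for pop in populations]
    for i in range(k):
        dest = populations[(i + 1) % k]         # arrow direction in Fig. 1
        dest.sort(key=lambda ind: ind.fitness)  # worst individuals first
        dest[:n_migrants] = migrants[i]
```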

Many classifiers have been developed based on GP in recent years [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [19], [21]. To generate classification rules, Freitas [6] proposed the tuple-set-descriptor (TSD), a logical formula to represent an individual. Kotani and Sherrah [9], [13] used GP to perform feature selection before applying other classification methods. Multi-category classification problems are more difficult than two-class classification problems. Kishore et al. [7] and the present authors [4] treated such a problem as multiple two-class classification problems and then generated corresponding expressions or discriminant functions; these methods need k runs for a k-class classification problem. Muni et al. [12] proposed a novel method that solves a k-class classification problem in a single run: each individual in their work is represented by a multi-tree, so evolving one individual is equivalent to evolving k trees simultaneously. Loveard and Ciesielski [11] proposed five methods for solving multi-category classification problems: binary decomposition, static range selection, dynamic range selection, class enumeration, and evidence enumeration. Brameier and Banzhaf [3] used linear GP and MGP techniques; individuals are represented as strings and can be transmitted between demes, i.e. subpopulations, according to their fitness values. Tsakonas [21] compared four different structures evolved by GP on several different classification problems.
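As a concrete illustration of the k-run binary decomposition used in Refs. [4], [7], the sketch below trains one discriminant function per class and classifies by the strongest response. Here evolve_discriminant is a hypothetical stand-in for one complete GP run, and the winner-takes-all prediction rule is an assumption.

```python
# Sketch of binary decomposition for a k-class problem: one GP run per
# class, each evolving a discriminant function that should respond
# positively to its own class. evolve_discriminant is a hypothetical
# stand-in for a full GP run.

def one_vs_rest_train(X, y, classes, evolve_discriminant):
    """Return one discriminant function per class (k runs in total)."""
    functions = {}
    for c in classes:
        labels = [1 if yi == c else -1 for yi in y]   # class c vs. the rest
        functions[c] = evolve_discriminant(X, labels)
    return functions

def one_vs_rest_predict(x, functions):
    """Assign x to the class whose discriminant function responds most."""
    return max(functions, key=lambda c: functions[c](x))
```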

Using functional expressions to represent individuals is effective in GP [4], [7], [10]. The tree structure is a common data structure for functional expressions. However, two problems occur when GP is employed to generate functional expressions. First, it is difficult to choose appropriate operations for a given problem because the characteristics of the problem are completely unknown. If the operation set contains many operations, there is a greater possibility of discovering optimal solutions, but the search space becomes larger and may therefore become impracticable. Fortunately, as shown in Ref. [7], GP with an operation set comprising only the basic arithmetic operations, i.e. {+, −, ×, ÷}, generates results comparable to those with an operation set comprising additional operations. Second, it is difficult to know the proper length of an individual because there is no prior knowledge about optimal solutions. The predefined individual length, such as the length of a string-expression individual or the number of available nodes of a tree-expression individual, is usually chosen according to heuristic or empirical assumptions. The following is an example of a classification problem containing 64-dimensional data, i.e. a training instance x is represented by x = (a_1, a_2, …, a_64). Suppose that an optimal solution F is known to be F = ∏_{i=1}^{64} a_i. F can be represented as a skew binary tree with a height of 64 or a balanced binary tree with a height of seven, as shown in Fig. 2. An individual can contain at most 2^64 − 1 nodes if the predefined maximum depth is 64. A population containing so many large trees is highly complex and is thereby impracticable. On the other hand, if the predefined maximum depth is fixed at seven, it is very difficult to generate the ideal balanced tree. Moreover, the function F will never be obtained if the maximum depth is less than seven.
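The size bounds quoted above can be checked directly; the short computation below assumes the usual convention that a full binary tree of depth d has at most 2^d − 1 nodes.

```python
# Checking the bounds quoted above: a full binary tree of depth d has
# 2**d - 1 nodes, so depth 64 admits astronomically many nodes, while a
# balanced product of 64 operands fits in a tree of height seven
# (64 leaves plus 63 internal multiplication nodes).

def max_nodes(depth):
    return 2 ** depth - 1

print(max_nodes(64))  # 18446744073709551615 -> an impracticable search space
print(max_nodes(7))   # 127 = 64 leaves + 63 multiplication nodes
```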

Restricting individuals to an acceptable, practicable size is a simple but risky way to avoid this problem, and this problem motivated the present work. Since a long function can be viewed as a composition of a number of small functions, it is possible to combine a number of small GP solutions into a large one. It is therefore desirable to generate those small solutions with a practicable individual size and then use them to compose a larger solution. For example, consider the above function F and two functions B = ∏_{i=1}^{32} a_i and C = ∏_{i=33}^{64} a_i. Clearly, F can be represented as (B × C), as shown in Fig. 3, where the tree representations of B and C have a height of at most 32 rather than 64. Functions B and C can be generated by two separate GP runs and then combined to form F. Here we attempt to develop a method by which we can determine a proper node at which to combine small functions, for example, the shaded × operation in Fig. 3.
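A minimal sketch of this composition step follows, on a toy four-variable analogue of Fig. 3. The tuple-based tree encoding and the combine helper are illustrative choices, not the paper's representation.

```python
from operator import mul

# Minimal sketch of composing two small evolved functions into a larger
# one, as in F = (B x C). A leaf is an index into the input vector; an
# internal node is a tuple (op, left, right). Combining adds one root
# node, so B and C keep their original heights.

def combine(op, left_tree, right_tree):
    """Join two evolved subtrees under a new root operation."""
    return (op, left_tree, right_tree)

def evaluate(tree, x):
    if isinstance(tree, int):                 # leaf: variable a_i
        return x[tree]
    op, left, right = tree
    return op(evaluate(left, x), evaluate(right, x))

# Toy analogue of Fig. 3: B = a1*a2, C = a3*a4, F = (B x C).
B = (mul, 0, 1)
C = (mul, 2, 3)
F = combine(mul, B, C)                        # the shaded x node in Fig. 3
print(evaluate(F, [2, 3, 4, 5]))              # 120
```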

The method proposed in this paper is called layered genetic programming (LAGEP). It is a method based on MGP. LAGEP arranges populations in a layered architecture. Populations in the same layer evolve with an identical training set and store the results of their best individuals in a dataset; this dataset becomes the new training set for the successive layer. After all layers have finished the evolution process, the output of the final layer is the result of LAGEP.
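The following Python sketch captures the layered data flow just described; it is not the authors' implementation. Here evolve_population is a hypothetical stand-in for one complete GP run, and the list-of-lists training set is an illustrative choice.

```python
# Hedged sketch of the layered data flow. Each layer evolves several
# populations on the same training set; the best discriminant function
# of each population becomes one feature of the next layer's training set.

def lagep_train(X, y, layer_sizes, evolve_population):
    """layer_sizes: populations per layer, e.g. [3, 3, 1].
    evolve_population(X, y) returns the best discriminant function of
    one population (a stand-in for a complete GP run)."""
    stages = []
    for n_populations in layer_sizes:
        funcs = [evolve_population(X, y) for _ in range(n_populations)]
        # Transform the training set: one feature per discriminant function,
        # while the class labels y stay the same.
        X = [[f(row) for f in funcs] for row in X]
        stages.append(funcs)
    return stages        # composing all stages yields the final classifier

def lagep_apply(x, stages):
    """Feed one instance through the discriminant functions of every layer."""
    for funcs in stages:
        x = [f(x) for f in funcs]
    return x             # output of the final layer
```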

The rest of this paper is organized as follows. Section 2 describes the details of LAGEP. Section 3 presents and discusses the experimental results on selected classification problems. Conclusions are drawn in Section 4.

Section snippets

Proposed LAGEP method

LAGEP is based on the multi-population method. In this section, we first describe the design of each single population, including an adaptive mutation rate tuning method (AMRT). Then the design of LAGEP and its benefits are explained. The test phase and the conflict problem are addressed afterward. Finally, an example demonstrates LAGEP.
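As a rough illustration of the idea behind AMRT, the sketch below raises the mutation rate when the best fitness stagnates, and raises it more strongly as fewer generations remain. The exact update rule appears in the full text; the formula and constants here are assumptions for illustration only.

```python
# Illustrative mutation-rate schedule in the spirit of AMRT as summarized
# in the abstract: increase the rate when the best fitness stagnates, and
# increase it more strongly as fewer generations remain. This formula is
# an assumption, not the paper's exact rule.

def adaptive_mutation_rate(base_rate, best_fitness, prev_best_fitness,
                           generation, max_generations,
                           boost=0.05, max_rate=0.5):
    if best_fitness > prev_best_fitness:
        return base_rate                      # progress made: no boost
    remaining = (max_generations - generation) / max_generations
    # Stagnation: the boost grows as the remaining generations shrink.
    return min(max_rate, base_rate + boost * (1.0 - remaining))
```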

Experiments

In this section we describe the experiments and analyze the classification results. To conduct the experiments described in this section, we developed a system based on the LAGEP Project [23], executed on an ACER VT7600GL equipped with a 3.0 GHz processor and 1.5 GB of memory.

Conclusions and future work

In this paper, we propose an MGP-based method, LAGEP. LAGEP arranges a number of populations into each layer. Every layer evolves its populations to generate a set of discriminant functions. These functions transform the training set into a new training set, which is used by the successive layer. The evolution process of every population is efficient because it evolves with short individuals. We also proposed AMRT, a method to prevent the evolution from being trapped in a local optimum for a long time.

Experimental results show …

Acknowledgments

We would like to express our appreciation to the anonymous reviewers for their useful suggestions and revision. We also wish to thank Dr. Hsinchun Chen and Jiexun Li for many helpful discussions and comments.

References (20)

  • B.C. Chien et al., Learning effective classifiers with Z-value measure based on genetic programming, Pattern Recognition (2004).
  • A. Tsakonas, A comparison of classification accuracy of four genetic programming-evolved intelligent structures, Inf. Sci. (2006).
  • J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (1992).
  • W. Banzhaf et al., Genetic Programming: An Introduction on the Automatic Evolution of Computer Programs and Its Application (1998).
  • M. Brameier et al., A comparison of linear genetic programming and neural networks in medical data mining, IEEE Trans. Evol. Comput. (2001).
  • I. De Falco et al., Discovering interesting classification rules with genetic programming, Appl. Soft Comput. (2002).
  • A. Freitas, A genetic programming framework for two data mining tasks: classification and generalized rule induction, ...
  • J.K. Kishore et al., Application of genetic programming for multicategory pattern classification, IEEE Trans. Evol. Comput. (2000).
  • A. Konstam, Group classification using a mix of genetic programming and genetic algorithms.
  • M. Kotani et al., Emergence of feature extraction function using genetic programming.
There are more references available in the full text version of this article.

About the Author—JUNG-YI LIN was born in Taitung, Taiwan. He received the M.S. degree in Computer Science and Information Engineering from I-Shou University in 2002. He is currently a Ph.D. candidate in Computer Science, National Chiao Tung University, HsinChu, Taiwan. Lin is currently a visiting scholar at Artificial Intelligence Lab, Department of MIS, University of Arizona, Arizona, USA. His research interests include machine learning, data mining, and knowledge discovery.

About the Author—HAO-REN KE was born on June 29, 1967 in Taipei, Taiwan, Republic of China. He received the B.S. degree in 1989 and the Ph.D. degree in 1993, both in Computer and Information Science, from National Chiao Tung University. He is now a professor at the Library and the Institute of Information Management, National Chiao Tung University (NCTU), and the associate director of the NCTU Library. His research interests include digital libraries, digital museums, information retrieval, web services, and data mining. He can be contacted at: [email protected].

About the Author—BEEN-CHIAN CHIEN received the Ph.D. degree in Computer Science and Information Engineering from National Chiao Tung University in 1992. He was an associate professor in the Department of Computer Science and Information Engineering, I-Shou University, Kaohsiung, Taiwan, from 1996 to 2004. Currently, he is a professor and the head of the Department of Computer Science and Information Engineering, National University of Tainan, Tainan, Taiwan. His current research activities involve machine learning, content-based image retrieval, intelligent information retrieval, and data mining.

About the Author—WEI-PANG YANG was born on May 17, 1950 in Hualien, Taiwan. He received the B.S. degree in mathematics from National Taiwan Normal University in 1974, and the M.S. and Ph.D. degrees from National Chiao Tung University in 1979 and 1984, respectively, both in Computer Engineering. He was a professor in the Department of CSIE and the Department of CIS at National Chiao Tung University, Hsinchu, Taiwan. He was a visiting scholar at Harvard University and at the University of Washington, and was the Director of the Computer Center of National Chiao Tung University. Dr. Yang is currently the Head of the Department of Information Management and the Dean of the College of Management. His research interests include database theory and applications, information retrieval, data mining, digital libraries, and digital museums.
