A novel Error-Correcting Output Codes algorithm based on genetic programming

https://doi.org/10.1016/j.swevo.2019.100564

Highlights

  • A novel ECOC algorithm is proposed based on Genetic Programming.

  • Each tree-structure individual represents a solution for a multiclass problem.

  • A new mutation operator is designed to maintain the legality of the individual.

  • A local optimization algorithm tackles tough classes by adding proper columns.

  • A new decoding method is proposed to reduce errors in the decoding process.

Abstract

Error-Correcting Output Codes (ECOC) is widely used in multiclass classification. Because an optimal codematrix is key to the performance of an ECOC algorithm, this paper proposes a genetic programming (GP) based ECOC algorithm (GP-ECOC). In each GP individual, a terminal node represents a class, and a nonterminal node combines the classes in its child nodes. An individual is therefore a class combination tree that represents an ECOC codematrix. A legality check is embedded in the algorithm to ensure that every codematrix satisfies the ECOC constraints; codematrices that violate the constraints are corrected by a proposed Guided Mutation operator. Before fitness evaluation, a proposed local optimization algorithm appends new columns for tough classes, which improves the generalization ability of each individual and speeds up evolution. In this way, our GP evolves optimal codematrices over the course of the evolutionary process. Experiments on various UCI data sets show that, compared with other ensemble algorithms, our algorithm achieves stable and high performance with relatively small ensemble sizes. To the best of our knowledge, this is the first time that GP has been applied to the ECOC encoding problem. Our Python code is available at https://github.com/samuellees/gpecoc.

Introduction

In pattern recognition, the multiclass classification problem refers to the task of classifying instances into one of N classes, where N > 2. Because of class overlap and class imbalance, multiclass classification remains a difficult task [1].

Instead of tackling multiple classes directly, an alternative is the divide-and-conquer approach: a multiclass problem is first split into several binary problems, and the solutions to these problems are then combined to produce a final decision for the original problem. The most widely used framework for this strategy is Error-Correcting Output Codes (ECOC) [2], which originates from error-correcting codes in digital communication [3]. With a proper coding scheme, an ECOC algorithm can perform well on multiclass classification tasks, and it has been applied successfully in fields such as traffic sign recognition [4], face recognition [5], and action recognition [6].
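The split-and-combine idea can be illustrated with a minimal sketch. It assumes a fixed binary codematrix (the 7-column exhaustive code for 4 classes) and plain Hamming-distance decoding; the names `CODEMATRIX`, `binary_labels` and `hamming_decode` are hypothetical, and this is not the encoding or decoding method proposed in this paper.

```python
import numpy as np

# Illustrative 4-class codematrix (exhaustive code, 7 columns).
# Each row is a class codeword; each column defines one binary problem.
CODEMATRIX = np.array([
    [+1, +1, +1, +1, +1, +1, +1],
    [-1, -1, -1, -1, +1, +1, +1],
    [-1, -1, +1, +1, -1, -1, +1],
    [-1, +1, -1, +1, -1, +1, -1],
])

def binary_labels(y, column):
    """Relabel multiclass targets y (class indices) for one dichotomizer."""
    return column[y]

def hamming_decode(outputs, codematrix):
    """Return the class whose codeword disagrees least with the outputs."""
    distances = np.sum(outputs != codematrix, axis=1)
    return int(np.argmin(distances))

# Encoding: the binary targets that the third dichotomizer is trained on.
y = np.array([0, 1, 2, 3, 2])
targets = binary_labels(y, CODEMATRIX[:, 2])

# Decoding: the codeword of class 2 with its first bit flipped,
# as if one dichotomizer had made a mistake.
noisy = np.array([+1, -1, +1, +1, -1, -1, +1])
predicted = hamming_decode(noisy, CODEMATRIX)
```

Every pair of rows in this matrix differs in four positions, so a single dichotomizer error still leaves the correct codeword as the nearest one.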

The key to an ECOC algorithm is the generation of a discriminative codematrix in the encoding process. However, Ref. [7] proved that designing an optimal codematrix is an NP problem. Many researchers have therefore been attracted to this research field and have developed various encoding strategies to produce optimal codematrices. As a well-known optimization algorithm, the Genetic Algorithm (GA) has been successfully applied to optimize codematrices by treating each individual as a codematrix, aiming to minimize the size of the codematrix without losing generalization ability [8].

In contrast to the wide use of GA in the ECOC framework, no Genetic Programming (GP) based ECOC algorithm has been proposed until now. Compared with the linear structure of GA individuals, the tree structure gives GP higher flexibility, especially in pattern recognition [9,10]. GP has been applied successfully to diverse problems, including the design of classifiers for both semi-supervised learning [11] and supervised learning [12,13], the fusion of base learners [14], feature selection [15,16] and feature extraction [12,17]. Inspired by these successes, this paper proposes a novel GP based ECOC algorithm (GP-ECOC for short).

In our GP framework, the terminal set is the original class set, and the nonterminal set contains two units: the combination unit and the feature selection unit. With this setting, a nonterminal node combines the classes included in its child nodes and assigns a proper feature subspace to the associated classifier (dichotomizer). Each individual therefore consists of a set of class partition schemes and represents an ECOC codematrix. To ensure that no codematrix breaks the ECOC constraints, a legality check is carried out right after each new individual is generated. Individuals that violate the ECOC constraints are illegal and are corrected by the proposed guided mutation operator. With the F-score used as the fitness function, the fitness value of each individual is calculated through an ECOC decoding process. Furthermore, to speed up evolution, an effective local optimization algorithm is designed to add proper columns to a codematrix, aiming to tackle tough classes and to enhance the generalization ability of each individual. In the GP, individuals undergo crossover and mutation operations to produce a new generation, and the population evolves towards optimal codematrices. To verify our algorithm's performance, experiments are carried out on twelve UCI data sets. The results reveal that, compared with ten ECOC algorithms and five state-of-the-art ensemble learning algorithms, our GP-ECOC achieves satisfactory performance with a compact ensemble size.
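As a rough illustration of how a class combination tree could be read as a codematrix and checked for legality, consider the sketch below. The mapping is an assumption made only for this example: each nonterminal node yields one ternary column, with classes under its left child coded +1, classes under its right child coded -1, and all other classes coded 0. The feature selection unit is ignored, and the checks are the commonly stated ECOC constraints rather than the exact procedure of this paper. The names `leaves`, `tree_to_codematrix` and `is_legal` are hypothetical.

```python
import numpy as np

def leaves(node):
    """Collect class indices under a node (an int leaf or a (left, right) pair)."""
    if isinstance(node, int):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)

def tree_to_codematrix(tree, n_classes):
    """Assumed mapping: one ternary column per nonterminal node.

    The tree must contain at least one nonterminal node.
    """
    columns = []

    def visit(node):
        if isinstance(node, int):
            return
        left, right = node
        col = np.zeros(n_classes, dtype=int)
        col[leaves(left)] = +1
        col[leaves(right)] = -1
        columns.append(col)
        visit(left)
        visit(right)

    visit(tree)
    return np.stack(columns, axis=1)

def is_legal(m):
    """Check commonly stated ECOC constraints on a ternary codematrix."""
    # Every column must separate something: it needs a +1 and a -1.
    for j in range(m.shape[1]):
        if not ((m[:, j] == 1).any() and (m[:, j] == -1).any()):
            return False
    # No two classes may share a codeword.
    if len({tuple(row) for row in m}) < m.shape[0]:
        return False
    # No two columns may be identical or complementary.
    seen = set()
    for col in m.T:
        key = tuple(col)
        if key in seen or tuple(-x for x in col) in seen:
            return False
        seen.add(key)
    return True

legal = tree_to_codematrix(((0, 1), (2, 3)), n_classes=4)
illegal = tree_to_codematrix((0, (0, 1)), n_classes=3)
```

Under this assumed mapping, the second tree is illegal because class 0 appears on both sides of the root, which leaves the root column without a +1 entry, and class 2 is never used. An individual of this kind is what a repair step such as the guided mutation operator would have to correct.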

The rest of this paper is organized as follows. Section 2 introduces the background of ECOC and GP. Section 3 describes the framework of our GP and gives details of the design of the genetic operators and the local optimization algorithm. In Section 4, experiments compare our algorithm with some widely deployed ensemble algorithms on UCI data sets, and the experimental results are analyzed and discussed. Finally, Section 5 concludes the paper and points out directions for future work.

Background of ECOC

This section reviews ECOC. To clarify our statements, the definitions of symbols used in this study are listed in Table 1.

ECOC has been widely studied since it was proposed in 1994 [18]. Since then, many novel ECOC algorithms have been proposed based on different principles [19]. In general, an ECOC algorithm contains two steps: the encoding step and the decoding step. The encoding step refers to the process of decomposing a multiclass problem into a set of binary problems, and the

The design of GP-ECOC

GP is a widely deployed evolutionary algorithm proposed by Koza [43]. It has been successfully applied to optimization problems in diverse fields, such as interpreting reinforcement learning policies [44] and different types of knowledge discovery [45,46], by treating individuals as symbolic expressions. Some GP based learning algorithms treat individuals as learners, and have been applied to rainfall prediction [47], building ensemble learning systems [48] and outlier

Data sets

Twelve UCI data sets [54] are used to verify the effectiveness of GP-ECOC, as listed in Table 2. The Ecoli, Yeast, and Zoo data sets contain some classes with extremely small sample sizes (fewer than 9). Such classes are deleted, because they cannot reveal any meaningful statistical properties. After this step, each data set is divided into three parts (training set, validation set and test set) in the proportion 2:1:1.
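The preprocessing described above can be sketched as follows. This is a minimal illustration with NumPy only. It assumes a simple random (non-stratified) shuffle, since the text does not state how samples are assigned to the three parts, and the function names `drop_small_classes` and `split_2_1_1` are hypothetical.

```python
import numpy as np

def drop_small_classes(X, y, min_size=9):
    """Remove all samples of classes with fewer than min_size instances."""
    classes, counts = np.unique(y, return_counts=True)
    keep = np.isin(y, classes[counts >= min_size])
    return X[keep], y[keep]

def split_2_1_1(X, y, seed=0):
    """Shuffle, then split into training, validation and test parts (2:1:1)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    n_train = len(y) // 2
    n_val = (len(y) - n_train) // 2
    train, val, test = np.split(order, [n_train, n_train + n_val])
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])

# Synthetic example: class 0 has 12 samples, classes 1 and 2 are too small.
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 12 + [1] * 5 + [2] * 3)
X_kept, y_kept = drop_small_classes(X, y)
train, val, test = split_2_1_1(X_kept, y_kept)
```

A stratified split would preserve class proportions in each part and may be preferable for imbalanced data; the plain shuffle is used here only to keep the sketch short.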

Deployed methods

F-Test, Random Forest, SVM-RFE and BSSWSS are applied as feature

Conclusion

In this paper, we proposed a novel GP based ECOC algorithm that aims to evolve an optimal ECOC codematrix. In this framework, each individual represents a set of class decomposition schemes and is mapped to a codematrix when its fitness is evaluated. We propose strategies to maintain the legality of each individual, and design a new mutation operator to correct illegal ones. F-score is applied as the fitness function, and a local optimization algorithm is proposed to

Acknowledgment

This work is supported by the National Natural Science Foundation of China (Nos. 61772023 and 61502402), the National Key R&D Program of China (No. 2019QY1803), the Natural Science Foundation of Fujian Province (Nos. 2015J05129 and 2016J01320), and the XMU Training Program of Innovation and Entrepreneurship for Undergraduates (No. 2017X0331).

References (63)

  • J. Zhou et al.

    Data-driven decomposition for multi-class classification

    Pattern Recognit.

    (2008)
  • M.A. Bagheri et al.

    A genetic-based subspace analysis method for improving Error-Correcting Output Coding

    Pattern Recognit.

    (2013)
  • M. Lachaize et al.

    Evidential framework for error correcting output code classification

    Eng. Appl. Artif. Intell.

    (2018)
  • K.H. Liu et al.

    A hierarchical ensemble of ECOC for cancer classification based on multi-class microarray data

    Inf. Sci.

    (2016)
  • J. Bi et al.

    An empirical comparison on state-of-the-art multi-class imbalance learning algorithms and a new diversified ensemble learning scheme

    Knowl. Based Syst.

    (2018)
  • D. Hein et al.

    Interpretable policies for reinforcement learning by genetic programming

    Eng. Appl. Artif. Intell.

    (2018)
  • I.L.S. Russo et al.

    Knowledge discovery in multiobjective optimization problems in engineering via Genetic Programming

    Expert Syst. Appl.

    (2018)
  • N.K. Lee et al.

    A comprehensive survey on genetic algorithms for DNA motif prediction

    Inf. Sci.

    (2018)
  • S. Cramer et al.

    Stochastic model genetic programming: deriving pricing equations for rainfall weather derivatives

    Swarm and Evolutionary Computation

    (2019)
  • C. De Stefano et al.

    Using Bayesian networks for selecting classifiers in GP ensembles

    Inf. Sci.

    (2014)
  • I.H. Lee et al.

    Adaptive outlier elimination in image registration using genetic programming

    Inf. Sci.

    (2017)
  • Y. Liang et al.

    Image feature selection using genetic programming for figure-ground segmentation

    Eng. Appl. Artif. Intell.

    (2017)
  • W.J.d.A. Lobão et al.

    Solving stochastic differential equations through genetic programming and automatic differentiation

    Eng. Appl. Artif. Intell.

    (2018)
  • E.L. Allwein et al.

    Reducing multiclass to binary: a unifying approach for margin classifiers

    J. Mach. Learn. Res.

    (2001)
  • T.G. Dietterich et al.

    Solving multiclass learning problems via error-correcting output codes

    J. Artif. Intell. Res.

    (1995)
  • X. Baro et al.

    Traffic sign recognition using evolutionary adaboost detection and forest-ECOC classification

    IEEE Trans. Intell. Transp. Syst.

    (2009)
  • J. Qin

    Zero-shot action recognition with error-correcting output codes

    IEEE Conference on Computer Vision and Pattern Recognition

    (2017)
  • K. Crammer et al.

    On the learnability and design of output codes for multiclass problems

    Mach. Learn.

    (2002)
  • P.G. Espejo et al.

    A survey on the application of genetic programming to classification

    IEEE Transactions on Systems, Man, and Cybernetics, Part C

    (2010)
  • A. Majid

    Optimization of Classifiers Using Genetic Programming

    (2016)
  • W. La Cava et al.

    Multidimensional genetic programming for multiclass classification

    Swarm and Evolutionary Computation

    (2018)