A novel Error-Correcting Output Codes algorithm based on genetic programming
Introduction
In pattern recognition, the multiclass classification problem refers to the task of assigning instances to one of N classes, where N > 2. Owing to class overlap and class imbalance, multiclass classification remains a difficult task [1].
Instead of tackling multiple classes directly, an alternative is the divide-and-conquer approach: a multiclass problem is first split into several binary problems, and the solutions to these problems are then combined to produce a final decision for the original problem. The most widely used framework for this strategy is Error-Correcting Output Codes (ECOC) [2], which originates from the error-correcting code schemes used in digital communication [3]. With a proper coding scheme, an ECOC algorithm can perform well on multiclass classification tasks, and it has been applied successfully in various fields, such as traffic sign recognition [4], face recognition [5], and action recognition [6].
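The decomposition and recombination described above can be illustrated with a minimal sketch. It assumes a hand-written 4-class binary codematrix of the exhaustive type and plain Hamming-distance decoding; the names CODEMATRIX, hamming_distance and decode are illustrative and do not come from the paper.

```python
# Rows are class codewords; columns are binary problems.
# +1 / -1 mark which side of a binary problem a class belongs to.
# This 4-class, 7-column code has a minimum row distance of 4,
# so a single wrong dichotomizer output can still be corrected.
CODEMATRIX = [
    [+1, +1, +1, +1, +1, +1, +1],
    [-1, -1, -1, -1, +1, +1, +1],
    [-1, -1, +1, +1, -1, -1, +1],
    [-1, +1, -1, +1, -1, +1, -1],
]


def hamming_distance(a, b):
    """Count the positions where two codewords disagree."""
    return sum(1 for x, y in zip(a, b) if x != y)


def decode(outputs, codematrix):
    """Return the index of the class whose codeword is closest to the outputs."""
    distances = [hamming_distance(outputs, row) for row in codematrix]
    return distances.index(min(distances))


# Outputs of the seven binary classifiers for a sample of class 2,
# with the first classifier making a mistake.
noisy_outputs = [+1, -1, +1, +1, -1, -1, +1]
predicted_class = decode(noisy_outputs, CODEMATRIX)
```

Here the corrupted output vector is still decoded to class 2, which is the error-correcting property that gives the framework its name.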
The key to an ECOC algorithm is the generation of a discriminative codematrix in the encoding process. However, Ref. [7] proved that designing an optimal codematrix is an NP-complete problem. Many researchers have therefore been attracted to this field and have developed various encoding strategies to produce optimal codematrices. The Genetic Algorithm (GA), a well-known optimization algorithm, has been successfully applied to optimize codematrices by treating each individual as a codematrix, with the aim of minimizing codematrix size without losing generalization ability [8].
In contrast to the wide use of GA in the ECOC framework, no Genetic Programming (GP) based ECOC algorithm has been proposed so far. Compared with the linear structure of GA, the tree structure gives GP higher flexibility, especially in pattern recognition [9,10]. GP has been applied successfully to diverse problems, including classifier design for both semi-supervised learning [11] and supervised learning [12,13], the fusion of base learners [14], feature selection [15,16], and feature extraction [12,17]. Inspired by this versatility, this paper proposes a novel GP based ECOC algorithm (GP-ECOC for short).
In our GP framework, the terminal set is the original class set, and the nonterminal set contains two units: the combination unit and the feature selection unit. With this setting, a nonterminal node combines the classes included in its child nodes and assigns a proper feature subspace to the associated classifier (dichotomizer). Each individual therefore consists of a set of class partition schemes and represents an ECOC codematrix. To ensure that no codematrix breaks the ECOC codematrix constraints, a legality check is carried out right after each new individual is generated. Individuals violating the ECOC constraints are illegal and are corrected by the proposed guided mutation operator. With the F-score used as the fitness function, the fitness value of each individual is calculated through an ECOC decoding process. Furthermore, to accelerate evolution, an effective local optimization algorithm is designed to add proper columns to a codematrix, aiming to tackle tough classes and to enhance the generalization ability of each individual. In the GP, individuals undergo crossover and mutation operations to produce a new generation, so that the population evolves towards optimal codematrices. To verify the performance of our algorithm, experiments are carried out on twelve UCI data sets. The results reveal that, compared with ten ECOC algorithms and five state-of-the-art ensemble learning algorithms, our GP-ECOC achieves satisfactory performance with a compact ensemble size.
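To make the idea of mapping class partition schemes to a codematrix and checking its legality more concrete, the following sketch may help. It is not the paper's implementation: the representation of a partition as a (positive classes, negative classes) pair, the function names, and the specific constraints checked (the commonly cited ECOC conditions on rows and columns) are all assumptions made for illustration.

```python
def build_codematrix(n_classes, partitions):
    """Map class partitions to a ternary codematrix.

    partitions is a list of (positive_classes, negative_classes) pairs;
    classes absent from both groups get 0 (ignored by that dichotomizer).
    """
    matrix = [[0] * len(partitions) for _ in range(n_classes)]
    for col, (positive, negative) in enumerate(partitions):
        for c in positive:
            matrix[c][col] = 1
        for c in negative:
            matrix[c][col] = -1
    return matrix


def is_legal(matrix):
    """Check commonly cited ECOC constraints (assumed here, not from the paper)."""
    rows = [tuple(r) for r in matrix]
    cols = [tuple(c) for c in zip(*matrix)]
    # Every column must contain both +1 and -1 to define a binary problem.
    if any(1 not in c or -1 not in c for c in cols):
        return False
    # Two classes must not share the same codeword.
    if len(set(rows)) != len(rows):
        return False
    # Columns must be neither identical nor complementary.
    seen = set()
    for c in cols:
        if c in seen or tuple(-v for v in c) in seen:
            return False
        seen.add(c)
    return True


legal = build_codematrix(3, [({0}, {1, 2}), ({1}, {2}), ({0, 1}, {2})])
illegal = build_codematrix(4, [({0, 1}, {2, 3}), ({2, 3}, {0, 1}), ({0, 2}, {1, 3})])
```

In this sketch the second matrix is rejected because its first two columns are complements of each other and would train the same dichotomizer twice. In the framework described above, such an individual would be handed to the guided mutation operator rather than discarded.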
The rest of this paper is organized as follows. Section 2 introduces the background of ECOC and GP. Section 3 describes the framework of our GP and gives the details of the genetic operators and the local optimization algorithm. Section 4 reports experiments that compare our algorithm with some widely deployed ensemble algorithms on UCI data sets, and analyzes and discusses the results. Finally, Section 5 concludes this paper and points out directions for future work.
Section snippets
Background of ECOC
This section reviews ECOC. To clarify our statements, the definitions of symbols used in this study are listed in Table 1.
ECOC has been widely studied since it was proposed in 1994 [18]. Since then, many novel ECOC algorithms have been proposed based on different principles [19]. In general, an ECOC algorithm contains two steps: the encoding step and the decoding step. The encoding step refers to the process of decomposing a multiclass problem into a set of binary problems, and the
The design of GP-ECOC
GP is a widely deployed evolutionary algorithm proposed by Koza [43]. It has been successfully applied to optimization problems in diverse fields, such as interpreting reinforcement learning policies [44] and different types of knowledge discovery [45,46], by treating individuals as symbolic expressions. Some GP based learning algorithms instead treat individuals as learners, and have been applied to rainfall prediction [47], building ensemble learning systems [48] and outlier
Data sets
Twelve UCI data sets [54] are used to verify the effectiveness of GP-ECOC, as listed in Table 2. The Ecoli, Yeast, and Zoo data sets contain some classes with extremely small sample sizes (fewer than 9 samples). Such classes are deleted, because they cannot reveal any meaningful statistical properties. After this step, each data set is divided into three parts (the training set, validation set and test set) in the proportion 2:1:1.
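The preprocessing described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say whether the split is stratified or how it is randomized, so a simple seeded shuffle is used here, and the function names drop_small_classes and split_2_1_1 are hypothetical.

```python
import random
from collections import Counter


def drop_small_classes(samples, labels, min_size=9):
    """Remove every class that has fewer than min_size samples."""
    counts = Counter(labels)
    kept = [(x, y) for x, y in zip(samples, labels) if counts[y] >= min_size]
    return [x for x, _ in kept], [y for _, y in kept]


def split_2_1_1(samples, labels, seed=0):
    """Shuffle and split into training, validation and test parts (2:1:1).

    Assumption: a plain random split; any remainder goes to the test part.
    """
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    n_train = len(order) // 2
    n_val = len(order) // 4
    parts = (
        order[:n_train],
        order[n_train:n_train + n_val],
        order[n_train + n_val:],
    )
    return [([samples[i] for i in p], [labels[i] for i in p]) for p in parts]


# Toy data: two classes with 50 samples each and one class with 3 samples.
toy_labels = ["a"] * 50 + ["b"] * 50 + ["rare"] * 3
toy_samples = list(range(len(toy_labels)))
X, y = drop_small_classes(toy_samples, toy_labels)
train, val, test = split_2_1_1(X, y)
```

With the toy data, the three "rare" samples are removed first, and the remaining 100 samples are divided into parts of 50, 25 and 25.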
Deployed methods
F-Test, Random Forest, SVM-RFE and BSSWSS are applied as feature
Conclusion
In this paper, we proposed a novel GP based ECOC algorithm, aiming to evolve an optimal ECOC codematrix. In this framework, each individual represents a set of class decomposition schemes, and it will be mapped to a codematrix when evaluating its fitness function. We propose some strategies to maintain the legality of each individual, and design a new mutation operator to correct the illegal ones. F-score is applied as the fitness function, and a local optimization algorithm is proposed to
Acknowledgment
This work is supported by the National Natural Science Foundation of China (No. 61772023 and 61502402), the National Key R&D Program of China (No. 2019QY1803), the Natural Science Foundation of Fujian Province (No. 2015J05129 and 2016J01320), and the XMU Training Program of Innovation and Entrepreneurship for Undergraduates (No. 2017X0331).
References (63)
- et al. Dynamic ensemble selection for multi-class classification with one-class classifiers. Pattern Recognit. (2018)
- et al. Securing templates in a face recognition system using Error-Correcting Output Code and chaos theory. Comput. Electr. Eng. (2018)
- et al. Minimal design of error-correcting output codes. Pattern Recognit. Lett. (2012)
- et al. A semi-supervised Genetic Programming method for dealing with noisy labels and hidden overfitting. Swarm and Evolutionary Computation (2018)
- et al. Evolving genetic programming classifiers with novelty search. Inf. Sci. (2016)
- A Genetic Programming approach for feature selection in highly dimensional skewed data. Neurocomputing (2018)
- et al. MBCGP-FE: a modified balanced cartesian genetic programming feature extractor. Knowl. Based Syst. (2017)
- et al. An incremental node embedding technique for error correcting output codes. Pattern Recognit. (2008)
- et al. Design of reject rules for ECOC classification systems. Pattern Recognit. (2012)
- et al. One versus one multi-class classification fusion using optimizing decision directed acyclic graph for predicting listing status of companies. Inf. Fusion (2017)
- Data-driven decomposition for multi-class classification. Pattern Recognit.
- A genetic-based subspace analysis method for improving Error-Correcting Output Coding. Pattern Recognit.
- Evidential framework for error correcting output code classification. Eng. Appl. Artif. Intell.
- A hierarchical ensemble of ECOC for cancer classification based on multi-class microarray data. Inf. Sci.
- An empirical comparison on state-of-the-art multi-class imbalance learning algorithms and a new diversified ensemble learning scheme. Knowl. Based Syst.
- Interpretable policies for reinforcement learning by genetic programming. Eng. Appl. Artif. Intell.
- Knowledge discovery in multiobjective optimization problems in engineering via Genetic Programming. Expert Syst. Appl.
- A comprehensive survey on genetic algorithms for DNA motif prediction. Inf. Sci.
- Stochastic model genetic programming: deriving pricing equations for rainfall weather derivatives. Swarm and Evolutionary Computation
- Using Bayesian networks for selecting classifiers in GP ensembles. Inf. Sci.
- Adaptive outlier elimination in image registration using genetic programming. Inf. Sci.
- Image feature selection using genetic programming for figure-ground segmentation. Eng. Appl. Artif. Intell.
- Solving stochastic differential equations through genetic programming and automatic differentiation. Eng. Appl. Artif. Intell.
- Reducing multiclass to binary: a unifying approach for margin classifiers. J. Mach. Learn. Res.
- Solving multiclass learning problems via error-correcting output codes. J. Artif. Intell. Res.
- Traffic sign recognition using evolutionary adaboost detection and forest-ECOC classification. IEEE Trans. Intell. Transp. Syst.
- Zero-shot action recognition with error-correcting output codes. IEEE Conference on Computer Vision and Pattern Recognition
- On the learnability and design of output codes for multiclass problems. Mach. Learn.
- A survey on the application of genetic programming to classification. IEEE Transactions on Systems, Man, and Cybernetics, Part C
- Optimization of Classifiers Using Genetic Programming
- Multidimensional genetic programming for multiclass classification. Swarm and Evolutionary Computation
Cited by (16)
Feature Elimination through Data Complexity for Error-Correcting Output Codes based micro-expression recognition
2023, Signal Processing: Image Communication
A novel soft-coded error-correcting output codes algorithm
2023, Pattern Recognition
Citation excerpt: On the other hand, many evolutionary algorithm-based approaches had been injected into the ECOC framework, so that the codewords were optimized in the evolutionary process [25]. Some algorithms had been proposed based on genetic algorithm [26], genetic programming [27,28], and the Beam search [29]. These algorithms provide codematrices with higher discriminative ability at the cost of higher computational consumption.
Feature space and label space selection based on Error-correcting output codes for partial label learning
2022, Information Sciences
Citation excerpt: Furthermore, many evolutionary algorithm-based approaches were injected into the ECOC framework, so that the codewords are optimized in the evolutionary process [1]. Some algorithms had been proposed based on genetic algorithm [33,40], genetic programming [21,48], and the Beam search [45], the ECOC family based dynamic ensemble selection strategy [49]. These algorithms obtain higher discriminative ability with higher computational costs.
The design of dynamic ensemble selection strategy for the error-correcting output codes family
2021, Information Sciences
A novel error-correcting output codes based on genetic programming and ternary digit operators
2021, Pattern Recognition
Citation excerpt: And like GA, it provides strong search power. In contrast to the extensive use of GA in the ECOC framework, only one GP based ECOC algorithm was proposed [10]. In this GP framework, the nonterminal set contains two units: the combination unit and the feature selection unit.