Single and multiple outputs decision tree classification using bi-level discrete–continuous genetic algorithm
Introduction
Data classification models based on decision trees are extensively used in data analysis applications such as data mining, for data with both single and multiple outputs [1]. The reasons for this popularity include their comprehensible nature, robustness to noise, low computational cost of model generation, ability to deal with redundant features, and generalization ability. However, most classic decision tree training algorithms grow the tree by locally optimizing an impurity measure: at each node, the optimization selects the feature to be used for branching. Such local optimization behavior leads to sub-optimal solutions [2].
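The local behavior described above can be made concrete with a small sketch of the greedy split selection that classic induction algorithms perform at each node. This is an illustration under our own assumptions (function names and the use of Gini impurity are not taken from the paper): each split is optimal only for the current node, which is why the resulting tree as a whole can be sub-optimal.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(samples, labels, n_features):
    """Pick the (feature, threshold) pair minimizing weighted child impurity.

    The choice is locally optimal for this node only; no later split can
    undo a poor choice made here.
    """
    best = (None, None, float("inf"))
    n = len(samples)
    for f in range(n_features):
        for t in sorted({x[f] for x in samples}):
            left = [y for x, y in zip(samples, labels) if x[f] <= t]
            right = [y for x, y in zip(samples, labels) if x[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best

# Toy data: only feature 1 separates the two classes perfectly.
X = [(1, 5), (2, 1), (3, 6), (4, 2)]
y = ["a", "b", "a", "b"]
f, t, score = best_split(X, y, 2)
```

Here the greedy criterion picks feature 1 with zero impurity; a metaheuristic search, by contrast, evaluates whole trees rather than one split at a time.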
To address this shortcoming of classical decision tree construction, various methods have been proposed by researchers. Two groups of these improvement methods are ensemble learning and metaheuristics. In ensemble learning, the final class of a sample is predicted through a voting scheme over the classes predicted by different individual trees [3]. As a result, the comprehensibility of a single tree is lost.
In contrast, metaheuristic methods are used in two ways. In the first way, they are fed a set of primitive trees and then create better trees through different operators that logically combine branches of the trees into an improved tree. Fu et al. [4] used a genetic algorithm (GA) to combine individual trees, each constructed from a subset of a large dataset. They generated several trees using C4.5 and improved them with GA to obtain a better final tree. Hemmateenejad et al. [5] used ant colony optimization (ACO) to improve the classification and regression trees (CART) process; they combined GA operators (e.g., crossover and mutation) with the ACO algorithm to boost the improvement plan. Liu and Fan [6] optimized the results of a decision tree algorithm with GA. They took the C4.5 algorithm as a tool to generate rules because of its accuracy and low complexity, and then improved the generated rules using GA. Karabadji et al. [7] used GA to optimize the required parameters of the decision tree construction process, including the most appropriate training samples, model parameters, the set of attributes, and the growing and pruning criteria. Four DT models in WEKA, namely BFTree, J48, SimpleCart, and REPTree, were used to construct the decision trees. Although this way leads to tree improvement, it makes the quality of the final solution dependent on the initial solutions; moreover, not all possible trees are likely to be created.
In the second way, metaheuristic methods are used to construct optimal trees from scratch. Cha and Tappert [8] presented a method to encode and decode a binary decision tree to and from a chromosome to which genetic operators such as mutation and crossover can be applied; theoretical properties of decision trees, encoded chromosomes, and fitness functions were presented in their paper. Otero et al. [9] used ACO to form a decision tree: they proposed an ACO algorithm that induces decision trees by combining commonly used strategies from both traditional decision tree induction algorithms and ACO. While the majority of research on metaheuristics for decision trees covers improving pre-built trees, their proposed algorithm, called Ant-Tree-Miner, is an outstanding effort to adopt a metaheuristic optimization algorithm to construct a decision tree directly. However, Ant-Tree-Miner can be considered a method developed on previously presented methods, namely the Ant Colony Decision Tree (ACDT) [10] and the continuous Ant Colony Decision Tree (cACDT) [11]. Pacheco et al. [12] proposed a greedy randomized adaptive search procedure (GRASP) method for constructing binary classification trees; their aim was to build trees as simple as possible without lowering prediction accuracy. Rivera-Lopez et al. [13] presented a differential-evolution-based approach for inducing oblique decision trees.
The literature review on decision trees with several output variables shows that the use of tree improvers is quite neglected. In fact, most studies on multi-output classification have focused on upgrading classical single-output classification and regression methods to the multi-output level. In this regard, several works have been presented, such as multi-target regression trees (MRT) [14], multi-objective regression trees [15], Multi-Target Stepwise Model Tree Induction [16], incremental multi-target model trees for data streams [17], ensemble learning for multi-output regression and classification problems [18], and option predictive clustering trees for multi-target regression [19].
According to this review of the published articles, the capabilities of the genetic algorithm have not been used to construct an optimal tree for single- and multiple-output data classification. Therefore, in this paper, a method for constructing an optimal decision tree based on a genetic algorithm is presented. The proposed method is able to effectively select important features in a two-level optimization process while ensuring maximum prediction accuracy in the classification. In the proposed bi-level genetic algorithm (BiLeGA), feature selection is done in the outer level while the optimal tree structure is generated in the inner level. In addition, a new solution representation and new selection, crossover, and mutation operators for the GA are designed. The proposed method is then compared to some of the most popular techniques, both tree-based and non-tree-based classification methods, on single- and multiple-output data. The results demonstrate that BiLeGA improves decision tree models to the level of the best classification methods such as support vector machines (SVM) [20], artificial neural networks (ANN) [21], and logistic regression [22].
The rest of the paper is organized as follows. In Section 2, the overall structure of BiLeGA is explained. The inner level of the proposed bi-level discrete–continuous GA is presented in Section 3, followed by the outer level in Section 4. Experiments and results are presented in Section 5, and the conclusion is given in Section 6.
Overall structure of BiLeGA
Decisions on constructing a classification tree can be divided into two groups. The first group involves selecting a subset of features from among all of them; the important features should be selected here. This decision, also called feature selection, has a direct effect on the size and classification accuracy of a tree: smaller trees have higher interpretability but lower accuracy compared to large trees. In BiLeGA, deciding which features to choose is on the
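The two-level division described above can be sketched as nested search loops. This is a minimal illustration under our own assumptions (the function names, population handling, and the random stand-ins for the GA operators are simplifications, not the paper's exact algorithm): the outer level evolves which features enter the tree, and for each candidate subset the inner level searches for the best tree structure, abstracted here as an ordering of the selected features.

```python
import random

def bilevel_ga(n_total_features, n_selected, fitness, outer_pop=8,
               inner_pop=8, outer_gens=5, inner_gens=5, seed=0):
    """Minimal bi-level GA sketch: outer level = feature selection,
    inner level = tree-structure search over the chosen features."""
    rng = random.Random(seed)

    def inner_best(features):
        # Inner level: search for the best feature-to-node assignment.
        pop = [rng.sample(features, len(features)) for _ in range(inner_pop)]
        for _ in range(inner_gens):
            child = rng.sample(features, len(features))  # stand-in for crossover/mutation
            worst = min(range(inner_pop), key=lambda i: fitness(pop[i]))
            if fitness(child) > fitness(pop[worst]):
                pop[worst] = child
        return max(pop, key=fitness)

    # Outer level: evolve which features enter the tree at all.
    subsets = [rng.sample(range(n_total_features), n_selected)
               for _ in range(outer_pop)]
    best = max((inner_best(s) for s in subsets), key=fitness)
    for _ in range(outer_gens):
        cand = rng.sample(range(n_total_features), n_selected)  # stand-in for GA operators
        tree = inner_best(cand)
        if fitness(tree) > fitness(best):
            best = tree
    return best
```

In the actual method, `fitness` would train and evaluate the decision tree induced by each inner-level solution; here it is left as a caller-supplied function.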
Solution representation
Considering d features, a solution equivalent to a binary classification tree is represented by a matrix. In this matrix, the first row comprises random integer numbers ranging from 1 to d. Since the biggest possible tree has a fixed number of branching nodes, the random numbers and their order identify which feature should be considered for branching in each branching node. To produce a tree from a solution, consider a tree whose branching nodes are numbered using ordinal numbers. Here, the
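A possible reading of this representation can be sketched as follows. This is a hypothetical illustration (the heap-style node numbering and function names are our assumptions, not the paper's exact encoding): the first row of the solution matrix is read in order, assigning one feature index to each ordinally numbered branching node.

```python
# Hypothetical decoding of the first row of a solution matrix into the
# branching nodes of a binary tree. Branching nodes are assumed to be
# numbered level by level, as in a heap: node i has children 2i+1, 2i+2.

def decode_first_row(first_row):
    """Map each branching node (heap-indexed from 0) to the feature it tests."""
    return {node: feature for node, feature in enumerate(first_row)}

def children(node):
    """Heap-style child indices of a branching node."""
    return 2 * node + 1, 2 * node + 2

# With d = 3 features and a depth-2 tree (3 branching nodes):
row = [2, 1, 3]          # feature indices drawn from 1..d
tree = decode_first_row(row)
```

Under this reading, the root tests feature 2, its left child tests feature 1, and its right child tests feature 3.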
Solution representation
The task of the outer level is to determine the features used by the tree construction procedure in the inner level. To do so, if the original data set has d′ features, a horizontal vector containing d randomly selected integers ranging from 1 to d′ is used as a solution. In this solution, each number represents a feature to be considered in the tree's construction. For example, if there are 20 initial features and only 5 features have been considered to be used in the tree,
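The outer-level solution described above can be sketched in a few lines. This is an assumed illustration (the function name is ours, and we assume the d selected indices are distinct, which the text does not state explicitly): a vector of d integers in 1..d′, each naming one original feature to hand to the inner level.

```python
import random

def random_outer_solution(d_prime, d, rng=random):
    """Draw d distinct feature indices from the d' original features
    (distinctness is an assumption of this sketch)."""
    return rng.sample(range(1, d_prime + 1), d)

# Example matching the text: 20 original features, 5 used in the tree.
solution = random_outer_solution(20, 5, random.Random(42))
```

The inner level then builds a tree using only the features named in this vector.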
Single output data sets
The proposed BiLeGA method for continuous single-output data classification was compared against two well-known decision tree induction methods, C5.0 [25] and CART [26], and also three other outstanding classification and regression methods, namely SVM, ANN, and logistic regression, all implemented in IBM SPSS Modeler. Details of the algorithms and the default settings used in the software are explained in [27]. The comparisons were done on 12 datasets from the UCI repository. Characteristics
Conclusions
In this paper, a bi-level optimization method based on a genetic algorithm, called BiLeGA, was proposed for feature selection and decision tree construction on single- and multiple-output data. In the proposed method, feature selection is done in the outer level and the construction of the optimal tree based on the selected features is performed in the inner level. BiLeGA was tested on a number of datasets and the results revealed that the performance of the generated decision tree models has
Declaration of Competing Interest
There is no conflict of interest to declare.
References (28)
- et al., Building optimal regression tree by ant colony system–genetic algorithm: application to modeling of melting points, Anal. Chim. Acta (2011)
- et al., An evolutionary scheme for decision tree construction, Knowl. Based Syst. (2017)
- et al., Inducing decision trees with an ant colony optimization algorithm, Appl. Soft Comput. (2012)
- et al., A GRASP method for building classification trees, Expert Syst. Appl. (2012)
- et al., Tree ensembles for predicting structured outputs, Pattern Recognit. (2013)
- et al., The CART decision tree for mining data streams, Inf. Sci. (2014)
- et al., A survey on multi-output regression, Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. (2015)
- et al., A survey of evolutionary algorithms for decision-tree induction, IEEE Trans. Syst. Man Cybern. Part C (2012)
- et al., A survey on ensemble learning for data stream classification, ACM Comput. Surv. (2017)
- et al., A genetic algorithm-based approach for building accurate decision trees, INFORMS J. Comput. (2003)