
Pattern Recognition Letters

Volume 128, 1 December 2019, Pages 190-196

Single and multiple outputs decision tree classification using bi-level discrete-continuous genetic algorithm

https://doi.org/10.1016/j.patrec.2019.09.001

Highlights

  • Optimization of the classification decision tree structure is considered for multiple-output data.

  • A bi-level discrete-continuous genetic algorithm is used to generate the optimal decision tree structure.

  • New genetic operators are designed to address the problem.

Abstract

Data classification with decision tree models is an attractive method in data analysis and data mining. However, compared to other classification methods, the prediction quality of these models is lower when classic heuristics and local optimization training methods are employed. To improve the performance of these models on single-output and multiple-output data sets, an optimal tree construction method based on the genetic algorithm is presented. The presented bi-level discrete-continuous genetic algorithm is able to select effective features as well as construct an optimal tree. For this purpose, new selection, crossover, and mutation operators are designed in terms of continuous and discrete variables. Comparison of the proposed method with other well-known classification methods on benchmark and real-world data sets shows that the performance of the decision tree models is raised to the level of the best prediction methods.

Introduction

Data classification models based on decision trees are extensively used in data analysis applications such as data mining, for data with both single and multiple outputs [1]. The reasons are their comprehensible nature, robustness to noise, low computational cost of model generation, ability to deal with redundant features, and generalization ability. However, most classic decision tree training algorithms grow the tree by locally optimizing an impurity measure: at each node, the optimization selects the feature to be used for branching. This local behavior leads to a sub-optimal tree overall [2].
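
For contrast with the global search proposed in this paper, the following minimal sketch illustrates the kind of node-local, greedy split selection (here with Gini impurity, as used in CART) that the authors argue yields sub-optimal trees. The function names and the tiny demo data are illustrative, not taken from the paper.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a label array: 1 - sum_k p_k^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Greedy, node-local choice: pick the (feature, threshold) pair that
    minimizes the weighted child impurity for THIS node only, with no
    regard for how the choice constrains deeper levels of the tree."""
    n, d = X.shape
    best = (None, None, np.inf)  # (feature index, threshold, weighted impurity)
    for j in range(d):
        for t in np.unique(X[:, j])[:-1]:  # candidate thresholds
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best

X = np.array([[2.0, 7.0], [3.0, 6.0], [8.0, 1.0], [9.0, 2.0]])
y = np.array([0, 0, 1, 1])
print(best_split(X, y))  # (0, 3.0, 0.0): a perfect split at this node
```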

To address the shortcomings of the classical methods of building a decision tree model, various methods have been proposed. Two groups of these improvement methods are ensemble learning and metaheuristics. In ensemble learning, the final class of a sample is predicted through a voting scheme over the classes predicted by different single trees [3]; in exchange, the comprehensibility of a single tree is lost.

Metaheuristic methods, by contrast, are used in two ways. In the first, they are fed a set of primitive trees and create better trees through different operators that logically combine branches of the input trees into an improved tree. Fu et al. [4] used a genetic algorithm (GA) to combine individual trees, each constructed from a subset of a large dataset; they generated several trees using C4.5 and improved them with the GA to obtain a better final tree. Hemmateenejad et al. [5] used ant colony optimization (ACO) to improve the classification and regression trees (CART) process, combining GA operators (e.g., crossover and mutation) with the ACO algorithm to boost the improvement scheme. Liu and Fan [6] optimized the results of a decision tree algorithm with a GA: they took C4.5 as the rule generator, owing to its accuracy and low complexity, and then improved the generated rules using the GA. Karabadji et al. [7] used a GA to optimize the parameters of a decision tree construction process, including the most appropriate training samples, model parameters, the set of attributes, and the growing and pruning criteria; four decision tree models in WEKA (BFTree, J48, SimpleCart, and REPTree) were used to construct the trees. Although this first way does improve trees, it makes the quality of the final solution dependent on the initial trees, and not all possible trees can be reached.

In the second way, metaheuristic methods construct optimal trees from scratch. Cha and Tappert [8] presented a method to encode and decode a binary decision tree to and from a chromosome on which genetic operators such as mutation and crossover can be applied, together with theoretical properties of decision trees, encoded chromosomes, and fitness functions. Otero et al. [9] proposed an ACO algorithm that induces decision trees by combining commonly used strategies from traditional decision tree induction algorithms with ACO. While most research on metaheuristics for decision trees covers improving pre-built trees, their algorithm, called Ant-Tree-Miner, is an outstanding effort to adopt a metaheuristic optimization algorithm for constructing a decision tree directly; it builds on previously presented methods, namely the Ant Colony Decision Tree (ACDT) [10] and the continuous Ant Colony Decision Tree (cACDT) [11]. Pacheco et al. [12] proposed a greedy randomized adaptive search procedure (GRASP) for constructing binary classification trees, aiming to build trees as simple as possible without sacrificing prediction accuracy. Rivera-Lopez et al. [13] presented a differential evolution based approach for inducing oblique decision trees.

The literature on decision trees with several output variables shows that tree improvers are largely neglected there. Most research on multi-output classification has instead focused on upgrading classical single-output classification and regression methods to the multi-output setting, including multi-target regression trees (MRT) [14], multi-objective regression trees [15], multi-target stepwise model tree induction [16], incremental multi-target model trees for data streams [17], ensemble learning for multi-output regression and classification [18], and option predictive clustering trees for multi-target regression [19].

According to this review, the capabilities of the genetic algorithm have not been used to construct an optimal tree for single-output and multiple-output data classification. Therefore, in this paper, a method for constructing an optimal decision tree based on a genetic algorithm is presented. The proposed method effectively selects important features in a two-level optimization process while ensuring maximum prediction accuracy in the classification. In the proposed bi-level genetic algorithm (BiLeGA), feature selection is done at the outer level while the optimal tree structure is generated at the inner level; new solution representation, selection, crossover, and mutation operators are designed for the GA. The proposed method is compared with some of the most popular tree-based and non-tree-based classification methods, on both single-output and multiple-output data. The results demonstrate that BiLeGA raises decision tree models to the level of the best classification methods, such as the support vector machine (SVM) [20], artificial neural network (ANN) [21], and logistic regression [22].

The rest of the paper is organized as follows: Section 2 explains the overall structure of BiLeGA. The inner level of the proposed bi-level discrete-continuous GA is presented in Section 3, followed by the outer level in Section 4. Experiments and results are presented in Section 5, and the conclusion is given in Section 6.

Section snippets

Overall structure of BiLeGA

Decisions in constructing a classification tree fall into two groups. The first involves selecting a number of features among all of them; the important features should be selected here. This decision, also called feature selection, has a direct effect on the size and classification accuracy of a tree: smaller trees have higher interpretability but lower accuracy compared to large trees. In BiLeGA, deciding which features to choose is assigned to the outer level.
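
As a rough illustration of this division of labor, a minimal, runnable skeleton of a bi-level loop is sketched below. The population sizes, the random stand-in for the inner-level fitness, and the generic tournament/one-point-crossover operators are all assumptions for illustration; they are not the paper's custom discrete-continuous operators.

```python
import random

D_PRIME, D = 20, 5   # total features vs. features used per tree (illustrative)
POP, GENS = 10, 30   # GA sizes (assumptions)

def inner_fitness(feature_subset):
    # Stand-in for the inner level: in BiLeGA this would evolve a tree
    # restricted to `feature_subset` and return its prediction accuracy.
    return random.random()

def evolve(population, scores):
    # Generic tournament selection, one-point crossover, and point
    # mutation -- NOT the paper's operators, just a placeholder GA step.
    def pick():
        a, b = random.sample(range(len(population)), 2)
        return population[a] if scores[a] >= scores[b] else population[b]
    children = []
    while len(children) < len(population):
        p1, p2 = pick(), pick()
        cut = random.randrange(1, D)
        child = p1[:cut] + p2[cut:]
        if random.random() < 0.1:                       # mutation rate (assumed)
            child[random.randrange(D)] = random.randint(1, D_PRIME)
        children.append(child)
    return children

# Outer level: each individual is a candidate feature subset.
population = [[random.randint(1, D_PRIME) for _ in range(D)] for _ in range(POP)]
for _ in range(GENS):
    scores = [inner_fitness(ind) for ind in population]  # outer fitness = best inner tree
    population = evolve(population, scores)

scores = [inner_fitness(ind) for ind in population]
print(population[max(range(POP), key=scores.__getitem__)])  # best subset found
```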

Solution representation

Considering d features, a solution equivalent to a binary classification tree is represented by a 2 × (2^d − 1) matrix. In this matrix, the first row comprises 2^d − 1 random integers ranging from 1 to d. Since the biggest possible tree has 2^d − 1 branching nodes, these random numbers and their order identify which feature is considered for branching at each branching node. To produce a tree from a solution, consider a tree whose branching nodes are numbered with ordinal numbers. Here, the
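
A minimal sketch of generating such a chromosome follows, assuming the usual level-by-level (ordinal) numbering of a full binary tree. The snippet is cut off before defining the second row; treating it as continuous split values in [0, 1] is an assumption suggested by the method's discrete-continuous design, not something stated in the excerpt.

```python
import numpy as np

def random_tree_chromosome(d, rng=np.random.default_rng()):
    """Encode a candidate binary tree as a 2 x (2^d - 1) matrix.
    Row 0: a feature index in 1..d for each of the 2^d - 1 possible
    branching nodes, numbered level by level (ordinal numbering).
    Row 1: assumed here to hold continuous split values in [0, 1];
    the excerpt is cut off before defining this row."""
    n_nodes = 2 ** d - 1
    features = rng.integers(1, d + 1, size=n_nodes)  # discrete part
    splits = rng.random(n_nodes)                     # continuous part (assumption)
    return np.vstack([features, splits])             # note: vstack stores both rows as floats

chrom = random_tree_chromosome(d=3)
print(chrom.shape)  # (2, 7): seven branching nodes for d = 3
# With 0-based ordinal numbering, node i's children are nodes 2*i + 1 and 2*i + 2.
```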

Solution representation

The task of the outer level is to determine the features for the tree construction procedure in the inner level. To do so, if the original data set has d′ features, a horizontal vector containing d randomly selected integers ranging from 1 to d′ is used as a solution. In this solution, the i-th number (i = 1, 2, …, d) represents the feature to be considered in the tree's construction. For example, if there are 20 initial features and only 5 features are to be used in the tree, 21153
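
A minimal sketch of this outer-level encoding is given below. Whether repeated feature indices are allowed is not stated in the excerpt, so sampling without replacement is an assumption.

```python
import numpy as np

def random_feature_subset(d_prime, d, rng=np.random.default_rng()):
    """Outer-level chromosome: d feature indices drawn from 1..d_prime.
    Sampling without replacement is an assumption; the excerpt does not
    say whether repeated indices may occur."""
    return rng.choice(np.arange(1, d_prime + 1), size=d, replace=False)

print(random_feature_subset(d_prime=20, d=5))  # e.g. the 20-feature, 5-feature case above
```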

Single output data sets

The proposed BiLeGA method for continuous single-output data classification was compared against two well-known decision tree induction methods, C5.0 [25] and CART [26], and three other outstanding classification and regression methods, SVM, ANN, and logistic regression, all implemented in IBM SPSS Modeler. Details of the algorithms and the default settings used in the software are explained in [27]. The comparisons were done on 12 datasets from the UCI repository. Characteristics

Conclusions

In this paper, a bi-level optimization method based on the genetic algorithm, called BiLeGA, was proposed for feature selection and decision tree construction on single and multiple outputs data. In the proposed method, feature selection is done at the outer level and the construction of the optimal tree from the selected features is performed at the inner level. BiLeGA was tested on a number of datasets, and the results revealed that the performance of the generated decision tree models has

Declaration of Competing Interest

There is no conflict of interest to declare.

References (28)

  • D.S. Liu et al.

    A modified decision tree algorithm based on genetic algorithm for mobile user classification problem

    Sci. World J.

    (2014)
  • S.H. Cha et al.

    A genetic algorithm for constructing compact binary decision trees

    J. Pattern Recognit. Res.

    (2009)
  • U. Boryczka et al.

    Ant colony decision trees: a new method for constructing decision trees based on ant colony optimization

  • U. Boryczka et al.

    An adaptive discretization in the ACDT algorithm for continuous attributes
