Neural Networks

Volume 124, April 2020, Pages 20-38

Adaptive neural tree exploiting expert nodes to classify high-dimensional data

https://doi.org/10.1016/j.neunet.2019.12.029

Abstract

Classification of high-dimensional data suffers from the curse of dimensionality and over-fitting. The neural tree is a powerful method that combines local feature selection with recursive partitioning to address these problems, but it produces deep trees when classifying high-dimensional data. On the other hand, if shallower trees are used, classification accuracy decreases or over-fitting increases. This paper introduces a novel Neural Tree exploiting Expert Nodes (NTEN) to classify high-dimensional data. It is based on a decision tree structure whose internal nodes are expert nodes performing multi-dimensional splitting. Each expert node has three decision-making abilities. First, it can select the most eligible neural network with respect to the data complexity. Second, it evaluates the over-fitting. Third, it can cluster the features to jointly minimize redundancy and overlap. To this aim, metaheuristic optimization algorithms including GA, NSGA-II, PSO and ACO are applied. Based on these concepts, an expert node splits a class when over-fitting is low and clusters the features when over-fitting is high. Some theoretical results on NTEN are derived, and experiments on 35 standard datasets show that NTEN achieves good classification results and reduces tree depth without over-fitting or degrading accuracy.

Introduction

High-dimensional data classification poses significant challenges for basic classifiers, including the curse of dimensionality and over-fitting, which make them impractical and insufficient for large sets of real problems (Shi, Liu, Qi, & Wang, 2018). One promising solution to these challenges is dimension reduction (Shi et al., 2018). In addition, data partitioning can help to achieve simpler classifiers in each partition (Castro, Georgiopoulos, Demara, & Gonzalez, 2005). Neural trees, by combining local feature selection with recursive partitioning, can be helpful in this case. Moreover, by borrowing the fast training phase and good generalization ability of decision trees and the strong classification ability of neural networks, neural trees lead to powerful classifiers even on high-dimensional data (Federer & Zylberberg, 2018). Different neural trees have achieved good results on different problems, such as the perceptron tree (Utgoff, 1989), the neural tree with linear discriminant (Rani, Kumar, Micheloni, & Foresti, 2013), the decision tree with bounded error (Saettler, Laber, & Pereira, 2017), the balanced neural tree (Micheloni, Rani, Kumar, & Foresti, 2012), the neural tree with multi-dimensional split (Maji, 2008), the generalized neural tree (Foresti & Micheloni, 2002), the omnivariate neural tree (Yildiz & Alpaydin, 2001) and neural trees with knowledge transferring (Abpeykar & Ghatee, 2018). In addition, some neural trees have achieved significant results in dealing with high-dimensional data, including CART (Castro et al., 2005), SAINT (Federer & Zylberberg, 2018), the adaptive high-order neural tree (Foresti & Dolso, 2004), the decision forest of RBF networks (Abpeykar, Ghatee, & Zare, 2019) and neural trees with P2P and SC knowledge transferring (Abpeykar & Ghatee, 2019). However, there remain several dilemmas that existing neural trees do not consider jointly:

  • 1.

    Accurate classification of a high-dimensional feature space leads to deeper trees; thus, achieving shallower neural trees requires more complex computations at each node (Foresti and Micheloni, 2002, Micheloni et al., 2012, Rani et al., 2013).

  • 2.

    A multi-dimensional split is considered in some neural tree models (Foresti and Dolso, 2004, Maji, 2008, Micheloni et al., 2012, Ojha et al., 2017), but none of them analyzes the data complexity before splitting, which would lead to more accurate classification at each neural tree node.

  • 3.

    The data partition at each neural tree node is considered as a sub-problem (Utgoff, 1989). Each sub-problem needs an eligible neural network, which leads to accurate and shallower trees (Foresti and Dolso, 2004, Micheloni et al., 2012). There are some studies on hybrid neural trees with different MLPs, but there is no neural tree with expert nodes. By applying some if-then rules, these nodes can be extended to select an eligible neural network with respect to the data complexity (a minimal sketch follows this list).

  • 4.

    Because of the large amount of redundancy in high-dimensional feature spaces, neural trees face over-fitting. Feature clustering by expert nodes can be used to reduce this problem.
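As referenced in dilemma 3, the if-then model selection performed by an expert node can be illustrated as follows. This is a minimal sketch, assuming a scalar data-complexity score in [0, 1]; the thresholds and candidate networks are hypothetical, not the paper's actual knowledge base.

```python
# Minimal sketch of rule-based model selection in an expert node.
# The complexity thresholds (0.3, 0.7) and candidate networks are
# illustrative assumptions, not the paper's actual knowledge base.
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier

def select_eligible_network(complexity: float):
    """Pick a classifier whose capacity matches the estimated
    data complexity of the local training set (LTS)."""
    if complexity < 0.3:    # nearly separable: a linear model suffices
        return Perceptron()
    if complexity < 0.7:    # moderate class overlap: a small MLP
        return MLPClassifier(hidden_layer_sizes=(16,))
    return MLPClassifier(hidden_layer_sizes=(64, 32))  # strong overlap
```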

NTEN is characterized by three main novelties, which address the mentioned dilemmas: (1) a multi-dimensional split of the feature space, (2) the selection of the most appropriate neural network for each local training set and (3) feature clustering in expert nodes in case of over-fitting, by applying metaheuristic optimization algorithms. For the third novelty, different algorithms are applied for feature clustering: the Genetic Algorithm (GA) (Silva Filho, Souza, & Prudêncio, 2016), the Non-dominated Sorting Genetic Algorithm (NSGA-II) (Deb, Pratap, Agarwal, & Meyarivan, 2002), Particle Swarm Optimization (PSO) (Silva Filho et al., 2016) and Ant Colony Optimization (ACO) (Silva Filho et al., 2016). In traditional decision trees, feature selection at each node is done on the basis of entropy (Altınçay, 2007), the Gini index (Hady, Schwenker, & Palm, 2010) or misclassification error (Chen & Hung, 2009). A data split based on these methods leads to deep trees on data with high-dimensional feature spaces and needs huge computation. In these cases, a multi-dimensional split can be helpful (Foresti and Dolso, 2004, Foresti and Micheloni, 2002, Rani et al., 2013). NTEN applies a multi-dimensional split in its expert nodes and chooses features with the least volume of overlap region (VOR) in such a way that it leads to shallower trees without degrading classification accuracy. By selecting features with a low volume of overlap region between classes, each neural network trains on a feature space with low complexity; this allows computing good boundaries between classes and splitting the samples more confidently. In addition, it leads to child nodes with less overlap volume. The assignment of the features to the most appropriate neural network based on the data complexity of the local training set (LTS) enhances the classification performance. Finally, when NTEN faces over-fitting, the expert node clusters the features as a remedy. Since redundancy can cause over-fitting in high-dimensional data (Cong et al., 2017), it is also minimized within the clusters.
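To make the clustering objective concrete, the following is a hedged sketch in the spirit of Eqs. (2) and (3), which are not reproduced in this excerpt: within-cluster redundancy is approximated here by the mean absolute pairwise correlation, and the VOR term by the normalized overlap of per-class feature ranges; both surrogates are assumptions. A GA, NSGA-II, PSO or ACO would then search over cluster-assignment vectors to minimize this fitness.

```python
# Hedged sketch of a K-way feature-clustering fitness: within-cluster
# redundancy (mean absolute pairwise correlation, an assumed surrogate)
# plus a VOR term (normalized overlap of per-class feature ranges,
# also an assumed surrogate). Lower is better.
import numpy as np

def vor(X, y):
    """Average, over features, of the overlap of class value ranges."""
    classes = np.unique(y)
    overlaps = []
    for j in range(X.shape[1]):
        lo = max(X[y == c, j].min() for c in classes)
        hi = min(X[y == c, j].max() for c in classes)
        span = X[:, j].max() - X[:, j].min() + 1e-12
        overlaps.append(max(0.0, hi - lo) / span)
    return float(np.mean(overlaps))

def fitness(assignment, X, y, K):
    """assignment[j] = cluster index of feature j; lower is better."""
    total = 0.0
    for k in range(K):
        idx = np.where(assignment == k)[0]
        if len(idx) < 2:
            continue
        corr = np.corrcoef(X[:, idx], rowvar=False)
        redundancy = np.abs(corr[np.triu_indices(len(idx), 1)]).mean()
        total += redundancy + vor(X[:, idx], y)
    return total
```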

This paper is organized as follows. Section 2 describes the related works. In Section 3, the NTEN model is presented. Section 4 introduces expert nodes of the NTEN. Theoretical aspects are mentioned in Section 5. Experimental results on high dimensional data are presented in Section 6. Section 7 concludes the paper.

Section snippets

Related works

A neural tree is a decision tree whose non-terminal nodes contain a neural network that recursively trains on and partitions the feature space of the LTS and splits it for the next child nodes (Foresti and Micheloni, 2002, Foresti and Pieroni, 1998, Micheloni et al., 2012). Different neural trees have been used to solve many types of problems. The first neural tree, called the perceptron tree (Utgoff, 1989), checks whether or not a simple perceptron can separate the LTS. When it cannot, it is replaced

Neural Tree with Expert Nodes (NTEN)

NTEN has a tree structure whose internal nodes contain expert systems that can decide which kind of neural network is eligible to classify the sub-problem associated with the LTS. Then, the eligible classifier trains on the samples. When over-fitting is low, the samples that are classified correctly with the class label of best accuracy are assigned to the left child node. The others are sent to the right child, which is a new expert node for the next classification phases. If the over-fitting of
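The node routine just described can be sketched as follows. This is a hedged illustration: the over-fitting estimate (the train/validation accuracy gap), the 0.1 threshold, and the helper functions select_network and cluster_features (stand-ins for the expert node's subsystems) are all assumptions of this sketch, not the paper's exact procedure.

```python
# Hedged sketch of one expert-node step: train an eligible classifier;
# if over-fitting is high, cluster features; otherwise split the most
# accurately classified class off to the left child.
import numpy as np
from sklearn.model_selection import train_test_split

def expert_node_step(X, y, select_network, cluster_features):
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)
    clf = select_network(X_tr, y_tr)   # hypothetical first subsystem
    clf.fit(X_tr, y_tr)
    gap = clf.score(X_tr, y_tr) - clf.score(X_val, y_val)  # over-fitting proxy
    if gap > 0.1:                      # high over-fitting: cluster features
        return ("cluster", cluster_features(X, y))
    pred = clf.predict(X)              # low over-fitting: split the samples
    acc = {c: (pred[y == c] == c).mean() for c in np.unique(y)}
    best = max(acc, key=acc.get)       # class classified most accurately
    left = (y == best) & (pred == best)  # confident samples go left
    return ("split", (X[left], y[left]), (X[~left], y[~left]))
```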

Expert nodes of the NTEN

In the previous section, the training and testing procedures of NTEN were discussed, in which each node of NTEN is an expert node. The architecture of an expert node of NTEN is presented in Fig. 3. As one can see, such an expert node includes three subsystems.

  • The first subsystem consists of three components: a model-base, a knowledge-base and an inference engine. The inference engine receives a classification sub-problem, evaluates its data complexity and, with the aid of the knowledge-base, selects an eligible
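The excerpt does not name the exact complexity measure the inference engine evaluates. One standard candidate is Fisher's discriminant ratio F1 (Ho & Basu, 2002); using it here is an assumption of this sketch, shown for the two-class case.

```python
# Fisher's discriminant ratio F1 (Ho & Basu, 2002), two-class case:
# the maximum, over features, of between-class to within-class variance.
# Higher values mean the classes are easier to separate.
import numpy as np

def fisher_ratio(X, y):
    c0, c1 = np.unique(y)[:2]
    A, B = X[y == c0], X[y == c1]
    num = (A.mean(axis=0) - B.mean(axis=0)) ** 2
    den = A.var(axis=0) + B.var(axis=0) + 1e-12  # guard constant features
    return float((num / den).max())
```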

Theoretical analysis

Lemma 1

Finding K clusters of features by minimizing the objective function of Eq. (3) is a finite procedure.

Proof

Let each feature take at most θ distinct discrete values. Then the redundancy part of the objective function Fitness(q) can be evaluated in O(θ²). Also, based on Eq. (2), the VOR part can be evaluated in O(MC²log₂(N)), where M, C and N are the numbers of features, classes and samples, respectively. Thus, the objective function Fitness(q) can be evaluated in at most O(max{MC²log₂(N), θ²}) time, and thus the
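For reference, the evaluation cost stated in the proof can be written compactly, with the same symbols as above:

```latex
% Cost of one evaluation of Fitness(q): redundancy term plus VOR term,
% with M features, C classes, N samples, and at most \theta distinct
% values per feature.
\[
  T_{\mathrm{Fitness}}
    = O\!\left(\theta^{2}\right) + O\!\left(M C^{2}\log_{2} N\right)
    = O\!\left(\max\{\, M C^{2}\log_{2} N,\ \theta^{2} \,\}\right).
\]
```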

Experimental results

To show that NTEN achieves accurate results compared with existing neural trees, with shallower trees and without over-fitting or degradation of the classification accuracy, different experiments have been performed on 35 datasets, which are listed in Table 2.

Note that in these experiments, FDRK is another version of NTEN that applies feature clustering in the root node instead of in the expert nodes that face over-fitting (Abpeykar & Ghatee, 2018). Comparative results of COF-NT (Rani et al.,

Conclusions

Classification of high-dimensional data suffers from the curse of dimensionality and over-fitting. Feature selection is one solution to this problem. On the other hand, neural trees are powerful classifiers that, by combining local feature selection with recursive partitioning, can solve simple sub-problems at each node. Notwithstanding the achievements of existing neural trees, there is no neural tree with a multi-dimensional split driven by data complexity. Also, there is no neural tree with an expert system in the

References (73)

  • Fontenla-Romero, O., et al. (2010). A new convex objective function for the supervised learning of single-layer neural networks. Pattern Recognition.
  • Gorman, R. P., et al. (1988). Analysis of hidden units in a layered network trained to classify sonar targets. Neural Networks.
  • Güvenir, H. A., et al. (1998). Learning differential diagnosis of erythemato-squamous diseases using voting feature intervals. Artificial Intelligence in Medicine.
  • Hong, Z.-Q., et al. (1991). Optimal discriminant plane for a small number of samples and design method of classifier on the plane. Pattern Recognition.
  • Lipowski, A., et al. (2012). Roulette-wheel selection via stochastic acceptance. Physica A: Statistical Mechanics and its Applications.
  • Maji, P. (2008). Efficient design of neural network tree using a new splitting criterion. Neurocomputing.
  • Micheloni, C., et al. (2012). A balanced neural tree for pattern classification. Neural Networks.
  • Ojha, V. K., et al. (2017). Ensemble of heterogeneous flexible neural trees using multiobjective genetic programming. Applied Soft Computing.
  • Rani, A., et al. (2015). A neural tree for classification using convex objective function. Pattern Recognition Letters.
  • Rani, A., et al. (2013). Incorporating linear discriminant analysis in neural tree for multidimensional splitting. Applied Soft Computing.
  • Saettler, A., et al. (2017). Decision tree classification with bounded number of errors. Information Processing Letters.
  • Shi, Y., et al. (2018). Learning from label proportions on high-dimensional data. Neural Networks.
  • Silva Filho, T. M., et al. (2016). A swarm-trained k-nearest prototypes adaptive classifier with automatic feature selection for interval data. Neural Networks.
  • Yijing, L., et al. (2016). Adapted ensemble classification algorithm based on multiple classifier system and feature selection for classifying multi-class imbalanced data. Knowledge-Based Systems.
  • Ziyatdinov, A., et al. (2015). Bioinspired early detection through gas flow modulation in chemo-sensory systems. Sensors and Actuators B: Chemical.
  • Abpeykar, S., et al. (2018). An ensemble of RBF neural networks in decision tree structure with knowledge transferring to accelerate multi-classification. Neural Computing and Applications.
  • Aeberhard, S., et al. (1992). Comparison of classifiers in high dimensional settings. Tech. rep. 92(02).
  • Asuncion, A., et al. (2007). UCI machine learning repository.
  • Blower, P. E., et al. (2007). MicroRNA expression profiles for the NCI-60 cancer cell panel. Molecular Cancer Therapeutics.
  • Breiman, L. (2017). Classification and regression trees.
  • Brownlee, J. (2016). Master Machine Learning Algorithms: Discover how they work and implement them from scratch.
  • Cestnik, B. (1987). Assistant 86: A knowledge-elicitation tool for sophisticated users. Progress in Machine Learning.
  • Coello, C. C., et al. MOPSO: A proposal for multiple objective particle swarm optimization.
  • Cong, Y., et al. (2017). Online similarity learning for big data with overfitting. IEEE Transactions on Big Data.
  • Deb, K., et al. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation.
  • Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research (JMLR).