Ensemble of classification models with weighted functional link network

https://doi.org/10.1016/j.asoc.2021.107322

Highlights

  • Random mapping and pre-trained weight based classification models are proposed.

  • Numerical and statistical analyses prove the effectiveness of the proposed models.

  • The performance of the proposed models is evaluated on 33 benchmark datasets.

Abstract

Ensemble classifiers based on the random vector functional link network have shown improved performance on classification problems. In this paper, we propose two approaches to solving classification problems. In the first approach, the data points of the original input space are mapped explicitly into a randomized feature space via a neural network whose hidden-layer weights are generated randomly. After feature projection, the classification models, namely twin bounded support vector machine (TBSVM), least squares twin SVM, twin k-class SVM, least squares twin k-class SVM, and robust energy-based least squares twin SVM, are trained on the extended features (original features plus randomized features). In the second approach, the same models are used to generate the weights of the hidden layer, and the weights of the output layer are optimized via a closed-form solution. The performance of both proposed architectures is evaluated on 33 datasets, including datasets from the UCI repository and fisheries data not available in UCI. Both the experimental results and the statistical tests demonstrate that the proposed approaches perform significantly better than the other baseline models. We also analyze the effect of the number of enhanced features on the performance of the given models.

Introduction

The classification performance of classifiers can be improved by combining the decisions of multiple classifiers. Such combined classifiers are commonly known as ensemble classifiers or multiple classifier systems. Ensemble methodology applies a perturb-and-combine strategy to individual classifiers [1]: in the perturb step, the classifiers are trained on perturbed versions of the training dataset, and in the combine step, the outputs of these classifiers are aggregated in a suitable fashion so that the ensemble model classifies better than the individual baseline classifiers.
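As a concrete (purely illustrative) instance of perturb-and-combine, the sketch below perturbs the training set by bootstrap resampling and combines the members by majority vote; bagging with decision trees is used here only as an example, not as the ensemble scheme of this paper.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def perturb_and_combine(X, y, n_members=25, seed=0):
        """Perturb: train each member on a bootstrap resample of (X, y).
        Combine: aggregate member outputs by majority vote.
        Assumes y holds non-negative integer class labels."""
        rng = np.random.default_rng(seed)
        members = []
        for _ in range(n_members):
            idx = rng.integers(0, len(X), size=len(X))             # perturbed training set
            members.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        def predict(X_new):
            votes = np.stack([m.predict(X_new) for m in members])  # shape (n_members, N)
            return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
        return predict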

Dietterich [2] suggested three fundamental reasons for the success of ensemble methods: statistical, computational, and representational. The statistical problem arises when the model searches a hypothesis space H that is too large for the given amount of training data. When the hypothesis space is large relative to the available training data, the learning algorithm chooses one hypothesis from a pool of hypotheses, all of which fit the training data well. Instead of committing to a single classifier, ensemble methods work better by combining all of these responses. The computational problem arises when the learning algorithm cannot guarantee that the selected hypothesis is the best one. In algorithms like neural networks and decision tree classifiers, heuristic methods are employed because finding the hypothesis that best fits the training data is computationally intractable. Heuristics like gradient descent in neural networks may get stuck in local minima and hence lead to suboptimal performance. The representational problem arises when none of the hypotheses in H can represent the unknown target function well; to extend the representable space, a weighted sum of hypotheses taken from H can be used. Additionally, the bias-variance decomposition [3] and strength-correlation analysis [4] support the ensemble methodology.

The random vector functional link (RVFL) network was originally proposed in [5]. Its generalization characteristics were studied in [6], where its learning and generalization capability was compared with that of the generalized delta rule (GDR) net. Igelnik and Pao [7] showed that the RVFL network is a universal approximator for continuous functions on bounded finite-dimensional sets and proved that it is an efficient universal approximator, with the approximation error converging to zero at a rate of order $O(C/\sqrt{n})$, where $n$ represents the number of basis functions and $C$ is independent of $n$. A fast learning algorithm for flat neural networks [8] was proposed to find the weights of such networks optimally; for both newly added patterns and newly added enhancement nodes, the algorithm updates the output weights of the RVFL network on the fly. A study of the RVFL network in modeling and control [9] concluded that combining the unsupervised placement of network nodes according to the input data density with subsequent supervised or reinforcement learning of the linear parameters improves the effectiveness of the random basis function approximator, and that the stated rate of convergence is achievable only when the number of elements in the network is sufficiently large.

Different learning methods have been combined with the RVFL network to improve performance in various fields, e.g., the combination of the RVFL network with statistical hypothesis testing [10], and the self-organization of the number of enhancement nodes for remote sensing applications, known as the statistical self-organizing learning system (SSOLS); this algorithm is divided into two phases, a mapping stage and a learning stage. Combining the RVFL network with expectation maximization also improved performance [11]. The density-based random vector functional link net [12] combined a radial basis function neural net with the RVFL network to improve performance in word recognition filtering. RVFL-network-based MPEG-4 coders [13] were used for intelligent rate control. A pedestrian detection system [14] based on multi-feature selection combined AdaBoost and the RVFL network for accurate detection of pedestrians. Recently, the deep RVFL model [15], RVFL learning with privileged information [16], and variance-embedded RVFL [17] have been proposed to enhance the generalization of RVFL-based models.

The analysis of the RVFL network in [18] concluded that the direct links from the input layer to the output layer improve the performance of the network, and that the randomization range of the input weights and biases affects the discrimination power of the RVFL network. RVFL has been applied in diverse fields such as time series forecasting [19] and optimal control [20], and a dynamic stepwise updating algorithm [21] incorporates new data points and enhancement nodes based on the pseudo-inverse solution.

A multilayer perceptron gives a closer approximation than the RVFL network; hence, the Gaussian mixture RVFL method [11] combined the RVFL network with the expectation maximization (EM) method. An L1-norm regularized autoencoder was used to learn efficient feature representations for the RVFL network [22], which led to an increase in the performance of the network.

Twin bounded support vector machines (TBSVM) [23] implement the structural risk minimization principle, which embodies the essence of statistical learning theory. TBSVM solves two smaller quadratic programming problems (QPPs) to find two non-parallel hyperplanes. By adding a regularization term, TBSVM minimizes the structural risk by maximizing the margin. The general twin support vector machine [24], [25] introduced the pinball loss function to make the model robust to noise and resampling. To introduce sparsity, sparse pinball loss twin support vector machines for classification [26], [27] and for clustering [28] have been proposed. Implementation of the structural risk minimization principle [29], [30], [31] is one of the advantages of support vector classification (SVC).
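For reference (reconstructed here from the standard TBSVM formulation in [23]; this paper's own statement of it falls in the truncated Section 2), the first of the two TBSVM primal problems is

$$\min_{w_1,\,b_1,\,\xi}\ \frac{c_3}{2}\left(\|w_1\|^2 + b_1^2\right) + \frac{1}{2}\,\|A w_1 + e_1 b_1\|^2 + c_1 e_2^{\top}\xi \quad \text{s.t.}\quad -(B w_1 + e_2 b_1) + \xi \ge e_2,\ \ \xi \ge 0,$$

where $A$ and $B$ hold the samples of the two classes, $e_1$ and $e_2$ are vectors of ones, and $c_1, c_3 > 0$ are trade-off parameters; the second problem is symmetric in the two classes.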

Least squares twin support vector machines (LSTSVM) [32], [33] solve a system of linear equations with a squared loss function instead of a convex quadratic programming problem (QPP). As LSTSVM is sensitive to noise and outliers, the energy-based least squares twin support vector machine (ELSTSVM) [34] introduced an energy term to reduce the effect of noise and outliers. Tanveer et al. [35] added an extra regularization term to the ELSTSVM formulation, known as robust energy-based least squares twin support vector machines (RELSTSVM), which makes the optimization problems positive definite and hence yields better generalization. A recent study [36] shows that RELSTSVM is the best classifier among the twin support vector machine models. LSTSVM with enhanced features shows improved performance over the standard LSTSVM model [37].
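To see why LSTSVM reduces to linear algebra, substitute the equality constraints into the squared-loss objective; with $H = [A\ \ e_1]$, $G = [B\ \ e_2]$, and parameter $c_1 > 0$, the first hyperplane then has the closed-form solution (a standard derivation following [32], stated here for reference)

$$\begin{bmatrix} w_1 \\ b_1 \end{bmatrix} = -\left(G^{\top}G + \frac{1}{c_1}\,H^{\top}H\right)^{-1} G^{\top} e_2,$$

and the second hyperplane is obtained symmetrically.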

Different from TBSVM, LSTSVM, ELSTSVM, RELSTSVM, and sparse linear programming twin support vector machines [38], a multiclass approach called twin k-class support vector classification (TWKSVC) [39], based on a "1-versus-1-versus-rest" structure, generates $k(k-1)/2$ binary classifiers for a k-class classification problem. To reduce the computational complexity of TWKSVC, least squares TWKSVC [40] introduced equality constraints in the objective function of TWKSVC so that a linear system of equations is solved instead of a QPP. The least squares K-nearest-neighbor-based weighted multiclass twin support vector machine [41] also solves a system of linear equations and uses the "1-versus-1-versus-rest" approach to solve classification problems.

Generally, input data samples provide a multitude of information under different feature representations, such as the compressed representation obtained from a lower dimensional feature space and the sparse representation obtained via a higher dimensional feature space [42], [43]. With different feature representations, plenty of underlying information is explored by the different learning algorithms. An ensemble of feature spaces with random forest [44] used different feature representations (principal component analysis and linear discriminant analysis) at each node to provide diverse information, and these diverse decisions improve the overall performance. The authors showed that feature transformations increase the available information, which leads to better generalization performance. Also, RVFL [5] uses a random feature transformation in combination with the original input space and has been successfully used in classification and regression tasks.

Motivated by the success of different feature representations, we propose two architectures for classification problems. In the first method, we investigate the performance of the proposed ensemble classifiers with random-weighted-network-based enhanced features and different classification models. Since the proposed approach maps the original input space to a randomized feature space, we name the models random vector (RV) classification models; for example, if we use random weighted features with LSTSVM, we name it RV-LSTSVM, and so on. RV-classification models consist of two phases: an enhancement phase and a classification phase. In the enhancement phase, the enhanced feature vector is generated: random weights $H$ are generated from the input layer to the enhancement layer, and an activation function is applied to obtain the enhanced features $g([X\ \,e][H;\,s])$, equivalently $g(XH + es)$, where $X$ represents the original features, $e$ is a vector of ones, $H$ is a random weight matrix, and $s$ is a bias vector. Both the input features and the enhanced features are concatenated to obtain the extended feature space, on which the different classification models are then trained to generate the decision hyperplanes. In the second method, we use the different classification models to initialize the hidden layer of the proposed architecture; based on the model used for initializing the hidden layer, we name the models accordingly, e.g., in the TBSVM-FL model, TBSVM is used for generating the hidden layer weights. A closed-form solution is used to optimize the weights of the output layer.
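To make the enhancement phase concrete, the following minimal NumPy sketch generates the random weights H and biases s, applies an activation (a sigmoid is assumed here; the text above does not fix one), and concatenates the original and enhanced features. The function and parameter names are ours, for illustration only.

    import numpy as np

    def enhance(X, L, rng=None, scale=1.0):
        """Map X (M x n) to L randomized features via fixed random weights,
        then concatenate with X to form the extended feature space."""
        rng = np.random.default_rng(rng)
        n = X.shape[1]
        H = rng.uniform(-scale, scale, size=(n, L))   # random input-to-enhancement weights
        s = rng.uniform(-scale, scale, size=(1, L))   # random biases
        Z = 1.0 / (1.0 + np.exp(-(X @ H + s)))        # g([X e][H; s]) = g(XH + es), sigmoid assumed
        return np.hstack([X, Z])                      # extended features: [original | enhanced]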

The rest of this paper is organized as follows: Section 2 discusses the related work, Sections 3 and 4 present the two proposed methods, Section 5 gives the computational complexity of the models, Section 6 provides the experimental results and their analysis, and the conclusion is given in Section 7.

Section snippets

Related work

In this section, we discuss the formulations of the RVFL network [5], the sparse pre-trained functional link network [22], twin bounded support vector machines [23], least squares twin support vector machines [32], robust energy-based least squares twin support vector machines [35], twin k-class support vector classification [39], and least squares TWKSVC [40] models.
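Since the formulations themselves are truncated in this snippet view, we restate the standard RVFL prediction for reference (reconstructed from the usual statement of [5], [18], not quoted from Section 2): the output is a linear combination of the original and randomized features,

$$\hat{Y} = \big[\,X \;\; g(XH + es)\,\big]\,\beta,$$

where $H$ and $s$ are the fixed random hidden weights and biases, and the output weights $\beta$ are obtained in closed form via the Moore-Penrose pseudo-inverse or a ridge-regularized least squares solution.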

Let $A \in \mathbb{R}^{l_1 \times n}$ be the data points of class $+1$ and $B \in \mathbb{R}^{l_2 \times n}$ be the data points of class $-1$, the two classes of interest. And, $C$

Proposed method-1

The proposed method can be described as a two-step process: the first step is the initial feature mapping, from the input data to an implicit feature representation, and the second step is the generation of the classification models. Here, we apply a transformation on the original input feature pattern to map it onto a randomized feature space. Fig. 2 shows the proposed architecture. Initially, a nonlinear transformation g(.) is applied to the random-weighted input feature pattern X to map it into a randomized
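A minimal end-to-end sketch of this two-step process, reusing the enhance function from the sketch in the introduction; scikit-learn's LinearSVC stands in for the twin-SVM-family classifiers actually used in the paper, and the dataset and hyperparameters are illustrative only:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC

    X, y = load_iris(return_X_y=True)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

    L = 50                                         # number of enhancement nodes
    Ztr = enhance(Xtr, L, rng=0)                   # step 1: randomized feature mapping
    Zte = enhance(Xte, L, rng=0)                   # same seed -> same random weights H and biases s
    clf = LinearSVC(max_iter=5000).fit(Ztr, ytr)   # step 2: train a classifier on extended features
    print("accuracy:", clf.score(Zte, yte))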

Proposed method-2

In this section, we elaborate the parameter learning process of the proposed models. The learning process of the proposed approach using different models such as TBSVM, TWKSVC, LSTWKSVC, RELSTSVM and LSTSVM is a three-step process (a minimal sketch follows the list). The three main steps are:

  • Weight generation phase.

  • Training of models.

  • Output value prediction.
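The sketch below illustrates these three steps under our reading of the approach: a trained (stand-in) linear classifier supplies the hidden-layer weights, and the output-layer weights are then obtained in closed form via a regularized least squares solution. The use of scikit-learn's LinearSVC and of ridge regularization is our assumption for illustration; the paper itself derives the hidden-layer weights from the twin-SVM-family models and uses its own closed-form solution.

    import numpy as np
    from sklearn.svm import LinearSVC

    def fit_fl(X, Y, lam=1e-3):
        """Y is a one-hot label matrix (M x k). Step 1: weight generation via a
        trained linear classifier (stand-in for TBSVM and related models)."""
        clf = LinearSVC(max_iter=5000).fit(X, Y.argmax(axis=1))
        W, b = clf.coef_.T, clf.intercept_        # hyperplane parameters -> hidden-layer weights
        D = np.hstack([X, np.tanh(X @ W + b)])    # step 2: hidden features plus direct links
        # Step 3: closed-form ridge solution for the output-layer weights.
        beta = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ Y)
        return W, b, beta

    def predict_fl(model, X):
        """Output value prediction with the learned parameters."""
        W, b, beta = model
        D = np.hstack([X, np.tanh(X @ W + b)])
        return (D @ beta).argmax(axis=1)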

Computational complexity

Consider a binary class problem with a dataset of size $M \times n$, where $M$ is the number of samples and $n$ is the number of features of each sample. Let $L$ be the number of enhanced features. Following the standard mathematical approach, inverting an $n \times n$ matrix requires $O(n^3)$ complexity.
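To make the comparison developed in this section explicit (our summary, with the bias term included): models trained on the extended features invert matrices of size $(n+L+1)\times(n+L+1)$, while models trained on the original features invert matrices of size $(n+1)\times(n+1)$, giving

$$O\big((n+L+1)^3\big) \quad \text{versus} \quad O\big((n+1)^3\big).$$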

In the first proposed approach, wherein the enhanced features are generated via random mapping, inverting a matrix of size $(n+L+1)\times(n+L+1)$ is involved, whereas in the baseline models (TBSVM, TWKSVC,

Experimental setup

An experimental study was conducted to evaluate the performance of the different classifiers. The classification models evaluated are RVFL, TBSVM, RV-TBSVM, TWKSVC, RV-TWKSVC, LSTWKSVC, RV-LSTWKSVC, RELSTSVM, RV-RELSTSVM, LSTSVM, RV-LSTSVM, the twin weighted models, and RVFL-AE.

We evaluate the proposed and baseline methods on benchmark datasets from the UCI repository [45], [47] and on real-world non-UCI datasets concerning fecundity estimation in fisheries: namely, oocMerl2F (3-class

Conclusion

In this paper, we proposed two approaches for classification problems. The first architecture is based on a random weighted extended feature space and solves the different optimization problems corresponding to the models on these extended feature spaces. With the extended features obtained via random weights, the performance of the proposed models increases. The RV-TBSVM and RV-TWKSVC models achieved approximately an 8% increase in

CRediT authorship contribution statement

M. Tanveer: Conceptualization, Methodology, Validation, Writing - review & editing, Supervision, Funding acquisition. M.A. Ganaie: Conceptualization, Methodology, Formal analysis, Investigation, Resources, Writing - original draft, Writing - review & editing, Visualization. P.N. Suganthan: Conceptualization, Methodology, Validation, Resources, Writing - review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the Council of Scientific & Industrial Research (CSIR), New Delhi, India, under the Extra Mural Research (EMR) Scheme, grant no. 22(0751)/17/EMR-II, and by the Department of Science and Technology under the Interdisciplinary Cyber Physical Systems (ICPS) Scheme, grant no. DST/ICPS/CPS-Individual/2018/276. We gratefully acknowledge the Indian Institute of Technology Indore for providing facilities and support.

References (50)

  • Ganaie, M.A., et al., Oblique decision tree ensemble via twin bounded SVM, Expert Syst. Appl. (2020).

  • Nasiri, J.A., et al., Energy-based model of least squares twin support vector machines for human action recognition, Signal Process. (2014).

  • Tanveer, M., et al., Comprehensive evaluation of twin SVM based classifiers on UCI datasets, Appl. Soft Comput. (2019).

  • Ganaie, M.A., et al., LSTSVM classifier with enhanced features from pre-trained functional link network, Appl. Soft Comput. (2020).

  • Nasiri, J.A., et al., Least squares twin multi-class classification support vector machine, Pattern Recognit. (2015).

  • Zhang, L., et al., Random forests with ensemble of feature spaces, Pattern Recognit. (2014).

  • González-Rufino, E., et al., Exhaustive comparison of colour texture features and classification methods to discriminate cells categories in histological images of fish ovary, Pattern Recognit. (2013).

  • Breiman, L., Bias, Variance, and Arcing Classifiers, Tech. Rep. 460 (1996).

  • Dietterich, T.G., Ensemble methods in machine learning.

  • Domingos, P., A unified bias-variance decomposition, in: Proceedings of 17th International Conference on Machine...

  • Breiman, L., Random forests, Mach. Learn. (2001).

  • Pao, Y.-H., et al., Neural-net computing and the intelligent control of systems, Internat. J. Control (1992).

  • Igelnik, B., et al., Stochastic choice of basis functions in adaptive function approximation and the functional-link net, IEEE Trans. Neural Netw. (1995).

  • Chen, C.P., et al., A rapid learning and dynamic stepwise updating algorithm for flat neural networks and the application to time-series prediction, IEEE Trans. Syst. Man Cybern. B (1999).

  • Tyukin, I.Y., et al., Feasibility of random basis function approximators for modeling and control.