
Neurocomputing

Volume 261, 25 October 2017, Pages 50-56

Sparse coding extreme learning machine for classification

https://doi.org/10.1016/j.neucom.2016.06.078

Abstract

Extreme learning machine (ELM) is a supervised learning algorithm for training single-hidden-layer feedforward neural networks that has shown strong generalization performance. ELM randomly assigns the weights and biases between the input and hidden layers and learns only the weights between the hidden and output layers. Physiological research has shown that neurons in the same layer laterally inhibit each other, so that the outputs of each layer are sparse. However, ELM cannot accommodate this lateral inhibition with its direct random feature mapping. This paper therefore proposes a sparse coding ELM (ScELM) algorithm, which maps the input feature vector into a sparse representation. In the proposed ScELM algorithm, sparse coding is performed in an unsupervised manner and the dictionary is randomly assigned rather than learned. A gradient projection based method is used for the sparse coding, and the output weights are trained in the same supervised manner as in ELM. Experimental results on benchmark datasets show that the proposed ScELM algorithm outperforms other state-of-the-art methods in terms of classification accuracy.

Introduction

During the past decades, neural networks have been widely studied in machine learning, pattern recognition and robotics, since they can approximate complex nonlinear functions and thereby provide high classification accuracy. Many learning algorithms have been proposed for training neural networks, for example, the support vector machine (SVM) [1], [2] for single-hidden-layer neural networks (SLNNs), and the back-propagation (BP) algorithm and deep learning algorithms [3], [4], [5], [6], [7] for multiple-hidden-layer neural networks (MLNNs).

The SVM can be viewed as a training method for SLNNs based on standard optimization, maximizing the margin between two classes. However, SVMs have difficulty with large-scale data because the quadratic programming required to obtain the optimal solution becomes computationally expensive when the number of training samples is large.

Further efforts have been devoted to training MLNNs. The BP algorithm is a pioneer of this type of effort: it minimizes the training errors based on a gradient descent strategy, with the errors back-propagated from the output layer to the preceding hidden layers. However, in real applications, the BP algorithm has not shown strong performance for neural networks with many hidden layers, because the gradients become smaller and smaller as they are back-propagated from the top to the lower layers, so that the updates at the lower layers are weak. Recently, several deep learning algorithms have been proposed, e.g., the deep Boltzmann machine (DBM) [4], [6], [7], deep belief network (DBN) [5], convolutional neural network (CNN) [3], stacked denoising autoencoder (SDAE) [8], [9], [10] and stacked sparse autoencoder (SSAE) [11], [12]. The underlying idea of deep learning is that feature extraction and classification are combined in a unified MLNN architecture. In these algorithms, learning of the connection weights is divided into two stages. The first is bottom-up layer-wise pre-training in an unsupervised manner, with the common objective that the output of each layer reconstructs its input as closely as possible between neighboring layers; for example, DBM performs Gibbs sampling to maximize the log-likelihood of the training data, and SSAE performs self-taught sparse coding. The second is top-down fine-tuning of the connection weights in a supervised manner based on gradient descent. However, both the gradient-descent-based pre-training and fine-tuning are likely to converge to a local optimum.

Recently, the extreme learning machine (ELM) was proposed for training SLNNs [13], [14], [15]. One contribution of ELM is that the weights and biases between the input and hidden layers are randomly generated, so that only the weights between the hidden and output layers require training. The other contribution is that ELM obtains optimal output weights by minimizing not only the training errors but also the norm of the output weights, which yields better generalization performance [16]. This objective is solved using the Lagrange multiplier method. Theoretically, ELM can reach a global optimum [17] and is therefore unlikely to fall into a local optimum. In terms of computation, the training cost of ELM is much lower than that of other state-of-the-art learning methods.
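
For concreteness, the following is a minimal sketch of the standard ELM training step described above: random input weights and biases, hidden outputs through an activation function, and output weights from the regularized least-squares solution of the Lagrangian formulation, beta = (H^T H + I/C)^{-1} H^T T. The sigmoid activation, hyper-parameter values, and function names are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def elm_train(X, T, n_hidden=500, C=1.0, seed=None):
    """Minimal ELM sketch (assumed names/values, not the paper's configuration).
    X: (n_samples, n_features) inputs; T: (n_samples, n_classes) one-hot targets."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                   # sigmoid hidden outputs
    # Regularized least squares: beta = (H^T H + I/C)^{-1} H^T T
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)
```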

However, it is difficult to accommodate the lateral inhibition between neurons by directly using the random feature mapping of ELM. Physiological research has shown that neurons in the same layer laterally inhibit each other, so that the outputs of each layer are sparse [18]. Therefore, this paper proposes a sparse coding ELM (ScELM) algorithm, which uses a sparse coding technique to map the inputs to the hidden layer instead of the random mapping used by ELM. A gradient projection (GP) based method with l1-norm optimization [19] is used in the encoding stage, while the output weights between the hidden and output layers are learned using the Lagrange multiplier method. The contribution of the proposed ScELM is that sparsity makes the hidden-layer feature representations more salient and distinctive, so that they contribute more to classification.
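
To illustrate the encoding stage, the sketch below solves the l1-regularized least-squares problem min_s 0.5*||x - D s||_2^2 + lambda*||s||_1 with a simple proximal-gradient (ISTA) iteration. This is a stand-in for the gradient projection solver of [19], chosen only because it is compact; the solver choice, lambda, and the iteration count are assumptions for illustration.

```python
import numpy as np

def sparse_encode(x, D, lam=0.1, n_iter=200):
    """Encode one input vector x over dictionary D (n_features, n_atoms)
    by iterative soft-thresholding; a stand-in for the GP solver in [19]."""
    s = np.zeros(D.shape[1])
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth term
    step = 1.0 / L
    for _ in range(n_iter):
        grad = D.T @ (D @ s - x)           # gradient of 0.5*||x - D s||^2
        z = s - step * grad
        s = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return s
```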

Some pioneering work has combined l1-norm optimization with ELM. One method uses l1-norm optimization to obtain sparse output weights [20], but the hidden-layer feature representations are not sparse. Another method first computes sparse representations of the original features and then uses them as the inputs of an ELM-based SLNN [21]; in other words, the sparse coding routine lies outside the neural network. Compared with these existing methods, the proposed ScELM algorithm uses a sparse mapping instead of a random mapping between the input and hidden layers. It is important to note that randomness is partly retained, in the sense that the basis vectors (i.e., the dictionary) for sparse coding are randomly assigned in the proposed ScELM.

The remainder of this paper is organized as follows. Section 2 reviews related work on sparse coding. Section 3 presents the details of the proposed ScELM algorithm. Experimental results are shown in Section 4.

Section snippets

Sparse coding algorithms

By exploring the receptive fields of simple cells in the striate cortex of cats, Hubel and Wiesel posited that the receptive fields of primary visual cortex (V1) neurons can produce a sparse representation of visual signals [22]. Electrophysiological experiments further validated that a sparse coding principle exists in the visual cortex [23]. These findings inspired the engineering community to develop sparse coding algorithms for signal processing.

There have been various algorithms

Architecture

As shown in Fig. 1, the proposed ScELM aims to train a single-hidden-layer neural network. Between the input and hidden layers, it uses a sparse coding technique to map input features into a mid-level feature space: given an input feature vector, the hidden layer outputs its sparse representation. Since the dictionary used for sparse coding is randomly assigned from a uniform distribution in the proposed ScELM algorithm, no training is required between the input and hidden layers. The calculation
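
Putting the pieces together, here is a hedged sketch of the ScELM pipeline as described in this section: a randomly assigned (uniform) dictionary, sparse hidden codes for each input (reusing the sparse_encode stand-in above), and ELM-style regularized output weights. Column normalization of the dictionary and all hyper-parameter values are assumptions for illustration, not the paper's configuration.

```python
import numpy as np

def scelm_train(X, T, n_atoms=500, lam=0.1, C=1.0, seed=None):
    """ScELM pipeline sketch: random dictionary, sparse hidden representations,
    regularized least-squares output weights (assumed settings)."""
    rng = np.random.default_rng(seed)
    D = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_atoms))   # random dictionary, no training
    D /= np.linalg.norm(D, axis=0)                            # normalize atoms (assumption)
    H = np.stack([sparse_encode(x, D, lam) for x in X])       # sparse hidden codes
    beta = np.linalg.solve(H.T @ H + np.eye(n_atoms) / C, H.T @ T)
    return D, beta

def scelm_predict(X, D, beta, lam=0.1):
    H = np.stack([sparse_encode(x, D, lam) for x in X])
    return np.argmax(H @ beta, axis=1)
```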

Experimental setup

Our experiments use a total of 16 data sets, including 8 binary-classification cases and 8 multi-class cases, to evaluate the proposed ScELM algorithm. Most of the data sets are taken from the UCI Machine Learning Repository [38]. The details of these data sets are shown in Table 1. In this table, the column "Random Perm" indicates whether the training and test data are randomly assigned. For each data set, there are a total of 50 collections of randomly assigned training-test partitions. In

Conclusions

This paper proposes a new method for learning SLNNs, called ScELM. It uses a sparse coding technique to map the input features to hidden feature representations, which improves classification performance. This paper conducts extensive experiments on publicly available databases to evaluate the proposed ScELM algorithm, and the results show that ScELM performs better than ELM and SVM in terms of classification. Future work includes using other sparse coding algorithms and

Acknowledgment

This work is supported by the National Natural Science Foundation of China under grant 61473089.


References (39)

  • R. Salakhutdinov et al.

An efficient learning procedure for deep Boltzmann machines

    Neural Comput.

    (2012)
  • Y. Bengio et al.

    Greedy layer-wise training of deep networks

    Proceedings of Advances in Neural Information Processing Systems

    (2006)
  • P. Vincent et al.

    Extracting and composing robust features with denoising autoencoders

    Proceedings of International Conference on Machine Learning

    (2008)
  • P. Vincent et al.

    Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion

    J. Mach. Learn. Res.

    (2010)
  • A. Coates et al.

The importance of encoding versus training with sparse coding and vector quantization

    Proceedings of International Conference on Machine Learning

    (2011)
  • H. Lee et al.

    Sparse deep belief net model for visual area v2

    Proceedings of Advances in Neural Information Processing Systems

    (2008)
  • G.-B. Huang et al.

    Extreme learning machine for regression and multiclass classification

    IEEE Trans. Syst. Man Cybern. Part B Cybern.

    (2012)
  • P.L. Bartlett

    The sample complexity of pattern classification with neural networks: The size of the weights is more important than the size of the network

    IEEE Trans. Inf. Theory

    (1998)
  • G.-B. Huang et al.

    Universal approximation using incremental constructive feedforward networks with random hidden nodes

    IEEE Trans. Neural Netw.

    (2006)

    Yuanlong Yu received the B.Eng. degree in automatic control in 2000 from the Beijing Institute of Technology, Beijing, China, the M.Eng. degree in computer applied technology in 2003 from Tsinghua University, Beijing, and the Ph.D. degree in electrical engineering in 2010 from Memorial University of Newfoundland, St. Johns, NL, Canada. After completing his doctoral studies, he worked as a Postdoctoral Fellow at Memorial University of Newfoundland. Since September 2011, he has been with Dalhousie University, Halifax, NS, Canada, as a Postdoctoral Fellow. Since 2013, he worked as a Professor at Fuzhou University, China. His main interests are computer vision, pattern recognition, machine learning, visual attention, autonomous mental development and cognitive robotics.

    Zhenzhen Sun received the Bachelor’s degree in computer science and technology in 2015 at Fuzhou University, Fuzhou, China. Currently, she is a master student at Fuzhou University. Her research interests include computer vision and machine learning.
