Sparse coding extreme learning machine for classification
Introduction
During the past decades, neural networks have been widely studied in machine learning, pattern recognition and robotics because they can approximate complex nonlinear functions and thereby achieve high classification accuracy. Many learning algorithms have been proposed for training neural networks, for example, the support vector machine (SVM) [1], [2] for single-hidden-layer neural networks (SLNNs), and the back-propagation (BP) algorithm and deep learning algorithms [3], [4], [5], [6], [7] for multiple-hidden-layer neural networks (MLNNs).
SVM can be seen as a training method for SLNNs based on a standard optimization formulation that maximizes the margin between two classes. However, it is difficult for SVM to handle large-scale data, since the quadratic programming required to obtain the optimal solution becomes computationally expensive when the number of training samples is large.
Further efforts have been devoted to training MLNNs. The BP algorithm pioneered this line of work: it minimizes the training errors via gradient descent, with the errors back-propagated from the output layer to the preceding hidden layers. In practice, however, BP has not performed well for networks with many hidden layers. This is because the gradients shrink progressively as they are back-propagated from the top to the lower layers, so the weight updates at the lower layers become weak. Recently, several deep learning algorithms have been proposed, e.g., the deep Boltzmann machine (DBM) [4], [6], [7], the deep belief network (DBN) [5], the convolutional neural network (CNN) [3], the stacked denoising autoencoder (SDAE) [8], [9], [10] and the stacked sparse autoencoder (SSAE) [11], [12]. The underlying idea of deep learning is that feature extraction and classification are combined in a unified MLNN architecture. In these algorithms, learning of the connection weights is generally divided into two stages. The first is bottom-up, layer-wise pre-training in an unsupervised manner, with the common objective that the output and input of each pair of neighboring layers be as close as possible. For example, DBM performs Gibbs sampling to maximize the log-likelihood of the training data, and SSAE performs self-taught sparse coding. The second is top-down fine-tuning of the connection weights in a supervised manner based on gradient descent. However, such gradient-descent-based pre-training and fine-tuning is likely to converge to a local optimum.
Recently, the extreme learning machine (ELM) was proposed for training SLNNs [13], [14], [15]. One contribution of ELM is that the weights and biases between the input and hidden layers are randomly generated, so that only the weights between the hidden and output layers require training. The other contribution is that ELM obtains optimal output weights by minimizing not only the training errors but also the norm of the output weights, which yields better generalization performance [16]. This objective function is solved using the Lagrange multiplier method. Theoretically, ELM can attain a global optimum [17] and is therefore unlikely to fall into a local optimum. In terms of computation, the training cost of ELM is much lower than that of other state-of-the-art learning methods.
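The two-stage recipe described above (random input weights, then a regularized least-squares solve for the output weights) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function names, the tanh activation and the regularization constant C are assumptions for the sketch.

```python
import numpy as np

def train_elm(X, T, n_hidden=100, C=1.0, rng=None):
    """Train a single-hidden-layer ELM.

    X: (n_samples, n_features) inputs.
    T: (n_samples, n_classes) one-hot targets.
    """
    rng = np.random.default_rng(rng)
    # Input weights and biases are randomly generated and never trained.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = np.tanh(X @ W + b)  # hidden-layer output matrix
    # Minimize both the training error and the norm of the output weights:
    # beta = argmin ||H beta - T||^2 + ||beta||^2 / C  (closed-form solve).
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ T)
    return W, b, beta

def predict_elm(X, W, b, beta):
    # Class label = index of the largest output-layer response.
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)
```

The closed-form solve is what keeps ELM's training cost low: no iterative gradient descent is involved at any stage.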
However, the random feature mapping used directly in ELM makes it difficult to accommodate lateral inhibition between neurons. Physiological research has shown that neurons in the same layer laterally inhibit each other, so that the outputs of each layer are sparse [18]. This paper therefore proposes a sparse coding ELM (ScELM) algorithm, which uses a sparse coding technique to map the inputs to the hidden layer instead of the random mapping used by ELM. A gradient projection (GP) based method with l1-norm optimization [19] is used in the encoding stage, while the output weights between the hidden and output layers are learned using the Lagrange multiplier method. The contribution of the proposed ScELM is that sparsity makes the hidden-layer feature representations more salient and distinctive, so that they contribute more to classification.
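The encoding stage solves the standard l1-regularized reconstruction problem min_s 0.5*||x - Ds||^2 + lam*||s||_1. The paper uses a gradient projection (GP) solver for this problem; as a simpler illustrative stand-in for the same objective, the sketch below uses ISTA (iterative shrinkage-thresholding), whose soft-thresholding step is what produces exact zeros in the code vector. All names and parameter values here are assumptions for illustration.

```python
import numpy as np

def ista_sparse_code(D, x, lam=0.1, n_iter=200):
    """Solve min_s 0.5*||x - D s||^2 + lam*||s||_1 by ISTA.

    D: (n_features, n_atoms) dictionary; x: (n_features,) input.
    Returns a sparse code s of length n_atoms.
    """
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    L = np.linalg.norm(D, 2) ** 2
    s = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ s - x)       # gradient of the quadratic term
        z = s - grad / L               # gradient step
        # Soft-thresholding: proximal operator of the l1 norm.
        s = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return s
```

Because soft-thresholding sets small coefficients exactly to zero, the resulting hidden-layer representation is genuinely sparse, mimicking the lateral inhibition that a plain random mapping cannot express.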
Some pioneering work has combined l1-norm optimization with ELM. One method uses l1-norm optimization to obtain sparse output weights [20], but its hidden-layer feature representations are not sparse. Another method first computes sparse representations of the original features and then uses these representations as the inputs of an ELM-based SLNN [21]; in other words, the sparse coding step lies outside the neural network. Compared with these existing methods, the proposed ScELM algorithm uses a sparse mapping instead of a random mapping between the input and hidden layers. It is important to note that some randomness remains, in the sense that the basis vectors (i.e., the dictionary) used for sparse coding are randomly assigned in the proposed ScELM.
The remainder of this paper is organized as follows. Section 2 reviews related work in sparse coding. Section 3 presents the details of the proposed ScELM algorithm. Experimental results are shown in Section 4.
Sparse coding algorithms
By exploring the receptive fields of simple neurons in the striate cortex of cats, Hubel and Wiesel posited that primary visual cortex (i.e., V1) neurons can produce a sparse representation of the visual signal [22]. Electrophysiological experiments further validated that the sparse coding principle operates in the visual cortex [23]. These findings inspired the engineering community to develop sparse coding algorithms for signal processing.
There have been various algorithms
Architecture
As shown in Fig. 1, the proposed ScELM trains a single-hidden-layer neural network. Between the input and hidden layers, it uses a sparse coding technique to map input features into a mid-level feature space: given an input feature vector, the hidden layer outputs its sparse representation. Since the dictionary used for sparse coding is randomly drawn from a uniform distribution in the proposed ScELM algorithm, no training is required between the input and hidden layers. The calculation
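Putting the pieces together, an end-to-end sketch of this architecture under the stated assumptions (uniformly random dictionary, l1 sparse codes as the hidden layer, ridge-regressed output weights) might look as follows. This is a hypothetical reconstruction for illustration: the function names, the batch ISTA encoder standing in for the paper's GP solver, and all hyperparameter values are assumptions.

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding: proximal operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def encode(X, D, lam=0.1, n_iter=100):
    """Batch l1 sparse coding of rows of X against dictionary D (ISTA)."""
    L = np.linalg.norm(D, 2) ** 2
    S = np.zeros((X.shape[0], D.shape[1]))
    for _ in range(n_iter):
        S = soft(S - (S @ D.T - X) @ D / L, lam / L)
    return S

def scelm_fit(X, T, n_hidden=200, lam=0.1, C=1.0, n_iter=100, rng=None):
    """ScELM sketch: random dictionary, sparse hidden codes, solved output weights."""
    rng = np.random.default_rng(rng)
    # Dictionary atoms drawn from a uniform distribution, then normalized;
    # no training happens between the input and hidden layers.
    D = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    D /= np.linalg.norm(D, axis=0)
    H = encode(X, D, lam, n_iter)  # sparse codes are the hidden outputs
    # Output weights: same regularized least-squares solve as in ELM.
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ T)
    return D, beta

def scelm_predict(X, D, beta, lam=0.1, n_iter=100):
    return np.argmax(encode(X, D, lam, n_iter) @ beta, axis=1)
```

Only the output weights are learned; the sparse encoder replaces ELM's random projection while keeping its closed-form training.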
Experimental setup
Our experiments use a total of 16 data sets, including 8 binary-classification cases and 8 multi-classification cases, to evaluate this proposed ScELM algorithm. Most of the data sets are taken from UCI Machine Learning Repository [38]. The details of these data sets are shown in Table 1. In this table, column “Random Perm” shows whether the training and test data are randomly assigned. For each data set, there are a total of 50 collections of randomly assigned training-test data partitions. In
Conclusions
This paper proposes a new method for learning SLNNs, called ScELM. It uses a sparse coding technique to map the input features to hidden feature representations, thereby improving classification performance. Extensive experiments on publicly available databases are conducted to evaluate the proposed ScELM algorithm, and the results show that ScELM achieves better classification performance than ELM and SVM. Future work includes using other sparse coding algorithms and
Acknowledgment
This work is supported by National Natural Science Foundation of China under grant 61473089.
References (39)
- et al., Extreme learning machine: Theory and applications, Neurocomputing (2006)
- et al., Optimization method based extreme learning machine for classification, Neurocomputing (2010)
- et al., Sparse coding with an overcomplete basis set: A strategy employed by V1?, Vis. Res. (1997)
- et al., Anomaly detection in traffic using l1-norm minimization extreme learning machine, Neurocomputing (2015)
- et al., Support vector networks, Mach. Learn. (1995)
- et al., The entire regularization path for the support vector machine, J. Mach. Learn. Res. (2004)
- et al., Gradient-based learning applied to document recognition, Proc. IEEE (1998)
- et al., Reducing the dimensionality of data with neural networks, Science (2006)
- et al., A fast learning algorithm for deep belief nets, Neural Comput. (2006)
- et al., Deep Boltzmann machine, J. Mach. Learn. Res. (2009)
- An efficient learning procedure for deep Boltzmann machines, Neural Comput.
- Greedy layer-wise training of deep networks, Proceedings of Advances in Neural Information Processing Systems
- Extracting and composing robust features with denoising autoencoders, Proceedings of International Conference on Machine Learning
- Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, J. Mach. Learn. Res.
- The importance of encoding versus training with sparse coding and vector quantization, Proceedings of International Conference on Machine Learning
- Sparse deep belief net model for visual area V2, Proceedings of Advances in Neural Information Processing Systems
- Extreme learning machine for regression and multiclass classification, IEEE Trans. Syst. Man Cybern. Part B Cybern.
- The sample complexity of pattern classification with neural networks: The size of the weights is more important than the size of the network, IEEE Trans. Inf. Theory
- Universal approximation using incremental constructive feedforward networks with random hidden nodes, IEEE Trans. Neural Netw.
Yuanlong Yu received the B.Eng. degree in automatic control in 2000 from the Beijing Institute of Technology, Beijing, China, the M.Eng. degree in computer applied technology in 2003 from Tsinghua University, Beijing, and the Ph.D. degree in electrical engineering in 2010 from Memorial University of Newfoundland, St. John's, NL, Canada. After completing his doctoral studies, he worked as a Postdoctoral Fellow at Memorial University of Newfoundland. Since September 2011, he has been with Dalhousie University, Halifax, NS, Canada, as a Postdoctoral Fellow. Since 2013, he has worked as a Professor at Fuzhou University, China. His main interests are computer vision, pattern recognition, machine learning, visual attention, autonomous mental development and cognitive robotics.
Zhenzhen Sun received the Bachelor’s degree in computer science and technology in 2015 at Fuzhou University, Fuzhou, China. Currently, she is a master student at Fuzhou University. Her research interests include computer vision and machine learning.