Neurocomputing, Volume 174, Part A, 22 January 2016, Pages 42-49

Deep extreme learning machines: supervised autoencoding architecture for classification

https://doi.org/10.1016/j.neucom.2015.03.110

Abstract

We present a method for synthesising deep neural networks using Extreme Learning Machines (ELMs) as a stack of supervised autoencoders. We test the method using standard benchmark datasets for multi-class image classification (MNIST, CIFAR-10 and Google Streetview House Numbers (SVHN)), and show that the classification error rate can progressively improve with the inclusion of additional autoencoding ELM modules in a stack. Moreover, we found that the method can correctly classify up to 99.19% of MNIST test images, which improves on the best error rates reported for standard 3-layer ELMs or previous deep ELM approaches when applied to MNIST. The approach is simultaneously significantly faster to train to its best performance (on the order of 5 min on a four-core CPU for MNIST) than a single ELM with the same total number of hidden units as the deep ELM, hence offering the best of both worlds: lower error rates and fast implementation.

Introduction

In recent years several hardware platforms optimised for neural network implementation have been developed. These implementations range from massively parallel custom-built System-on-Chip (SoC) silicon microprocessor arrays (e.g. SpiNNaker [1]), to analog VLSI processors that directly emulate the ion channels of neurons as leakage currents in the CMOS subthreshold region (e.g. Neurogrid [2]). The emergence of these platforms has been accompanied by a parallel effort to develop algorithms that mimic the computational capability of the human brain, particularly in developing synthesised (engineered) neural networks. These algorithms are now utilised both for investigating brain function in computational neuroscience (for example, models of controlling eye position and working memory [3]), and for implementing computing systems in machine learning. In machine learning, an emerging algorithm is the Extreme Learning Machine (ELM) [4], [5], [6], which is known to be relatively fast to train in comparison with iterative training methods, and to perform with similar accuracy to Support Vector Machines (SVMs) [7]. The current paper is motivated by recent work that has aimed to produce neuromorphic implementations of ELM [8] and related methods [9], [10], based on hardware that simulates ‘spiking neurons’. See [11] for further discussion of neuromorphic implementations. One potential limitation of hardware implementations, or of implementations on resource-constrained platforms, is the number of hidden units available for concurrent activation. Our focus is on developing an ELM algorithm that reduces the number of hidden units that need to be concurrently activated, while offering even faster training times and maintaining good performance.

The neural network architecture of standard existing ELM approaches is a three-layer feedforward structure. The first layer is the input layer; the second, the hidden layer, is activated by weighted projections of the input to non-linear sigmoid neurons; and the third and final layer is the output, consisting of units with linear input–output characteristics (see Fig. 1). In ELM, the connection weights between the input and the hidden layer neurons are randomly specified and remain untrained [4], [5]. For example, the input layer connection weights may be uniformly distributed with values between −0.5 and +0.5. This is analogous to neurobiology, in the sense that a negative connection weight inhibits a neuron's activity, and a positive weight excites neuronal activity. After the input is projected to the hidden layer, each hidden unit's non-linear sigmoid function generates a response. Then, using training data, the connection weights between the hidden and the output layer are trained in a single pass by mathematical optimisation. Only this connection weight matrix is altered during training. It is calculated by a least-squares regression method such as the Moore–Penrose pseudoinverse [12].

The methodology used in the above approach can be summarised as follows (a minimal code sketch is given after the list):

  1. Using random and fixed weights, project an input layer to a hidden layer of sigmoidal units.

  2. Using training data, numerically solve for the output weights between the hidden layer and the output layer, by multiplying the pseudoinverse of the matrix of hidden-layer responses for all training data with the matrix of corresponding desired output responses.
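As a concrete illustration, the following minimal Python/NumPy sketch implements these two steps for a generic classification problem. The uniform weight range and the sigmoid non-linearity follow the description above; the function names, the default layer size and the unregularised pseudoinverse solve are our own illustrative assumptions.

```python
import numpy as np

def train_elm(X, T, n_hidden=500, seed=0):
    """Train a standard 3-layer ELM. X: (K, d) inputs, T: (K, c) one-hot targets."""
    rng = np.random.default_rng(seed)
    # Step 1: random, fixed input weights, e.g. uniform on [-0.5, +0.5]
    W_in = rng.uniform(-0.5, 0.5, size=(X.shape[1], n_hidden))
    H = 1.0 / (1.0 + np.exp(-(X @ W_in)))   # sigmoidal hidden-layer responses
    # Step 2: solve for the output weights in a single pass
    W_out = np.linalg.pinv(H) @ T           # Moore-Penrose pseudoinverse
    return W_in, W_out

def predict_elm(X, W_in, W_out):
    """Linear output units applied to the hidden-layer responses."""
    H = 1.0 / (1.0 + np.exp(-(X @ W_in)))
    return H @ W_out
```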

This class of methods has been referred to as Linear Solutions of Higher Dimensional Interlayer (LSHDI) networks [11]. It deviates significantly from classical artificial neural network training methods, in which the input weights are iteratively trained; here, only the output weights are computed, in a single batch. This property can significantly enhance the efficiency of training, since the full and final solution is obtained by mathematical optimisation of a convex function in one single step. LSHDI methods can also solve significant problems in computational neuroscience simulations of real neurobiological neurons [3]. Although widely accepted and very capable models exist at the single-neuron level to mimic neurobiology, until the emergence of LSHDI there had been no widely applicable method to synthesise (train) a network to solve multiple tasks [13]. This class of methods is now emerging as the core of a generic neural compiler for creating silicon neural systems [1].

One drawback of classical ELMs is that the number of neurons in the single hidden layer is typically very large, and hence training the network can be computationally impractical for a large dataset (the complexity of solving for the output weights is O(KM²), where K is the number of training points and M is the number of hidden units [14]). Classical ELMs also use batch training, meaning that the network is trained on the entire dataset at once, which usually requires large memory and processing power. In [15], [11], [16], on-line training methods (as opposed to batch training) have been proposed to overcome this, but due to the large number of neurons typically used in the single hidden layer, the training time still largely depends on the size of the network, retaining an O(M²) implementation complexity.
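For intuition about the O(M²) cost of such on-line alternatives, a generic recursive least-squares (RLS) update of the output weights is sketched below: each new training sample updates an M×M inverse-correlation matrix, so every step costs O(M²) regardless of how many samples have been seen. This is a textbook RLS sketch under our own naming and initialisation, not the specific algorithm of [15], [11] or [16].

```python
import numpy as np

def rls_init(n_hidden, n_outputs, delta=100.0):
    """P approximates (H^T H)^{-1}; W holds the output weights."""
    return delta * np.eye(n_hidden), np.zeros((n_hidden, n_outputs))

def rls_update(P, W, h, t):
    """One O(M^2) step. h: (M, 1) hidden response, t: (1, c) target row."""
    Ph = P @ h                          # (M, 1), costs O(M^2)
    k = Ph / (1.0 + h.T @ Ph)           # (M, 1) gain vector
    W = W + k @ (t - h.T @ W)           # correct the prediction error
    P = P - k @ Ph.T                    # rank-1 downdate, costs O(M^2)
    return P, W
```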

In this paper we introduce a different way to address the problem of large hidden layer sizes. Our approach takes inspiration from biology and from recent advances in deep learning architectures [17], [18], [19]. We show that by constructing a deep ELM network as a stack of supervised autoencoder ELM modules, and training it module by module, the network training time and memory usage can be significantly reduced, whilst simultaneously improving classification accuracy beyond what can be achieved using a single ELM with the same total number of hidden units.

There have been several previous approaches to multi-layered ELM networks. Two result in a deep network architecture similar to ours: (i) [20] uses unsupervised autoencoding of hidden-layer activations as a method for constructing a deep network; (ii) [21] introduces a ‘random shifts and kernelization’ method to define the input to each hidden layer in the network. Another relevant approach is that of [22], which splits the input variables amongst a cascade of multiple ELM modules, with modules beyond the first also receiving responses from the previous module. In the Discussion (Section 4) we describe how our approach fundamentally differs from these networks.

The advances made by our algorithm are a result of two key factors:

  1. Selection of the untrained input weights using our recently introduced weight-shaping method known as constrained receptive field ELM (RF-C-ELM) [14] (which builds upon the Constrained ELM (C-ELM) method of [23]), rather than selecting these weights from a random distribution; a sketch of this flavour of weight shaping is given after the list.

  2. Training each ELM module in the stack to both autoencode its input and classify it, and then feeding both the autoencoding and the classification vectors into the subsequent module.
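To give a flavour of constrained, receptive-field weight shaping, the sketch below draws each hidden unit's weight vector as the difference between two randomly chosen training samples from different classes (the C-ELM idea of [23]) and then masks it to a random square patch of the image. This is an illustration of the general idea under our own assumptions (patch size, masking and normalisation choices), not the exact RF-C-ELM recipe of [14].

```python
import numpy as np

def shaped_input_weights(X, y, img_side, n_hidden, rf_side=10, seed=0):
    """Illustrative constrained + receptive-field input weights.

    X: (K, img_side*img_side) training images, y: (K,) integer labels.
    """
    rng = np.random.default_rng(seed)
    W = np.zeros((X.shape[1], n_hidden))
    for j in range(n_hidden):
        # C-ELM-style constraint: difference of two samples from different classes
        a, b = rng.integers(0, X.shape[0], size=2)
        while y[a] == y[b]:
            b = rng.integers(0, X.shape[0])
        w = (X[a] - X[b]).reshape(img_side, img_side)
        # receptive field: keep only a random rf_side x rf_side patch
        r, c = rng.integers(0, img_side - rf_side + 1, size=2)
        mask = np.zeros_like(w)
        mask[r:r + rf_side, c:c + rf_side] = 1.0
        w = (w * mask).ravel()
        norm = np.linalg.norm(w)
        W[:, j] = w / norm if norm > 0 else w
    return W
```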

As we shall show, training each ELM module in the stack using both the training data and the classification results of the previous module leads to progressively improved classification of test data with each subsequent module. This enhancement occurs simultaneously with a reduction in the training order of complexity for the same total number of hidden units. Thus our method offers the ‘best of both worlds’: enhanced classification rates and reduced runtime complexity. A rough sketch of this stacking scheme is given below.
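To make the training loop concrete, the following sketch stacks the ELM of the earlier snippet: each module is trained to output both a reconstruction of its input and class scores, and the concatenation of those two outputs becomes the input to the next module. The module count and sizes, the use of plain concatenation, and the reuse of train_elm/predict_elm from the earlier sketch are our own assumptions about how to realise the scheme described above, not the authors' exact configuration.

```python
import numpy as np

def train_deep_elm(X, Y, n_modules=3, n_hidden=800, seed=0):
    """Stack supervised autoencoding ELM modules (illustrative configuration).

    X: (K, d) inputs, Y: (K, c) one-hot labels. Uses train_elm/predict_elm
    from the earlier sketch.
    """
    modules, Z = [], X
    for m in range(n_modules):
        # Each module's target is [its own input, the labels]: it is trained
        # to simultaneously autoencode Z and classify it.
        T = np.hstack([Z, Y])
        W_in, W_out = train_elm(Z, T, n_hidden=n_hidden, seed=seed + m)
        modules.append((W_in, W_out))
        # Both the autoencoding and classification vectors feed the next module.
        Z = predict_elm(Z, W_in, W_out)
    return modules

def predict_deep_elm(X, modules, n_classes):
    Z = X
    for W_in, W_out in modules:
        Z = predict_elm(Z, W_in, W_out)
    # class scores occupy the last n_classes columns of the final output
    return np.argmax(Z[:, -n_classes:], axis=1)
```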

Section snippets

Methodology

In this section, we introduce the methods that we use to construct our deep ELM network.

Experiments

We first describe three image classification tasks on which we tested our method, and then present results on each of these benchmarks.

Discussion

In previous work, a deep ELM structure that exploits autoencoding was proposed [20]. In that method, the initial step is to train an ELM using the training data itself as the target, without using any labels. The transpose of the trained autoencoding output weights then replaces the random input weights of the ELM. The hidden-layer activations are autoencoded in a similar fashion multiple times, before a final projection into a larger hidden layer that is trained as the classifier output.
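A minimal sketch of that unsupervised ELM-autoencoder step, as we read [20]: an ELM is trained with its own input as the target, and the transpose of the learned output weights is then used as the (now data-adapted) input weights producing the next representation. Function names and sizes are our own illustrative choices.

```python
import numpy as np

def elm_autoencoder_layer(Z, n_hidden, seed=0):
    """One unsupervised ELM-autoencoder step in the style of [20] (our reading)."""
    rng = np.random.default_rng(seed)
    W_rand = rng.uniform(-0.5, 0.5, size=(Z.shape[1], n_hidden))
    H = 1.0 / (1.0 + np.exp(-(Z @ W_rand)))
    W_ae = np.linalg.pinv(H) @ Z          # autoencoding output weights, (M, d)
    # the transpose replaces the random input weights: new representation of Z
    return 1.0 / (1.0 + np.exp(-(Z @ W_ae.T)))
```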


Acknowledgements

Mark D. McDonnell's contribution was supported by the Australian Research Council under ARC grant DP1093425 (including an Australian Research Fellowship).


References (32)

  • G.-B. Huang et al., Extreme learning machines: a survey, Int. J. Mach. Learn. Cybern. (2011)
  • E. Cambria et al., Extreme learning machines, IEEE Intell. Syst. (2013)
  • G.-B. Huang, An insight into extreme learning machines: random neurons, random features and kernels, Cognit. Comput. (2014)
  • F. Galluppi, S. Davies, S. Furber, T. Stewart, C. Eliasmith, Real time on-chip implementation of dynamical systems with...
  • S. Choudhary, S. Sloan, S. Fok, A. Neckar, E. Trautmann, P. Gao, T. Stewart, C. Eliasmith, K. Boahen, Silicon neurons...
  • R. Penrose, A generalized inverse for matrices, Math. Proc. Camb. Philos. Soc. 51 (1955)...

Migel D. Tissera completed his B.E. in electrical and mechatronics engineering at the University of South Australia in 2010. After completing a research internship in early 2011, he worked in industry as an electrical engineer, steadily gaining experience in areas such as mining, water, utilities and power generation. He is passionate about robotics and hardware engineering, and is currently a Ph.D. research student studying machine learning and biological neural computation.

Mark D. McDonnell received the B.E. and Ph.D. degrees in electronic engineering in 1998 and 2006 respectively, and a B.Sc. with First Class Honours in applied mathematics in 2001, all from The University of Adelaide, Australia. He is currently Associate Research Professor, and Principal Investigator of the Computational and Theoretical Neuroscience Laboratory, at the Institute for Telecommunications Research at the University of South Australia, which he joined in 2007. Prior to this, he was a Lecturer in the School of Electrical and Electronic Engineering at the University of Adelaide. He is a member of the editorial boards of PLOS ONE and Fluctuation and Noise Letters, and has served as a Guest Editor for Proceedings of the IEEE and Frontiers in Computational Neuroscience. McDonnell's research focuses on the use of computational and engineering methods to advance scientific knowledge about the influence of noise and random variability in brain signals and structures during neurobiological computation. His contributions to this area of computational neuroscience have been recognized by the award of a five-year Australian Research Fellowship from the Australian Research Council in 2010, and a South Australian Tall Poppy Award for Science in 2008, as well as numerous invited talks. McDonnell has published over 80 papers, including several review articles, and a book on stochastic resonance published by Cambridge University Press. He has served as Vice President and Secretary of the IEEE South Australia (SA) Section Joint Communications and Signal Processing Chapter, and co-founded Neuroeng: The Australian Association for Computational Neuroscientists and Neuroengineers.
