An Insight into Extreme Learning Machines: Random Neurons, Random Features and Kernels

Abstract

Extreme learning machines (ELMs) basically give answers to two fundamental learning problems: (1) Can the fundamentals of learning (i.e., feature learning, clustering, regression and classification) be realized without tuning hidden neurons (including biological neurons), even when the output shapes and function modeling of these neurons are unknown? (2) Does there exist a unified framework for feedforward neural networks and feature space methods? ELMs, which have built tangible links between machine learning techniques and biological learning mechanisms, have recently attracted increasing attention from researchers in widespread research areas. This paper provides an insight into ELMs in three aspects, viz., random neurons, random features and kernels. It also shows that, in theory, ELMs (with the same kernels) tend to outperform support vector machines and their variants in both regression and classification applications, with much easier implementation.
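
To make "without tuning hidden neurons" concrete, here is a minimal sketch of the basic ELM recipe in Python. It is not the paper's reference implementation: the function names (elm_train, elm_predict), the sigmoid activation, the uniform weight range and the default regularization constant C are illustrative assumptions; only the overall scheme (random hidden layer, closed-form output weights) comes from the ELM literature.

    import numpy as np

    def elm_train(X, T, n_hidden=100, C=1e3, seed=0):
        """Sketch of a basic ELM: the hidden layer is generated at random and
        never tuned; only the output weights beta are solved in closed form."""
        rng = np.random.default_rng(seed)
        W = rng.uniform(-1.0, 1.0, (X.shape[1], n_hidden))  # random input weights
        b = rng.uniform(-1.0, 1.0, n_hidden)                # random hidden biases
        H = 1.0 / (1.0 + np.exp(-(X @ W + b)))              # hidden-layer output matrix
        # Regularized least squares: beta = (I/C + H^T H)^{-1} H^T T
        beta = np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ T)
        return W, b, beta

    def elm_predict(X, W, b, beta):
        H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
        return H @ beta  # regression values; take sign (binary) or argmax (multi-class)

Training reduces to a single linear solve, which is what the abstract's "much easier implementation" claim refers to.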

Notes

  1. In contrast to the ambiguous word “randomness” found in terms such as “random features” and “random networks,” “extreme” here means moving beyond conventional artificial learning techniques and toward brain-like learning. ELM aims to break the barriers between conventional artificial learning techniques and biological learning mechanisms. “Extreme learning machine (ELM)” denotes a suite of machine learning techniques in which hidden neurons need not be tuned; these include, but are not limited to, random hidden nodes, and they also include kernels. Moreover, rather than considering only network architecture (such as randomness and kernels), ELM in theory also unifies brain-inspired learning features, neural network theory, control theory, matrix theory, and linear system theory, which were previously considered isolated from one another, with large gaps between them. Details can be found in this paper.

  2. We would like to thank Halbert White for the fruitful discussions on ELM during our personal communications and meetings in 2011.

  3. We would like to thank Boris Igelnik for discussing the relationship and difference between RVFL and ELM in our personal communication, and for sharing the RVFL patent information.

  4. This is also the reason why SVM and its variants focus on kernels while ELM is valid for both kernel and non-kernel cases; a sketch of the kernel case is given after these notes.

  5. We would like to thank Johan A. K. Suykens for showing us the analysis of the role of the bias b of LS-SVM in their monograph [68] in our personal communication.

  6. Here, we consider ELM specifically for the binary classification applications that SVM and LS-SVM can handle. However, ELM solutions are not confined to binary cases; the same solution can be applied to multi-class and regression cases (see the encoding sketch after these notes).

  7. This dilemma may also exist in other random methods with biases in the output nodes [40] if structural risk were considered in order to improve generalization performance; in that case, Schmidt et al. [40] would provide suboptimal solutions too. Furthermore, to the best of our knowledge, none of those random methods [31, 40] consider structural risk at all, and thus they may easily overfit (the regularized objective is sketched after these notes).

  8. We thank Bernard Widrow for mentioning the potential links between Rosenblatt’s perceptron and ELM in our personal communications.
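
Regarding note 4, a hedged sketch of the kernel case: when the feature map h(x) of the hidden layer is unknown, the matrix H H^T can be replaced by a kernel matrix, in the spirit of the kernel ELM solution of [32]. The Gaussian kernel and the parameter defaults below are illustrative assumptions.

    import numpy as np

    def rbf_kernel(A, B, gamma=1.0):
        """Gaussian kernel matrix: K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * d2)

    def kernel_elm_fit(X, T, C=1e3, gamma=1.0):
        """Kernel ELM sketch: Omega = H H^T is replaced by K(X, X), so the
        hidden-layer feature map never needs to be known explicitly."""
        omega = rbf_kernel(X, X, gamma)                        # Omega_ij = K(x_i, x_j)
        return np.linalg.solve(np.eye(len(X)) / C + omega, T)  # alpha = (I/C + Omega)^{-1} T

    def kernel_elm_predict(X_new, X, alpha, gamma=1.0):
        return rbf_kernel(X_new, X, gamma) @ alpha             # f(x) = K(x, X) alpha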
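
Regarding note 6, a toy illustration (with made-up labels) of how the same linear solve covers multi-class problems: encode class k among m classes as a ±1 one-hot target row and take the argmax of the network output.

    import numpy as np

    y = np.array([0, 2, 1, 2])              # integer class labels (toy data)
    T = -np.ones((y.size, 3))               # m = 3 classes
    T[np.arange(y.size), y] = 1.0           # e.g. class 2 -> [-1, -1, +1]
    # With the elm_train/elm_predict sketch given after the Abstract:
    #   W, b, beta = elm_train(X, T)
    #   predicted = elm_predict(X, W, b, beta).argmax(axis=1)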
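
Regarding note 7, "structural risk" here means penalizing the size of the output weights together with the training error, in the spirit of ridge regression [71] and the equality-constrained formulation of [32]; as a sketch, in LaTeX notation:

    \min_{\beta}\ \frac{1}{2}\lVert\beta\rVert^{2} + \frac{C}{2}\sum_{i=1}^{N}\lVert\xi_{i}\rVert^{2}
    \quad \text{s.t.}\quad h(x_{i})\,\beta = t_{i}^{\top} - \xi_{i}^{\top},\ i = 1,\dots,N,

    \text{giving}\quad \beta = H^{\top}\Bigl(\frac{I}{C} + H H^{\top}\Bigr)^{-1} T.

Once this objective is adopted, an extra bias in the output node adds a constraint to the corresponding dual problem, which is the source of the suboptimality this note refers to.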

References

  1. Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20(3):273–97.

  2. Suykens JAK, Vandewalle J. Least squares support vector machine classifiers. Neural Process Lett. 1999;9(3):293–300.

  3. Huang G-B, Zhu Q-Y, Siew C-K. Extreme learning machine: a new learning scheme of feedforward neural networks. In: Proceedings of international joint conference on neural networks (IJCNN2004), vol. 2, (Budapest, Hungary); 2004. p. 985–990, 25–29 July.

  4. Li M-B, Huang G-B, Saratchandran P, Sundararajan N. Fully complex extreme learning machine. Neurocomputing. 2005;68:306–14.

  5. Huang G-B, Zhu Q-Y, Siew C-K. Extreme learning machine: theory and applications. Neurocomputing. 2006;70:489–501.

  6. Huang G-B, Chen L, Siew C-K. Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans Neural Netw. 2006;17(4):879–92.

  7. Huang G-B, Chen L. Convex incremental extreme learning machine. Neurocomputing. 2007;70:3056–62.

  8. Miche Y, Sorjamaa A, Bas P, Simula O, Jutten C, Lendasse A. OP-ELM: optimally pruned extreme learning machine. IEEE Trans Neural Netw. 2010;21(1):158–62.

  9. Frénay B, Verleysen M. Using SVMs with randomised feature spaces: an extreme learning approach. In: Proceedings of the 18th European symposium on artificial neural networks (ESANN), (Bruges, Belgium); 2010. p. 315–320, 28–30 April.

  10. Frénay B, Verleysen M. Parameter-insensitive kernel in extreme learning for non-linear support vector regression. Neurocomputing. 2011;74:2526–31.

  11. Cho JS, White H. Testing correct model specification using extreme learning machines. Neurocomputing. 2011;74(16):2552–65.

  12. Soria-Olivas E, Gomez-Sanchis J, Martin JD, Vila-Frances J, Martinez M, Magdalena JR, Serrano AJ. BELM: Bayesian extreme learning machine. IEEE Trans Neural Netw. 2011;22(3):505–9.

  13. Xu Y, Dong ZY, Meng K, Zhang R, Wong KP. Real-time transient stability assessment model using extreme learning machine. IET Gener Transm Distrib. 2011;5(3):314–22.

  14. Saxe AM, Koh PW, Chen Z, Bhand M, Suresh B, Ng AY. On random weights and unsupervised feature learning. In: Proceedings of the 28th international conference on machine learning, (Bellevue, USA); 2011. 28 June–2 July.

  15. Saraswathi S, Sundaram S, Sundararajan N, Zimmermann M, Nilsen-Hamilton M. ICGA-PSO-ELM approach for accurate multiclass cancer classification resulting in reduced gene sets in which genes encoding secreted proteins are highly represented. IEEE/ACM Trans Comput Biol Bioinform. 2011;6(2):452–63.

  16. Minhas R, Mohammed AA, Wu QMJ. Incremental learning in human action recognition based on snippets. IEEE Trans Circuits Syst Video Technol. 2012;22(11):1529–41.

  17. Decherchi S, Gastaldo P, Leoncini A, Zunino R. Efficient digital implementation of extreme learning machines for classification. IEEE Trans Circuits Syst II. 2012;59(8):496–500.

  18. Gastaldo P, Zunino R, Cambria E, Decherchi S. Combining ELMs with random projections. IEEE Intell Syst. 2013;28(6):46–8.

  19. Lin J, Yin J, Cai Z, Liu Q, Li K, Leung VC. A secure and practical mechanism for outsourcing ELMs in cloud computing. IEEE Intell Syst. 2013;28(6):35–8.

  20. Akusok A, Lendasse A, Corona F, Nian R, Miche Y. ELMVIS: a nonlinear visualization technique using random permutations and ELMs. IEEE Intell Syst. 2013;28(6):41–6.

  21. Fletcher R. Practical methods of optimization, volume 2: constrained optimization. New York: Wiley; 1981.

  22. Werbos PJ. Beyond regression: new tools for prediction and analysis in the behavioral sciences. Ph.D. thesis, Harvard University; 1974.

  23. Rumelhart DE, Hinton GE, Williams RJ. Learning internal representations by error propagation. In: Rumelhart DE, McClelland JL, editors. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: Foundations. Cambridge, MA: MIT Press; 1986. p. 318–62.

  24. Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature. 1986;323:533–6.

  25. Werbos PJ. The roots of backpropagation: from ordered derivatives to neural networks and political forecasting. New York: Wiley; 1994.

  26. Huang G-B, Chen L. Enhanced random search based incremental extreme learning machine. Neurocomputing. 2008;71:3460–8.

  27. Sosulski DL, Bloom ML, Cutforth T, Axel R, Datta SR. Distinct representations of olfactory information in different cortical centres. Nature. 2011;472:213–6.

  28. Eliasmith C, Stewart TC, Choo X, Bekolay T, DeWolf T, Tang Y, Rasmussen D. A large-scale model of the functioning brain. Science. 2012;338:1202–5.

  29. Barak O, Rigotti M, Fusi S. The sparseness of mixed selectivity neurons controls the generalization–discrimination trade-off. J Neurosci. 2013;33(9):3844–56.

  30. Rigotti M, Barak O, Warden MR, Wang X-J, Daw ND, Miller EK, Fusi S. The importance of mixed selectivity in complex cognitive tasks. Nature. 2013;497:585–90.

  31. Igelnik B, Pao Y-H. Stochastic choice of basis functions in adaptive function approximation and the functional-link net. IEEE Trans Neural Netw. 1995;6(6):1320–9.

  32. Huang G-B, Zhou H, Ding X, Zhang R. Extreme learning machine for regression and multiclass classification. IEEE Trans Syst Man Cybern Part B. 2012;42(2):513–29.

  33. Rahimi A, Recht B. Uniform approximation of functions with random bases. In: Proceedings of the 2008 46th annual Allerton conference on communication, control, and computing; 2008. p. 555–561, 23–26 Sept.

  34. Huang G-B, Zhu Q-Y, Mao KZ, Siew C-K, Saratchandran P, Sundararajan N. Can threshold networks be trained directly? IEEE Trans Circuits Syst II. 2006;53(3):187–91.

  35. Bartlett PL. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Trans Inform Theory. 1998;44(2):525–36.

  36. Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev. 1958;65(6):386–408.

  37. Rosenblatt F. Principles of neurodynamics: perceptrons and the theory of brain mechanisms. New York: Spartan Books; 1962.

  38. Block HD. The perceptron: a model for brain function. I. Rev Modern Phys. 1962;34(1):123–35.

  39. Block HD, Knight JBW, Rosenblatt F. Analysis of a four-layer series-coupled perceptron. II. Rev Modern Phys. 1962;34(1):135–42.

  40. Schmidt WF, Kraaijveld MA, Duin RP. Feed forward neural networks with random weights. In: Proceedings of 11th IAPR international conference on pattern recognition methodology and systems, (The Hague, Netherlands); 1992. p. 1–4.

  41. White H. An additional hidden unit test for neglected nonlinearity in multilayer feedforward networks. In: Proceedings of the international conference on neural networks. 1989. p. 451–455.

  42. White H. Approximate nonlinear forecasting methods. In: Elliott G, Granger CWJ, Timmermann A, editors. Handbook of economic forecasting. New York: Elsevier; 2006. p. 460–512.

  43. Loone SM, Irwin GW. Improving neural network training solutions using regularisation. Neurocomputing. 2001;37:71–90.

  44. Serre D. Matrices: theory and applications. New York: Springer; 2002.

  45. Rao CR, Mitra SK. Generalized inverse of matrices and its applications. New York: Wiley; 1971.

  46. Fernández-Delgado M, Cernadas E, Barro S, Ribeiro J, Neves J. Direct kernel perceptron (DKP): ultra-fast kernel ELM-based classification with non-iterative closed-form weight calculation. Neural Netw. 2014;50(1):60–71.

  47. Widrow B, Greenblatt A, Kim Y, Park D. The no-prop algorithm: a new learning algorithm for multilayer neural networks. Neural Netw. 2013;37:182–8.

  48. Toms DJ. Training binary node feedforward neural networks by backpropagation of error. Electron Lett. 1990;26(21):1745–6.

  49. Corwin EM, Logar AM, Oldham WJB. An iterative method for training multilayer networks with threshold functions. IEEE Trans Neural Netw. 1994;5(3):507–8.

  50. Goodman RM, Zeng Z. A learning algorithm for multi-layer perceptrons with hard-limiting threshold units. In: Proceedings of the 1994 IEEE workshop on neural networks for signal processing. 1994. p. 219–228.

  51. Plagianakos VP, Magoulas GD, Nousis NK, Vrahatis MN. Training multilayer networks with discrete activation functions. In: Proceedings of the IEEE international joint conference on neural networks (IJCNN’2001), Washington D.C., U.S.A.; 2001.

  52. Huang G-B, Ding X, Zhou H. Optimization method based extreme learning machine for classification. Neurocomputing. 2010;74:155–63.

  53. Bai Z, Huang G-B, Wang D, Wang H, Westover MB. Sparse extreme learning machine for classification. IEEE Trans Cybern. 2014. doi:10.1109/TCYB.2014.2298235.

  54. Pao Y-H, Park G-H, Sobajic DJ. Learning and generalization characteristics of the random vector functional-link net. Neurocomputing. 1994;6:163–80.

  55. Huang G, Song S, Gupta JND, Wu C. Semi-supervised and unsupervised extreme learning machines. IEEE Trans Cybern. 2014. doi:10.1109/TCYB.2014.2307349.

  56. Huang G-B, Li M-B, Chen L, Siew C-K. Incremental extreme learning machine with fully complex hidden nodes. Neurocomputing. 2008;71:576–83.

  57. Lee T-H, White H, Granger CWJ. Testing for neglected nonlinearity in time series models: a comparison of neural network methods and standard tests. J Econom. 1993;56:269–90.

  58. Stinchcombe MB, White H. Consistent specification testing with nuisance parameters present only under the alternative. Econom Theory. 1998;14:295–324.

  59. Baum E. On the capabilities of multilayer perceptrons. J Complexity. 1988;4:193–215.

  60. Le Q, Sarlós T, Smola A. Fastfood: approximating kernel expansions in loglinear time. In: Proceedings of the 30th international conference on machine learning, (Atlanta, USA), 16–21 June 2013.

  61. Huang P-S, Deng L, Hasegawa-Johnson M, He X. Random features for kernel deep convex network. In: Proceedings of the 38th international conference on acoustics, speech, and signal processing (ICASSP 2013), Vancouver, Canada, 26–31 May 2013.

  62. Lin J, Yin J, Cai Z, Liu Q, Li K, Leung VC. A secure and practical mechanism for outsourcing ELMs in cloud computing. IEEE Intell Syst. 2013;28(6):7–10.

  63. Rahimi A, Recht B. Random features for large-scale kernel machines. In: Proceedings of the 2007 neural information processing systems (NIPS2007), 3–6 Dec 2007. p. 1177–1184.

  64. Kasun LLC, Zhou H, Huang G-B, Vong CM. Representational learning with extreme learning machine for big data. IEEE Intell Syst. 2013;28(6):31–4.

  65. Fung G, Mangasarian OL. Proximal support vector machine classifiers. In: International conference on knowledge discovery and data mining, San Francisco, California, USA, 2001. p. 77–86.

  66. Daubechies I. Orthonormal bases of compactly supported wavelets. Commun Pure Appl Math. 1988;41:909–96.

  67. Daubechies I. The wavelet transform, time-frequency localization and signal analysis. IEEE Trans Inform Theory. 1990;36(5):961–1005.

  68. Suykens JAK, Gestel TV, Brabanter JD, Moor BD, Vandewalle J. Least squares support vector machines. Singapore: World Scientific; 2002.

  69. Poggio T, Mukherjee S, Rifkin R, Rakhlin A, Verri A. “b”. A.I. Memo No. 2001–011, CBCL Memo 198, Artificial Intelligence Laboratory, Massachusetts Institute of Technology; 2001.

  70. Steinwart I, Hush D, Scovel C. Training SVMs without offset. J Mach Learn Res. 2011;12(1):141–202.

  71. Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 1970;12(1):55–67.

  72. Kaski S. Dimensionality reduction by random mapping: fast similarity computation for clustering. In: Proceedings of the 1998 IEEE international joint conference on neural networks, Anchorage, USA, 4–9 May 1998.

  73. Pearson K. On lines and planes of closest fit to systems of points in space. Philos Mag. 1901;2:559–72.

  74. von Neumann J. The general and logical theory of automata. In: Jeffress LA, editor. Cerebral mechanisms in behavior. New York: Wiley; 1951. p. 1–41.

  75. von Neumann J. Probabilistic logics and the synthesis of reliable organisms from unreliable components. In: Shannon CE, McCarthy J, editors. Automata studies. Princeton: Princeton University Press; 1956. p. 43–98.

  76. Minhas R, Baradarani A, Seifzadeh S, Wu QMJ. Human action recognition using extreme learning machine based on visual vocabularies. Neurocomputing. 2010;73:1906–17.

  77. Wang J, Kumar S, Chang S-F. Semi-supervised hashing for large-scale search. IEEE Trans Pattern Anal Mach Intell. 2012;34(12):2393–406.

  78. He Q, Jin X, Du C, Zhuang F, Shi Z. Clustering in extreme learning machine feature space. Neurocomputing. 2014;128:88–95.

  79. Jarrett K, Kavukcuoglu K, Ranzato M, LeCun Y. What is the best multi-stage architecture for object recognition? In: Proceedings of the 2009 IEEE 12th international conference on computer vision, Kyoto, Japan, 29 Sept–2 Oct 2009.

  80. Pinto N, Doukhan D, DiCarlo JJ, Cox DD. A high-throughput screening approach to discovering good forms of biologically inspired visual representation. PLoS Comput Biol. 2009;5(11):1–12.

Author information

Corresponding author

Correspondence to Guang-Bin Huang.

About this article

Cite this article

Huang, GB. An Insight into Extreme Learning Machines: Random Neurons, Random Features and Kernels. Cogn Comput 6, 376–390 (2014). https://doi.org/10.1007/s12559-014-9255-2
