Incremental extreme learning machine with fully complex hidden nodes
Introduction
Single-hidden layer feedforward neural networks (SLFNs) have attracted extensive interest in many research and application fields due to their approximation capability [3], [24], [27]. According to the conventional learning theories [3], [24], [27], all hidden node parameters (the input weights of the connections linking the input layer to the hidden layer and the biases of the additive hidden nodes, or the centers and the impact factors of the RBF hidden nodes) need to be tuned in order to make SLFNs work as universal approximators. Several researchers [4], [9], [18], [26] have independently found that the input weights or centers need not be tuned.
(1) Baum [4] claimed, based on simulations, that one may fix the weights of the connections on one level and simply adjust the connections on the other level, and that no (significant) gain is obtained by an algorithm able to adjust the weights on both levels simultaneously. Baum [4] did not discuss whether all the hidden node biases should be set to the same value, nor whether the hidden node biases should be tuned at all.
(2) Lowe [26] found that, from an interpolation (instead of universal approximation) point of view, the centers of RBF hidden nodes can be randomly selected from the training data instead of being tuned. In Lowe's learning model, the impact factor of the RBF hidden nodes is not randomly selected; it depends on the spread of the training data set, and all the impact factors are usually set to the same value [26, p. 173]. As seen from Broomhead and Lowe [5], these works [5], [26] in fact focus on a specific RBF network with the same impact factor b assigned to all the RBF hidden nodes: $f_n(\mathbf{x}) = \sum_{i=1}^{n} \beta_i g(b\|\mathbf{x} - \mathbf{a}_i\|)$ (cf. [5, Eq. (2.2)]). If RBF centers and impact factors are selected based on the training data, the network may be biased toward the training data and thus easily overfit. (ELM works on generalized feedforward networks [10], [11], and the RBF hidden node type is just one specific case of ELM. Different from the RBF network presented in [5], [26], the main RBF network of interest to ELM is $f_n(\mathbf{x}) = \sum_{i=1}^{n} \beta_i g(b_i\|\mathbf{x} - \mathbf{a}_i\|)$, where the RBF hidden nodes are not required to share the same impact factor $b_i$.) Interestingly, RBF networks with randomly generated centers but a single randomly generated impact factor b shared by all hidden nodes do not, in general, have the universal approximation capability; in contrast, RBF networks with randomly generated centers and randomly generated node-specific impact factors $b_i$ do [10], [11].
(3) Igelnik and Pao [18] proposed a random vector version of the functional-link (RVFL) net. In the RVFL model, the input weights $\mathbf{a}_i$ are "uniformly" drawn from a d-dimensional probabilistic space (d: the input dimension). The hidden node biases $b_i$ are not free: they are computed from the input weights $\mathbf{a}_i$, further randomly generated quantities, and two parameters $\Omega$ and $\alpha$. $\Omega$ and $\alpha$ have to be determined in the learning stage, and $\Omega$ depends on the training data distribution. However, Igelnik and Pao [18] do not show how to determine $\Omega$ and $\alpha$ in the learning stage.
(4) Ferrari and Stengel [9] also found that the input weights need not be trained; however, similar to Igelnik and Pao [18], they considered that there should be some dependence among the hidden node biases $b_i$, the input weights $\mathbf{a}_i$, and the training data.
Different from [4], [5], [9], [18], [26], to the best of our knowledge Tamura and Tateishi [28] were the first to prove, from the interpolation (instead of universal approximation) point of view, that SLFNs with an infinitely differentiable sigmoid activation function and with both randomly generated input weights and randomly generated hidden node biases can approximate the training data with arbitrarily small error. In Tamura and Tateishi's model [28], both the input weights and the hidden node biases can be randomly generated fully independently of the training data, and no relationship between the input weights and the hidden node biases is required. However, in this setting N hidden nodes are required in order to learn N distinct training samples, which leads to overfitting and may not work well in practical applications. Furthermore, generally speaking, Tamura and Tateishi's model [28] does not have the universal approximation capability required of function approximators.
Based on these earlier works, Huang et al. [10], [12], [13], [14], [15], [16], [17], [25] have recently proposed a series of novel learning methods called extreme learning machines (ELM) for different applications. Different from the above-mentioned semi-tuning-based learning methods [4], [5], [9], [18], [26], which, strictly speaking, only randomly select the input weights or centers instead of all the parameters of the hidden nodes, ELM is fully automatic and in theory requires no intervention from users: all the hidden node parameters $(\mathbf{a}_i, b_i)$ are randomly generated independently of the target functions and the training patterns. We found that, from the function approximation point of view, no relationship between $\mathbf{a}_i$ and $b_i$ is needed and the hidden node parameters can be irrelevant to the target functions and the training data. The output layer weights can then be analytically determined by a least-squares method. Since ELM neither adjusts the hidden node parameters nor needs to find a relationship between the input weights (or RBF centers) $\mathbf{a}_i$ and the hidden node biases (or impact factors) $b_i$, ELM is extremely simple and can run extremely fast. Huang et al. [11] have proved the universal approximation capability of ELM via an incremental method (I-ELM): ELM with any bounded nonlinear piecewise continuous activation function can work as a universal approximator. For example, ELM can train SLFNs with hard-limit hidden nodes, which cannot be handled by any of the earlier methods [4], [5], [9], [18], [26]. Huang and Chen [10] have recently extended the earlier work [11] to more generalized cases and show that if SLFNs (with piecewise continuous computational hidden nodes) can work as universal approximators with adjustable hidden node parameters, then, from the function approximation point of view, the hidden node parameters of such "generalized" SLFNs (including sigmoid networks, RBF networks, trigonometric networks, threshold networks, high-order networks, etc.) can actually be randomly generated according to any continuous sampling distribution. Most of these works focus on the real domain.
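To make this recipe concrete, the following is a minimal sketch of batch ELM in the real domain, assuming additive sigmoid hidden nodes and a single linear output node; the function and variable names (elm_train, X, T, n_hidden) are our illustrative choices, not from the cited papers.

```python
# Minimal batch-ELM sketch: random hidden parameters, least-squares output weights.
import numpy as np

def elm_train(X, T, n_hidden, seed=0):
    """X: (N, d) inputs, T: (N,) targets; returns hidden parameters and output weights."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    A = rng.uniform(-1.0, 1.0, size=(n_hidden, d))  # random input weights a_i (never tuned)
    b = rng.uniform(-1.0, 1.0, size=n_hidden)       # random hidden biases b_i (never tuned)
    H = 1.0 / (1.0 + np.exp(-(X @ A.T + b)))        # hidden-layer output matrix, (N, n_hidden)
    beta = np.linalg.pinv(H) @ T                    # analytic least-squares output weights
    return A, b, beta

def elm_predict(X, A, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ A.T + b)))
    return H @ beta
```

Because beta solves min over beta of ||H beta - T|| in closed form, no iterative tuning of (A, b) is involved, which is the source of ELM's speed.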
Li et al. [25] have extended ELM from the real domain to the complex domain, which is referred to as C-ELM, but its universal approximation capability has not been investigated yet. Different from many other complex-domain learning algorithms, C-ELM can be applied to SLFNs with fully complex instead of split complex-valued activation functions. Although neural networks have been successfully used in complex fields such as wireless and mobile communication applications [6], [7], [19], one faces the challenge of finding proper nonlinear fully complex activation functions with which to construct neural networks for processing complex signals [20], [21], [22]. According to complex analysis, there is a conflict between the boundedness and the differentiability of a complex function over the entire complex domain [22]: a bounded analytic (differentiable at every point $z \in \mathbb{C}$) function must be a constant on the complex domain $\mathbb{C}$. Recently, Kim and Adali [20] proved the approximation capability of SLFNs with tunable hidden nodes and with fully complex activation functions.
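For reference, this conflict is exactly Liouville's theorem; the compact statement below, and the tanh example in the comment, are standard facts added here for illustration.

```latex
% Liouville's theorem: a function analytic and bounded on all of C is constant.
% Hence no nonconstant fully complex activation can be simultaneously bounded
% and analytic everywhere; e.g. tanh(z) is analytic except at its poles
% z = i*pi*(k + 1/2), near which it is unbounded.
\[
  f \ \text{analytic on } \mathbb{C}
  \quad\text{and}\quad
  \sup_{z \in \mathbb{C}} |f(z)| < \infty
  \;\Longrightarrow\;
  f \equiv \mathrm{const}.
\]
```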
In this paper, we further extend I-ELM to the complex domain and rigorously prove that I-ELM and C-ELM with fully complex activation functions and with randomly generated hidden nodes independent of the training data can work as universal approximators. More generally, in both I-ELM and C-ELM the hidden nodes need not be of a single additive type; a multiplicative combination of multiple complex additive nodes can also be used in the hidden layer.
Review of I-ELM in the real domain
In this section, we first review I-ELM [11], which in the real domain adds randomly generated hidden nodes incrementally. The hidden node parameters $\mathbf{a}_i$ and $b_i$ in I-ELM are independent both of each other and of the training data.
Without any loss of generality, we assume that the network has only one linear output node; the analysis can easily be extended to the case of multiple nonlinear output nodes. A standard SLFN with n hidden nodes can be represented by
$$f_n(\mathbf{x}) = \sum_{i=1}^{n} \beta_i\, G(\mathbf{a}_i, b_i, \mathbf{x}), \qquad \mathbf{x} \in \mathbb{R}^d,\ \beta_i \in \mathbb{R},$$
where $G(\mathbf{a}_i, b_i, \mathbf{x})$ denotes the output of the ith hidden node: $G(\mathbf{a}_i, b_i, \mathbf{x}) = g(\mathbf{a}_i \cdot \mathbf{x} + b_i)$ for additive nodes and $G(\mathbf{a}_i, b_i, \mathbf{x}) = g(b_i\|\mathbf{x} - \mathbf{a}_i\|)$ for RBF nodes.
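As an illustration of the incremental construction reviewed here, the following is a minimal real-domain I-ELM sketch in the spirit of [11], using additive sigmoid nodes; all names are our own choices.

```python
# Minimal I-ELM sketch: nodes are added one at a time and frozen; only the new
# node's output weight is computed, against the current residual.
import numpy as np

def ielm_train(X, T, max_nodes, seed=0):
    rng = np.random.default_rng(seed)
    N, d = X.shape
    e = T.astype(float)                               # residual; initially the target itself
    nodes = []                                        # frozen (a, b, beta) triples
    for _ in range(max_nodes):
        a = rng.uniform(-1.0, 1.0, size=d)            # random input weights
        b = rng.uniform(-1.0, 1.0)                    # random bias
        g = 1.0 / (1.0 + np.exp(-(X @ a + b)))        # new node's outputs on the data
        beta = (e @ g) / (g @ g)                      # beta_n = <e_{n-1}, g_n> / ||g_n||^2
        e = e - beta * g                              # shrink the residual; earlier nodes untouched
        nodes.append((a, b, beta))
    return nodes
```

Since each beta_n is the least-squares coefficient of the new node against the current residual, the training error is non-increasing in the number of nodes.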
Function approximation
In this subsection we first show that any continuous target function $f$ can be approximated with arbitrarily small error by an incremental fully complex ELM in which the complex hidden nodes are randomly added one by one and are fixed once added. In fact, this holds for any complex continuous discriminatory function or any complex bounded nonlinear piecewise continuous function $g_c$, and any randomly generated function sequence $\{g_n\}$, where the hidden node parameters $\mathbf{a}_n$ and $b_n$ are randomly generated
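A compact way to state the incremental step in the complex case is the following (our notation; K is the compact set on which the approximation error is measured, and the inner product is the Hermitian one):

```latex
% Residual-fitting step of complex I-ELM: given residual e_{n-1} = f - f_{n-1}
% and a randomly generated node g_n, the output weight is the projection
% coefficient, which never increases the residual norm:
\[
  \beta_n = \frac{\langle e_{n-1},\, g_n \rangle}{\lVert g_n \rVert^2}
          = \frac{\int_K e_{n-1}(\mathbf{z})\,\overline{g_n(\mathbf{z})}\,d\mathbf{z}}
                 {\int_K |g_n(\mathbf{z})|^2\,d\mathbf{z}},
  \qquad
  f_n = f_{n-1} + \beta_n g_n,
  \qquad
  \lVert f - f_n \rVert \le \lVert f - f_{n-1} \rVert .
\]
```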
Experimental verification
In the previous section, we provided the theoretical justification for incremental feedforward networks in the complex domain. In this section, simulation results are given to verify the theory.
For the sake of simplicity, we demonstrate the universal approximation capability of complex I-ELM with one additive hidden layer and with three fully complex activation functions. All
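As an illustrative complement (our own toy setup and target function, not the paper's experiments), this sketch checks numerically that the residual of complex I-ELM decreases as random fully complex nodes are added; tanh serves as the fully complex activation, since NumPy evaluates it for complex arguments.

```python
# Toy check: complex I-ELM residual shrinks as random fully complex nodes are added.
import numpy as np

rng = np.random.default_rng(0)
N, d = 200, 2
Z = rng.standard_normal((N, d)) + 1j * rng.standard_normal((N, d))  # complex inputs
f = np.sin(Z[:, 0]) * Z[:, 1]                                       # illustrative complex target

e = f.copy()                                       # residual e_0 = f
for n in range(1, 201):
    a = rng.standard_normal(d) + 1j * rng.standard_normal(d)        # random complex weights
    b = rng.standard_normal() + 1j * rng.standard_normal()          # random complex bias
    g = np.tanh(Z @ a + b)                         # fully complex activation
    beta = np.vdot(g, e) / np.vdot(g, g)           # Hermitian projection coefficient
    e = e - beta * g                               # residual norm is non-increasing
    if n % 50 == 0:
        print(n, np.sqrt(np.mean(np.abs(e) ** 2)))  # RMS residual over the samples
```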
Conclusions
In this paper, we show that complex SLFNs trained with the proposed incremental algorithm (I-ELM) can approximate any continuous target function in the complex domain. Each hidden node in I-ELM can be a single additive node or a multiplicative combination of additive nodes. In contrast to tuning-based learning algorithms, our tuning-free I-ELM does not require any intervention from users. The proposed I-ELM can be applied to a wide range of complex activation functions, which may be differentiable or
References (27)

E.B. Baum, On the capabilities of multilayer perceptrons, J. Complexity (1988).
S. Chen et al., Complex-valued radial basis function networks, part I: network architecture and learning algorithms, Signal Process. (1994).
S. Chen et al., Complex-valued radial basis function networks, part II: application to digital communications channel equalization, Signal Process. (1994).
G.-B. Huang et al., Extreme learning machine: theory and applications, Neurocomputing (2006).
M. Leshno et al., Multilayer feedforward networks with a nonpolynomial activation function can approximate any function, Neural Networks (1993).
M.-B. Li et al., Fully complex extreme learning machine, Neurocomputing (2005).
P. Arena et al., Multilayer perceptrons to approximate complex valued functions, Int. J. Neural Syst. (1995).
A.R. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE Trans. Inf. Theory (1993).
D.S. Broomhead, D. Lowe, Multivariable functional interpolation and adaptive networks, Complex Syst. (1988).
I. Cha, S.A. Kassam, Channel equalization using adaptive complex radial basis function networks, IEEE J. Sel. Areas Commun. (1995).
S. Ferrari, R.F. Stengel, Smooth function approximation using neural networks, IEEE Trans. Neural Networks (2005).
G.-B. Huang, L. Chen, Convex incremental extreme learning machine, Neurocomputing (2007).
G.-B. Huang, L. Chen, C.-K. Siew, Universal approximation using incremental constructive feedforward networks with random hidden nodes, IEEE Trans. Neural Networks (2006).
Guang-Bin Huang received his B.Sc. degree in applied mathematics and M.Eng. degree in computer engineering from Northeastern University, PR China, in 1991 and 1994, respectively, and his Ph.D. degree in electrical engineering from Nanyang Technological University, Singapore, in 1999. During his undergraduate period, he also concurrently studied in the Wireless Communication Department of Northeastern University, PR China.

From June 1998 to May 2001, he worked as a Research Fellow at the Singapore Institute of Manufacturing Technology (formerly known as Gintic Institute of Manufacturing Technology), where he led and implemented several key industrial projects. Since May 2001, he has been working as an Assistant Professor in the School of Electrical and Electronic Engineering, Nanyang Technological University. His current research interests include extreme learning machines, machine learning, bioinformatics and networking. He is an associate editor of IEEE Transactions on Systems, Man and Cybernetics—Part B and Neurocomputing. He is a senior member of the IEEE.
Lei Chen received his B.Sc. degree in applied mathematics and his M.Sc. degree in operational research and control theory from Northeastern University, PR China, in 1999 and 2002, respectively, and his Ph.D. degree in electrical and electronic engineering from Nanyang Technological University, Singapore, in 2007. He is now a postdoctoral fellow at the National University of Singapore, Singapore. His research interests include artificial neural networks, pattern recognition and machine learning.
Ming-Bin Li was born in Liaoning, China, in 1975. He received his B.Eng. degree from the Shenyang Institute of Technology, China, in 1998 and his M.Eng. degree from Northeastern University, China, in 2001. He obtained his Ph.D. degree from Nanyang Technological University (NTU), Singapore, in 2006. He is currently a research associate at the Intelligent Systems Centre, Nanyang Technological University.

His main research interests include neural networks, fuzzy logic, system modeling, channel equalization and dynamic control.
Chee-Kheong Siew obtained his B.Eng., M.Sc. and Ph.D. degrees from the University of Singapore, Imperial College, UK, and NTU, Singapore, respectively. He is currently an Associate Professor in the School of EEE, Nanyang Technological University (NTU), Singapore. From 1995 to 2005, he served as the Head of the Information Communication Institute of Singapore (ICIS) after he managed the transfer of ICIS to NTU and rebuilt the institute in the university environment. After six years in industry, he joined NTU in 1986 and was appointed Head of the Institute in 1996. He has served on various conference technical program committees and as a reviewer for various journals. His current research interests include neural networks, packet scheduling, traffic shaping, admission control, service curves, QoS frameworks, congestion control, multipath routing and intelligent networks. He is a member of IEEE.