Neurocomputing
Volume 71, Issues 4–6, January 2008, Pages 576–583

Incremental extreme learning machine with fully complex hidden nodes

https://doi.org/10.1016/j.neucom.2007.07.025

Abstract

Huang et al. [Universal approximation using incremental constructive feedforward networks with random hidden nodes, IEEE Trans. Neural Networks 17(4) (2006) 879–892] have recently proposed an incremental extreme learning machine (I-ELM), which adds randomly generated hidden nodes one by one and analytically determines the output weights. Although the hidden nodes are generated randomly, the network constructed by I-ELM remains a universal approximator. This paper extends I-ELM from the real domain to the complex domain. We show that, as long as the hidden layer activation function is complex continuous discriminatory or complex bounded nonlinear piecewise continuous, I-ELM can still approximate any target function in the complex domain. The universal approximation capability of I-ELM in the complex domain is further verified on two function approximation problems and one channel equalization problem.

Introduction

Single-hidden-layer feedforward neural networks (SLFNs) have attracted extensive interest in many research and application fields due to their approximation capability [3], [24], [27]. According to the conventional learning theories [3], [24], [27], all hidden node parameters (the input weights a_i of the connections linking the input layer to the hidden layer and the biases b_i of the additive hidden nodes, or the centers a_i and the impact factors b_i of the RBF hidden nodes) need to be tuned in order to make SLFNs work as universal approximators. Several researchers [4], [9], [18], [26] have independently found that the input weights or centers a_i need not be tuned.

  • (1)

    Baum [4] claimed, based on simulations, that one may fix the weights of the connections on one level and simply adjust the connections on the other level, and that no (significant) gain is obtained by an algorithm able to adjust the weights on both levels simultaneously. Baum [4] did not discuss whether all the hidden node biases b_i should be set to the same value, nor whether the hidden node biases b_i should be tuned at all.

  • (2)

    Lowe [26] found that, from an interpolation (instead of universal approximation) point of view, the centers a_i of RBF hidden nodes can be randomly selected from the training data instead of being tuned. In Lowe's learning model, the impact factor b_i of the RBF hidden nodes is not randomly selected; it depends on the spread of the training data set. Furthermore, all the impact factors b_i are usually set to the same value [26, p. 173]. Seen from Broomhead and Lowe [5], Lowe et al. [5], [26] in fact focus on a specific RBF network with the same impact factor b assigned to all the RBF hidden nodes: f_n(x) = Σ_{i=1}^{n} β_i g(b‖x - a_i‖), x ∈ R^d (cf. [5, Eq. (2.2)]). If the RBF centers and impact factors are selected based on the training data, the network may be biased toward the training data and thus easily overfit. (ELM works on generalized feedforward networks [10], [11], and the RBF hidden node is just one specific case of ELM. Different from the RBF network presented in Lowe et al. [5], [26], the RBF network of main interest to ELM is f_n(x) = Σ_{i=1}^{n} β_i g(b_i‖x - a_i‖), where the RBF hidden nodes are not required to share the same impact factor b_i.) Interestingly, RBF networks f_n(x) = Σ_{i=1}^{n} β_i g(b‖x - a_i‖) with randomly generated centers a_i and the same randomly generated impact factor b do not in general have the universal approximation capability; in contrast, RBF networks f_n(x) = Σ_{i=1}^{n} β_i g(b_i‖x - a_i‖) with randomly generated centers a_i and randomly generated impact factors b_i do in general have the universal approximation capability [10], [11] (the two forms are contrasted in the short sketch after this list).

  • (3)

    Igelnik and Pao [18] proposed a random vector version of the functional-link (RVFL) net. In the RVFL model, the input weights a_i are “uniformly” drawn from a probabilistic space V_α^d = [0, αΩ] × [-αΩ, αΩ]^{d-1} (d: the input dimension). The hidden node biases b_i depend on the weights a_i and some other parameters y_i and u_i: b_i = -(α a_i · y_i + u_i), where y_i and u_i are randomly generated from [0, 1]^d and [-2Ω, 2Ω], respectively. α and Ω have to be determined in the learning stage and depend on the training data distribution. However, Igelnik and Pao [18] do not show how to determine α and Ω in the learning stage.

  • (4)

    Ferrari and Stengel [9] also found that the input weights a_i need not be trained; however, similar to Igelnik and Pao [18], Ferrari and Stengel [9] considered that there should be some dependence among the hidden node biases b_i, the weights a_i, and the training data.
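
To make the contrast in item (2) concrete, the following Python/NumPy fragment builds the hidden layer output matrix for both RBF forms. It is only an illustrative sketch: the Gaussian choice g(u) = exp(-u²), the sampling ranges, and all variable names are our own assumptions rather than anything specified in [5], [26], [10], [11].

import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 20                          # input dimension, number of RBF hidden nodes
X = rng.uniform(-1.0, 1.0, (100, d))  # 100 arbitrary input points

centers = rng.uniform(-1.0, 1.0, (n, d))                            # random centers a_i
dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # ||x - a_i||

# Form used in [5], [26]: one common impact factor b shared by every hidden node.
b_common = rng.uniform(0.1, 2.0)
H_shared = np.exp(-(b_common * dist) ** 2)   # g(b ||x - a_i||) with Gaussian g

# Form of interest to ELM [10], [11]: an independent random impact factor b_i per node.
b_i = rng.uniform(0.1, 2.0, n)
H_elm = np.exp(-(b_i * dist) ** 2)           # g(b_i ||x - a_i||), b_i broadcast over columns

In both cases the network output is f_n(x) = Σ_{i=1}^{n} β_i g_i(x), i.e. the hidden output matrix multiplied by an output weight vector; only the second form, with independently drawn b_i, carries the universal approximation guarantee of [10], [11].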

Thus, strictly speaking, in all the previous works [4], [5], [9], [18], [26] the so-called “randomly” generated hidden node parameters are not completely independent of the training data. For example, the hidden node biases or the impact factors b_i can only be generated after seeing the training data. In this sense, these works still belong to the conventional tuning-based learning models, where the hidden node parameters are generated only after the training data are presented.

Different from [4], [5], [9], [18], [26], to the best of our knowledge Tamura and Tateishi [28] were the first to prove, from the interpolation (instead of universal approximation) point of view, that SLFNs with an infinitely differentiable sigmoid activation function and with both randomly generated input weights a_i and hidden node biases b_i can approximate the training data with arbitrarily small error. In Tamura and Tateishi's model [28], both the input weights a_i and the hidden node biases b_i can be generated fully independently of the training data, and no particular relationship between the input weights a_i and the hidden node biases b_i is required. However, in this SLFN setting, N hidden nodes are required in order to learn N distinct training samples, which leads to overfitting and may not work well in practical applications. Furthermore, generally speaking, Tamura and Tateishi's model [28] does not have the universal approximation capability required of a function approximator.

Based on these earlier works, Huang et al. [10], [12], [13], [14], [15], [16], [17], [25] have recently proposed a series of learning methods called extreme learning machines (ELM) for different applications. Different from the above-mentioned semi-tuning-based learning methods [4], [5], [9], [18], [26], which, strictly speaking, only randomly select the input weights or centers a_i rather than all the hidden node parameters, ELM is fully automatic and in theory requires no intervention from users: all the hidden node parameters a_i and b_i are randomly generated independently of the target functions and the training patterns. We found that, from the function approximation point of view, no relationship between a_i and b_i is needed and the hidden node parameters can be irrelevant to the target functions and the training data. The output layer weights can then be analytically determined by a least-squares method. Since ELM does not adjust the hidden node parameters and need not establish any relationship between the input weights (or RBF centers) a_i and the hidden node biases (or impact factors) b_i, ELM is extremely simple and runs extremely fast. Huang et al. [11] have proved the universal approximation capability of ELM in an incremental form (I-ELM): ELM with any bounded nonlinear piecewise continuous activation function can work as a universal approximator. For example, ELM can be used to train SLFNs with hard-limit hidden nodes, which cannot be handled by any of the earlier methods [4], [5], [9], [18], [26]. Huang and Chen [10] have recently extended the earlier work [11] to more general cases and show that, if SLFNs (with piecewise continuous computational hidden nodes) can work as universal approximators with adjustable hidden node parameters, then from the function approximation point of view the hidden node parameters of such “generalized” SLFNs (including sigmoid networks, RBF networks, trigonometric networks, threshold networks, high-order networks, etc.) can actually be randomly generated according to any continuous sampling distribution. Most of these works focus on the real domain.
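
As a rough illustration of this scheme (not the authors' exact formulation), the following Python/NumPy sketch draws all hidden node parameters of an additive sigmoid SLFN at random and computes only the output weights, in one analytic step, via the Moore–Penrose pseudoinverse; the function names, parameter ranges, and sigmoid choice are assumptions made here for illustration.

import numpy as np

def elm_train(X, y, n_hidden, rng=np.random.default_rng(0)):
    # Draw all hidden node parameters at random, independently of the data.
    d = X.shape[1]
    A = rng.uniform(-1.0, 1.0, (d, n_hidden))   # random input weights a_i
    b = rng.uniform(-1.0, 1.0, n_hidden)        # random hidden biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))      # hidden layer output matrix
    beta = np.linalg.pinv(H) @ y                # output weights by least squares
    return A, b, beta

def elm_predict(X, A, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    return H @ beta

A call such as A, b, beta = elm_train(X, y, 50) followed by elm_predict(X_new, A, b, beta) fits and applies the network without any iterative tuning of the hidden layer.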

Li et al. [25] have extended ELM from the real domain to the complex domain, referred to as C-ELM, but its universal approximation capability has not yet been investigated. Different from many other complex-domain learning algorithms, C-ELM can be applied to SLFNs with fully complex rather than split complex-valued activation functions. Although neural networks have been successfully used in complex-valued fields such as wireless and mobile communication applications [6], [7], [19], they face the challenge of finding proper nonlinear fully complex activation functions with which to construct neural networks for processing complex signals [20], [21], [22]. According to complex analysis, there is a conflict between the boundedness and the differentiability of a complex function over the entire complex domain [22]: a bounded analytic (differentiable at every point z ∈ C) function must be a constant in the complex domain C. Recently, Kim and Adali [20] proved the approximation capability of SLFNs with tunable hidden nodes and with fully complex activation functions.

In this paper, we further extend I-ELM into the complex domain and rigorously prove that I-ELM and C-ELM with fully complex activation functions and with randomly generated hidden nodes independent of the training data can work as universal approximators. More generally, in both I-ELM and C-ELM the hidden nodes need not be of a single additive type; a multiplicative combination of multiple complex additive nodes can also be used in the hidden layer.

Section snippets

Review of I-ELM in real domain

In this section, we first review I-ELM [11], which in the real domain adds randomly generated hidden nodes incrementally. The hidden node parameters a_i and b_i in I-ELM are independent both of each other and of the training data.

Without any loss of generality, we assume that the network has only one linear output node; the analysis can be easily extended to the case of multiple nonlinear output nodes. A standard SLFN with n hidden nodes can be represented by f_n(x) = Σ_{i=1}^{n} β_i g_i(x), x ∈ R^d,
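
A minimal Python/NumPy sketch of the incremental procedure follows, assuming sigmoid additive hidden nodes, parameters drawn uniformly from [-1, 1], and the residual-based output weight β_n = ⟨e_{n-1}, g_n⟩ / ‖g_n‖² of [11]; all names and sampling ranges are illustrative assumptions, not the paper's exact specification.

import numpy as np

def ielm_fit(X, y, max_nodes, rng=np.random.default_rng(0)):
    d = X.shape[1]
    nodes = []
    e = y.astype(float).copy()                  # residual error, initially the target itself
    for _ in range(max_nodes):
        a = rng.uniform(-1.0, 1.0, d)           # random input weights of the new node
        b = rng.uniform(-1.0, 1.0)              # random bias of the new node
        g = 1.0 / (1.0 + np.exp(-(X @ a + b)))  # new node's output on the training inputs
        beta = (e @ g) / (g @ g)                # beta_n = <e_{n-1}, g_n> / ||g_n||^2
        e = e - beta * g                        # update the residual; the node is now fixed
        nodes.append((a, b, beta))
    return nodes

def ielm_predict(X, nodes):
    out = np.zeros(X.shape[0])
    for a, b, beta in nodes:
        out += beta / (1.0 + np.exp(-(X @ a + b)))
    return out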

Function approximation

In this subsection we first show that any continuous target function f: C^d → C can be approximated to any arbitrarily small error by an incremental fully complex ELM in which the complex hidden nodes are randomly added one by one and are fixed once added. In fact, given any complex continuous discriminatory or any complex bounded nonlinear piecewise continuous function σ: C → C, and any randomly generated function sequence {g_i(z)}: g_i(z) = Π_{l=1}^{s_i} σ(a_il · z + b_i), where a_il and b_i are randomly generated
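
A hedged Python/NumPy sketch of this complex construction, assuming multiplicative hidden nodes g_i(z) = Π_{l=1}^{s_i} σ(a_il · z + b_i) with σ = arcsinh, complex parameters drawn componentwise from [-1, 1], and the output weight β_i = ⟨e, g_i⟩ / ‖g_i‖² under the complex inner product; the sampling scheme and all names are our own assumptions rather than the paper's exact algorithm.

import numpy as np

def complex_ielm_fit(Z, t, max_nodes, s=2, sigma=np.arcsinh,
                     rng=np.random.default_rng(0)):
    d = Z.shape[1]
    nodes = []
    e = t.astype(complex).copy()                    # complex residual, initially the target
    for _ in range(max_nodes):
        A = rng.uniform(-1.0, 1.0, (s, d)) + 1j * rng.uniform(-1.0, 1.0, (s, d))  # a_il
        b = complex(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))               # b_i
        g = np.prod(sigma(Z @ A.T + b), axis=1)     # g_i(z) = prod_l sigma(a_il . z + b_i)
        beta = (e @ np.conj(g)) / (g @ np.conj(g))  # beta_i = <e, g_i> / ||g_i||^2
        e = e - beta * g                            # shrink the residual; the node is fixed
        nodes.append((A, b, beta))
    return nodes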

Experimental verification

In the previous section, we have provided our theoretical justification for the incremental feedforward networks in the complex domain. In this section, simulation results are given to verify the theory.

For the sake of simplicity, we demonstrate the universal approximation capability of complex I-ELM with one additive hidden layer (s_i = 1) and with three fully complex activation functions: arcsin(z) = ∫_0^z dt/(1 - t²)^{1/2}, arccos(z) = ∫_z^1 dt/(1 - t²)^{1/2}, and arcsinh(z) = ∫_0^z dt/(1 + t²)^{1/2}, where z ∈ C. All
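
These three fully complex functions can be evaluated directly at complex arguments with standard numerical libraries; for instance, NumPy's arcsin, arccos, and arcsinh accept complex input (the sample points below are arbitrary).

import numpy as np

z = np.array([0.3 + 0.4j, -0.2 + 0.9j])   # arbitrary sample points in C

print(np.arcsin(z))     # fully complex arcsin(z)
print(np.arccos(z))     # fully complex arccos(z)
print(np.arcsinh(z))    # fully complex arcsinh(z)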

Conclusions

In this paper, we show that complex SLFNs trained with the proposed incremental algorithm (I-ELM) can approximate any continuous target function in the complex domain. Each hidden node in I-ELM can be a single additive node or a multiplicative combination of additive nodes. In contrast to tuning-based learning algorithms, our tuning-free I-ELM does not require any intervention from users. The proposed I-ELM can be applied to a wide range of complex activation functions which may be differentiable or


References (27)

  • S. Ferrari et al., Smooth function approximation using neural networks, IEEE Trans. Neural Networks (2005)
  • G.-B. Huang et al., Convex incremental extreme learning machine, Neurocomputing (2007)
  • G.-B. Huang et al., Universal approximation using incremental constructive feedforward networks with random hidden nodes, IEEE Trans. Neural Networks (2006)

    Guang-Bin Huang received his B.Sc. degree in applied mathematics and M.Eng. degree in computer engineering from Northeastern University, PR China, in 1991 and 1994, respectively, and Ph.D. degree in electrical engineering from Nanyang Technological University, Singapore in 1999. During his undergraduate period, he also concurrently studied in Wireless Communication, Department of Northeastern University, PR China.

    From June 1998 to May 2001, he worked as Research Fellow in Singapore Institute of Manufacturing Technology (formerly known as Gintic Institute of Manufacturing Technology) where he has led/implemented several key industrial projects. From May 2001, he has been working as an Assistant Professor in the School of Electrical and Electronic Engineering, Nanyang Technological University. His current research interests include extreme learning machine, machine learning, bioinformatics and networking. He is an associate editor of IEEE Transactions on Systems, Man and Cybernetics—Part B and Neurocomputing. He is a senior member of the IEEE.

    Lei Chen received his B.Sc. degree in applied mathematics and his M.Sc. degree in operational research and control theory from Northeastern University, PR China, in 1999 and 2002, respectively, and his Ph.D. degree in electrical and electronic engineering from Nanyang Technological University, Singapore, in 2007. Now he is a postdoctoral fellow in National University of Singapore, Singapore. His research interests include artificial neural networks, pattern recognition and machine learning.

    Ming-Bin Li was born in Liaoning, China, in 1975. He received his B.Eng. degree from the Shenyang Institute of Technology, China in 1998 and his M.Eng. degree from Northeastern University, China, in 2001. He obtained his Ph.D. degree in Nanyang Technological University (NTU), Singapore in 2006. He is currently a research associate at Intelligent Systems Centre, Nanyang Technological University.

    His main research interests include neural network, fuzzy logic, system modeling, channel equalization and dynamic control.

    Chee-Kheong Siew obtained his B.Eng., M.Sc. and Ph.D. from University of Singapore, Imperial College, UK and NTU, Singapore, respectively. He is currently an Associate Professor in the School of EEE, Nanyang Technological University (NTU), Singapore. From 1995 to 2005, he served as the Head of Information Communication Institute of Singapore after he managed the transfer of ICIS to NTU and rebuilt the institute in the university environment. After 6 years in the industry, he joined NTU in 1986 and was appointed as the Head of the Institute in 1996. He has served in various conference technical program committees and also as reviewer for various journals. His current research interests include neural networks, packet scheduling, traffic shaping, admission control, service curves and admission control, QoS framework, congestion control, multipath routing and intelligent networks. He is a member of IEEE.
