
Neural Networks

Volume 140, August 2021, Pages 237-246

Bidirectional stochastic configuration network for regression problems

https://doi.org/10.1016/j.neunet.2021.03.016

Highlights

  • A novel bidirectional stochastic configuration network (BSCN) is proposed in this paper to solve regression problems.

  • BSCN greatly accelerates the training of the SCN model and makes the resulting model better in generalization ability and more compact in network structure.

  • The effectiveness of BSCN has been verified through extensive experiments.

  • BSCN provides a stable and fast modeling solution for platforms with limited computing power.

Abstract

To adapt to the reality of limited computing resources on various terminal devices in industrial applications, a randomized neural network called the stochastic configuration network (SCN), which can be trained effectively without a GPU, was proposed. SCN uses a supervisory random mechanism to assign its input weights and hidden biases, which makes it more stable than other randomized algorithms but also makes model training time-consuming. To alleviate this problem, we propose a novel bidirectional SCN algorithm (BSCN) in this paper, which adds hidden nodes in two alternating modes: forward learning and backward learning. In the forward learning mode, BSCN still uses the supervisory mechanism to configure the parameters of the newly added nodes, exactly as SCN does. In the backward learning mode, BSCN calculates the parameters in a single step based on the residual error feedback of the current model. The two learning modes are performed alternately until the prediction error of the model reaches an acceptable level or the number of hidden nodes reaches its maximum value. This semi-random learning mechanism greatly speeds up the training of the BSCN model and significantly improves the quality of the hidden nodes. Extensive experiments on ten benchmark regression problems, two real-life air pollution prediction problems, and a classical image processing problem show that BSCN achieves faster training speed, higher stability, and better generalization ability than SCN.

Introduction

Randomized neural networks are a special type of feed-forward neural network, with representative algorithms including the random vector functional link network (RVFL) (Pao & Takefuji, 1992) and the neural network with random weights (NNRW) (Schmidt, Kraaijveld, & Duin, 1992). The most notable feature of this type of neural network is that the input weights (i.e., the weights between the input layer and the hidden layer) and hidden biases (i.e., the thresholds of the hidden nodes) are assigned randomly according to certain rules and remain unchanged throughout the training process, while the output weights are obtained analytically. This non-iterative training mechanism enables them to train faster than traditional neural networks such as the back-propagation algorithm (BP) (Rumelhart, Durbin, Golden, & Chauvin, 1995) and to work better on platforms with limited hardware resources (e.g., various IoT terminals (Xu, Zhang, Gao, Xue, Qi, & Dou, 2020)), so they have received wide attention and been applied in many scenarios in recent years (Zhang, Li, Li, & Wang, 2020).
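To make this non-iterative mechanism concrete, the following is a minimal NumPy sketch of an NNRW-style model: the hidden-layer parameters are drawn once at random and frozen, and only the output weights are solved analytically by least squares. The uniform sampling range [-1, 1] and the sigmoid activation are illustrative assumptions, not prescriptions from the papers cited above.

```python
import numpy as np

def train_nnrw(X, T, L=50, seed=None):
    """Fit an NNRW-style randomized network with L hidden nodes.

    X: (N, d) inputs, T: (N, m) targets. The input weights W and
    biases b are random and frozen; only beta is fitted.
    """
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], L))  # random input weights (never updated)
    b = rng.uniform(-1.0, 1.0, size=L)                # random hidden biases (never updated)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))            # hidden-layer output matrix (sigmoid)
    beta, *_ = np.linalg.lstsq(H, T, rcond=None)      # analytic output weights (least squares)
    return W, b, beta

def predict_nnrw(X, W, b, beta):
    return (1.0 / (1.0 + np.exp(-(X @ W + b)))) @ beta
```

The single least-squares solve is the only "training" step, which is why such models run comfortably on hardware without a GPU.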

However, most existing randomized neural networks suffer from two notorious weaknesses: (a) the quality of the random parameters (i.e., the input weights and hidden biases) is hard to guarantee, and (b) the number of hidden nodes is difficult to determine before modeling. For the former problem, several empirical guidelines are given in Li and Wang (2017), but they only work in some specific scenarios. There are also existing solutions targeting the latter problem, which can be divided into two categories: constructive strategies and pruning strategies. The basic idea of the constructive strategy is to start the model with a simple network structure and then gradually add hidden nodes until the performance of the model reaches the preset conditions; incremental RVFL (I-RVFL) is one of the representative algorithms using this strategy (Li & Wang, 2017). The pruning strategy starts the model with a very large network structure and then deletes unimportant hidden nodes according to certain criteria; for example, in Ertuğrul (2018), the author ranked the importance of hidden nodes according to the output weights and the coefficient of variation of the hidden matrix, and then removed the relatively unimportant nodes. These two strategies effectively reduce the labor of parameter tuning, but they rarely consider the quality of the input parameters, so they cannot guarantee the generalization ability of the corresponding models. Therefore, the above two problems still hinder the wide application of randomized models in practice.
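As a concrete illustration of the constructive strategy, the sketch below grows a randomized network one node at a time and re-fits the output weights after each addition, stopping once a training-RMSE tolerance or a node budget is reached. This is a simplified illustration (the direct input-output links that I-RVFL includes are omitted), not the exact I-RVFL procedure.

```python
import numpy as np

def train_constructive(X, T, tol=1e-2, max_nodes=100, seed=None):
    """Grow hidden nodes one at a time (constructive strategy).

    X: (N, d) inputs, T: (N, m) targets.
    """
    rng = np.random.default_rng(seed)
    W, b = np.empty((X.shape[1], 0)), np.empty(0)
    for _ in range(max_nodes):
        w_new = rng.uniform(-1.0, 1.0, size=(X.shape[1], 1))  # random parameters of the new node
        b_new = rng.uniform(-1.0, 1.0, size=1)
        W, b = np.hstack([W, w_new]), np.hstack([b, b_new])
        H = np.tanh(X @ W + b)                                # outputs of all nodes added so far
        beta, *_ = np.linalg.lstsq(H, T, rcond=None)          # analytically re-fit output weights
        rmse = np.sqrt(np.mean((H @ beta - T) ** 2))
        if rmse <= tol:                                       # preset condition met: stop growing
            break
    return W, b, beta
```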

To alleviate the above problems, Wang and Li (2017b) proposed a constructive randomized algorithm called the stochastic configuration network (SCN) in 2017. Compared with other randomized neural networks, SCN uses a supervisory random mechanism to assign the input weights and hidden biases of the hidden nodes, which gives it better stability and generalization ability. Moreover, SCN automatically searches for the number of hidden nodes that allows the model to achieve an expected accuracy, which greatly reduces the workload of parameter tuning. These advantages have quickly attracted extensive attention to SCN, and various variants have been proposed. Notable work based on SCN includes the following. In Wang and Li (2017b), Wang DH et al. theoretically proved that generating random parameters with the supervisory mechanism guarantees the universal approximation ability of the randomized algorithm, which laid a theoretical foundation for SCN. Later, they proposed a hybrid method combining SCN with kernel density estimation (KDE) (Wang & Li, 2017a) and applied it to uncertain data modeling problems. Moreover, they proposed an ensemble learning algorithm based on SCN from the perspective of heterogeneous feature fusion and adopted the negative correlation learning strategy (NCL) to evaluate its output weights (Wang & Cui, 2017); the new algorithm effectively improves the robustness of the model. To further improve the modeling performance of SCN, Wang and Li (2018) proposed a deep SCN with a multi-hidden-layer network structure, from which some interesting properties and improved modeling performance can be observed. Li and Wang (2019) extended the original SCN framework to two-dimensional inputs for image data informatics, showing good potential for fast image processing. In Pratama and Wang (2019), the authors improved the original SCN to handle data stream learning problems and, on this basis, used a stacking strategy to expand it into a deep architecture for complex and non-stationary data stream scenarios. In Huang, Huang, and Wang (2019), Huang CQ et al. designed an adaptive power storage replica management system based on SCN to evaluate and analyze the traffic state of power data networks. In addition, SCN has been applied to data modeling in process industries (Dai, Li, Zhou, & Chai, 2019), workload forecasting in geo-distributed cloud data centers (Bi, Yuan, Zhang, & Zhang, 2019), carbon residual prediction of crude oil (Lu & Ding, 2019a), prediction of key variables in industrial processes (Lu & Ding, 2019b), forecasting of component concentrations in sodium aluminate liquor (Wang & Wang, 2020), and interval prediction in industrial processes (Lu, Ding, Dai, & Chai, 2020).

Although SCN and its variants have played significant roles in many applications, they share a common weakness: they spend too much time searching for candidate input parameters during model training. Specifically, when adding a new hidden node, they need to prepare multiple candidates that meet the preset conditions through the above-mentioned supervisory mechanism, and then select the one that reduces the current residual error the most as the new node. This training mechanism ensures that the error of the model decreases monotonically, but it makes the training process very time-consuming, especially when the number of candidates is large or the residual error becomes small. Note that if the number of candidates is set too small, the quality of some hidden nodes may be poor, which reduces the convergence rate of the model and prevents it from obtaining a compact architecture with good generalization ability.
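The following sketch shows the candidate-search step that dominates SCN's training cost, as we read it from Wang and Li (2017b): draw up to T_max random candidate nodes, keep those that satisfy the supervisory inequality ξ ≥ 0, and return the candidate with the largest ξ. The sigmoid activation, the sampling scale λ, and the constants r and μ are illustrative settings, not values fixed by the paper.

```python
import numpy as np

def pick_scn_node(X, E, t_max=100, r=0.99, mu=1e-3, lam=1.0, rng=None):
    """One SCN node addition: supervisory search over random candidates.

    X: (N, d) inputs; E: (N, m) residual error of the current model.
    xi follows Wang & Li (2017b):
        xi_q = <e_q, g>^2 / <g, g> - (1 - r - mu) * ||e_q||^2.
    """
    rng = np.random.default_rng(rng)
    best, best_xi = None, 0.0
    for _ in range(t_max):
        w = rng.uniform(-lam, lam, size=X.shape[1])   # candidate input weights
        b = rng.uniform(-lam, lam)                    # candidate hidden bias
        g = 1.0 / (1.0 + np.exp(-(X @ w + b)))        # candidate node output on all samples
        xi = sum((E[:, q] @ g) ** 2 / (g @ g) - (1.0 - r - mu) * (E[:, q] @ E[:, q])
                 for q in range(E.shape[1]))
        if xi > best_xi:                              # passes the condition and beats prior candidates
            best, best_xi = (w, b), xi
    return best                                       # None if no candidate satisfied the inequality
```

Because every candidate must be evaluated on all training samples, the cost of this loop grows with T_max, which is exactly the bottleneck BSCN targets.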

To solve the above problem, we optimize the process of adding hidden nodes in SCN and propose a novel semi-random constructive algorithm called the bidirectional stochastic configuration network (BSCN), which includes two learning modes: forward learning for the odd nodes and backward learning for the even nodes. Specifically, when a new hidden node is to be added, if its order is odd (e.g., the first node), BSCN uses the supervisory mechanism to find appropriate input parameters for it (forward learning), exactly as SCN does; otherwise (i.e., the order is even, such as the second hidden node), BSCN calculates its input parameters in a single step according to the current residual error feedback (backward learning). During training, the two learning modes proceed in turn. The forward learning naturally inherits the advantages of the original SCN, that is, the supervisory method improves the quality of hidden nodes to a certain extent and guarantees the universal approximation ability of the model (Wang & Li, 2017b); the backward learning avoids the excessive time consumption caused by evaluating a large number of candidates, and the hidden node obtained in each backward step minimizes the residual error at that time. Therefore, this bidirectional learning mechanism gives BSCN the following advantages (a simplified sketch of the training loop is given after the list):

  • (1)

    Naturally inherits the universal approximation ability possessed by the SCN model;

  • (2)

    Greatly accelerates the training efficiency of the model;

  • (3)

    Effectively improves the quality of the hidden nodes, which in turn makes the trained model better in generalization ability and more compact in network structure.
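The sketch below puts the two modes together. Odd nodes come from the supervisory search shown earlier (forward learning). For even nodes, the paper computes the input parameters in a single step from the residual feedback; the exact formula appears in Section 3, so here we substitute one plausible reading, in the spirit of bidirectional ELM: normalize the residual into (0, 1), pass it through the inverse sigmoid, and fit the input weights to it by least squares. Treat the backward step as an assumption-laden stand-in, not the paper's formula.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bscn_sketch(X, T, max_nodes=50, tol=1e-2, seed=None):
    """Alternate forward (odd) and backward (even) node additions.

    X: (N, d) inputs, T: (N, m) targets.
    """
    rng = np.random.default_rng(seed)
    nodes, H = [], np.empty((X.shape[0], 0))
    E = T.copy()                                         # residual of the (initially empty) model
    for L in range(1, max_nodes + 1):
        if L % 2 == 1:                                   # forward learning: supervisory search
            cand = pick_scn_node(X, E, rng=rng)          # from the earlier sketch
            if cand is None:                             # no acceptable candidate: stop growing
                break
            w, b = cand
        else:                                            # backward learning: one shot from residual
            e = E[:, 0]                                  # first output's residual (illustration only)
            u = np.clip((e - e.min()) / (np.ptp(e) + 1e-12), 1e-6, 1 - 1e-6)
            z = np.log(u / (1.0 - u))                    # inverse sigmoid of normalized residual
            Xa = np.hstack([X, np.ones((X.shape[0], 1))])
            wb, *_ = np.linalg.lstsq(Xa, z, rcond=None)  # fit [w; b] to the transformed residual
            w, b = wb[:-1], wb[-1]
        nodes.append((w, b))
        H = np.hstack([H, sigmoid(X @ w + b)[:, None]])
        beta, *_ = np.linalg.lstsq(H, T, rcond=None)     # re-evaluate all output weights
        E = T - H @ beta
        if np.sqrt(np.mean(E ** 2)) <= tol:              # acceptable error level reached
            break
    return nodes, beta
```

Note how the even steps replace an entire candidate-search loop with a single least-squares solve, which is where the claimed speed-up comes from.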

We verified the effectiveness of BSCN on ten benchmark regression problems, two real-life air pollution prediction problems, and a classical problem of age estimation from a single face image. The experimental results show that, compared with SCN, BSCN has not only much faster training speed but also better generalization ability and stability. Moreover, BSCN achieves significantly better generalization ability and stability than other typical constructive neural networks such as I-RVFL and constructive BP (C-BP).

The remainder of this paper is organized as follows. In Section 2, we briefly review the training mechanism of SCN, I-RVFL, and C-BP. The details of our proposed BSCN algorithm and its pseudocode are given in Section 3. In Section 4, we introduce the experimental data, parameter settings, and experimental results. In Section 5, we conclude this paper.


Preliminaries

In this section, we briefly review the training mechanism of SCN. SCN is a constructive feed-forward neural network with a single hidden layer. Take the SCN model for regression problems as an example, whose network structure is shown in Fig. 1, where w refers to the input weights between the input layer and the hidden layer, b refers to the thresholds of the hidden nodes (a.k.a. hidden biases), β refers to the output weights between the hidden layer and the output layer, and d is the dimension of the input data.
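In this notation, the output of an SCN model with L hidden nodes, and the supervisory condition checked when the L-th node is added, can be written as follows. The exact constants r and μ_L follow our reading of Wang and Li (2017b) and should be checked against the original paper.

```latex
% Output of an SCN model with L hidden nodes
f_L(x) = \sum_{j=1}^{L} \beta_j \, g\!\left(w_j^{\top} x + b_j\right)

% Supervisory condition for the L-th node; e_{L-1,q} is the q-th
% column of the current residual, 0 < r < 1, and \mu_L \to 0
\xi_{L,q} = \frac{\langle e_{L-1,q},\, g_L \rangle^{2}}{g_L^{\top} g_L}
          - (1 - r - \mu_L)\,\lVert e_{L-1,q} \rVert^{2} \;\ge\; 0,
\qquad q = 1,\dots,m
```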

Bidirectional stochastic configuration network (BSCN)

In this section, we introduce the details of the proposed BSCN algorithm and present its pseudo-code.

Experimental setting and results

In this section, we evaluate the performance of the proposed BSCN on ten benchmark regression problems from the UCI machine learning repository, two real-world air pollution prediction problems, and a classical problem of age estimation from a single face image. We also compare the performance of BSCN with SCN (Wang & Li, 2017b), I-RVFL (Li & Wang, 2017), and C-BP (Kwok & Yeung, 1997) on these problems.

Conclusions

To improve the training efficiency of SCN, this paper optimized the constructive process of its hidden nodes and proposed the bidirectional SCN (BSCN), which uses two learning modes (i.e., forward learning and backward learning) to add the hidden nodes. Specifically, the forward learning uses the same supervisory mechanism as SCN to assign the input weights and hidden biases of the odd nodes, which guarantees the universal approximation ability of the model, while the backward learning calculates the parameters of the even nodes in a single step based on the residual error feedback of the current model, which greatly accelerates training.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (33)

  • Balcan, M. F., et al. (2008). A theory of learning with similarity functions. Machine Learning.

  • Chou, C. H., et al. Long-term traffic time prediction using deep learning with integration of weather effect.

  • Ertuğrul, Ö. F. (2018). Two novel versions of randomized feed forward artificial neural networks: stochastic and pruned stochastic. Neural Processing Letters.

  • Gedeon, T. Stochastic bidirectional training.

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE...

  • Huang, C., et al. (2019). Stochastic configuration networks based adaptive storage replica management for power big data processing. IEEE Transactions on Industrial Informatics.

This work was supported by the National Natural Science Foundation of China (61976141, 61732011, and 61836005) and the Guangdong Science and Technology Department, China (2018B010107004).
