A random-weighted plane-Gaussian artificial neural network

Yang, Xubing; Yang, Hongxin; Zhang, Fuquan; Fan, Xijian; Ye, Qiaolin; Feng, Zhe

doi:10.1007/s00521-019-04457-6

A random-weighted plane-Gaussian artificial neural network

Original Article
Published: 30 August 2019

Volume 31, pages 8681–8692, (2019)
Cite this article

Neural Computing and Applications Aims and scope Submit manuscript

Xubing Yang ORCID: orcid.org/0000-0001-5504-8392¹,
Hongxin Yang¹,
Fuquan Zhang¹,
Xijian Fan¹,
Qiaolin Ye¹ &
…
Zhe Feng¹

345 Accesses
9 Citations
Explore all metrics

Abstract

Multilayer perceptron (MLP) and radial basis function network (RBFN) have received considerable attentions in data classification and regression. As a bridge between MLP and RBFN, plane-Gaussian (PG) network is capable of exhibiting globality and locality simultaneously by so-called PG activation function. Due to tuning network weight values by back propagation or clustering method in the training phase, they all confront with slow convergence rate, time-consuming, and easily dropping in local minima. To speed training networks, random projection technologies, for instance, extreme learning machine (ELM), have brightened up in recent decades. In this paper, we propose a random-weighted PG network, termed as RwPG. Instead of plane clustering in PG network, our RwPG adopts random values as network weight, and then analytically calculates network output by matrix inversion. Compared to PG and ELM, the advantages of the proposed RwPG list in fourfold: (1) It will be proved that the RwPG is also a universal approximator. (2) It inherits the geometrical interpretation of PG network, and is also suitable for capturing linearity in data, especially for plane distribution cases. (3) It has comparable training speed for ELM, but significantly faster than that of PG network. (4) Owing to random-weighted technology, RwPG is probably capable of breaking through local extremum problems. Finally, experiments on artificial and benchmark datasets will show its superiorities.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Development and Application of Artificial Neural Network

Article 30 December 2017

A comparative analysis of gradient boosting algorithms

Article 24 August 2020

Supervised Classification Algorithms in Machine Learning: A Survey and Review

References

Lu K, An X, Li J et al (2017) Efficient deep network for vision-based object detection in robotic applications. Neurocomputing 245:31–45
Article Google Scholar
Cox DD, Dean T (2014) Neural networks and neuroscience-inspired computer vision. Curr Biol 24(18):921–929
Article Google Scholar
Siniscalchi SM, Svendsen T, Lee C (2014) An artificial neural network approach to automatic speech processing. Neurocomputing 140(22):326–338
Article Google Scholar
Wu Y, Schuster M, Chen Z et al (2016) Google’s neural machine translation system: bridging the gap between human and machine translation. CoRR. Technical report, available at http://arxiv.org/abs/1609.08144
Varshney D, kumar S, Gupta V (2017) Predicting information diffusion probabilities in social networks: a Bayesian networks based approach. Knowl-Based Syst 133:66–76
Article Google Scholar
Mishra J, Anguera JA, Gazzaley A (2016) Video games for neuro-cognitive optimization. Neuron 90(2):214–218
Article Google Scholar
Yang X, Chen S, Chen B (2012) Plane-Gaussian artificial neural network. Neural Comput Appl 21(2):305–317
Article Google Scholar
Bradley PS, Mangasarian OL (2000) k-Plane Clustering. J Glob Optim 16(1):23–32
Article MathSciNet MATH Google Scholar
Bengio Y. (2012) Practical recommendations for gradient-based training of deep architectures. In: LNCS neural networks: tricks of the trade 2nd ed. Springer, Berlin, pp 437–478
Google Scholar
Bengio Y, Courville A, Vincent P (2012) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828
Article Google Scholar
Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: Proceeding of the 27th international conference on machine learning (ICML). Omnipress, Haifa, Israel, Wahpeton, ND, USA, June 21–24, pp 807–814
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. NIPS 25:1097–1105
Google Scholar
Srivastava RK, Greff K, Schmidhuber J (2015) Training very deep networks. NIPS 28:2377–2385
Google Scholar
He KM, Zhang XY, Ren SQ et al (2016) Deep residual learning for image recognition. In: 2016 IEEE Conference on computer vision and pattern recognition (CVPR). IEEE Press, Las Vegas, NV, pp 770–778. https://doi.org/10.1109/CVPR.2016.90
He KM, Zhang XY, Ren SQ et al (2016) Identity mapping in deep residual networks. In: Proceedings of European conference on computer vision (ECCV), Amsterdam, Netherlands, October 8–16, pp 630–645. arXiv:1603.05027v3
Guo P, Zhou XL, Wang K (2018) PILAE: a non-gradient descent learning scheme for deep feedforward neural networks. arXiv:1811.01545
Pang S, Yang X (2016) Deep convolutional extreme learning machine and its application in handwritten digit classification. Comput Intell Neurosci. https://doi.org/10.1155/2016/3049632
Article Google Scholar
Michel M, Abel G, Wellington P (2019) Deep convolutional extreme learning machines: filters combination and error model validation. Neurocomputing 329:359–369
Article Google Scholar
Tissera M, McDonnell M (2016) Deep extreme learning machines: supervised autoencoding architecture for classification. Neurocomputing 174(Part A):42–49
Article Google Scholar
Duan M, Li K, Yang C et al (2018) A hybrid deep learning CNN–ELM for age and gender classification. Neurocompuing 275:448–461
Article Google Scholar
Li J, Zhao X, Li Y et al (2018) Classification of hyperspectral imagery using a new fully convolutional neural network. IEEE Geosci Remote Sens Lett 99:1–5
Google Scholar
Cao W, Wang X, Ming Z et al (2018) A review on neural networks with random weights. Neurocomputing 275:278–287
Article Google Scholar
Schmidt W, Kraaijveld M, Duin R (1992) Feedforward neural networks with random weights. In: Proceedings of 11th IAPR international conference on pattern recognition methodology and systems, vol 2, pp 1–4
Deng C, Huang G, Xu J, Tang J (2015) Extreme learning machines: new trends and applications. Sci China Inf Sci 58(2):1–16
Article Google Scholar
Huang G, Chen L, Siew C (2006) Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans Neural Netw 17(4):879–892
Article Google Scholar
Huang G (2015) What are extreme learning machines? Filling the gap between Frank Rosenblatt’s dream and John von Neumann’s puzzle. Cognit Comput 7:263–278
Article Google Scholar
Igelnik B, Pao Y-H (1995) Stochastic choice of basis functions in adaptive function approximation and the functional-link net. IEEE Trans Neural Netw 6(6):1320–1329
Article Google Scholar
Li JY, Chow W, Igenik B, Pao YH (1997) Comments on “Stochastic choice of basis functions in adaptive function approximation and the functional-link net”. IEEE Trans Neural Netw 8(2):452–454
Google Scholar
Kasun L, Zhou H, Huang G-B, Vong CM (2013) Representational learning with extreme learning machine for big data. IEEE Intell Syst 28(6):31–34
Google Scholar
Huang G, Zhou H, Ding X et al (2012) Extreme learning machine for regression and multiclass classification. IEEE Trans Syst Man Cybern Part B 42(2):513–529
Article Google Scholar
Dua D, Taniskidou EK (2017) UCI machine learning repository (http://archive.ics.uci.edu/ml). University of California, School of Information and Computer Science, Irvine
Moore AW, Crogan ML (2005) Discriminators for use in flow-based classification. Research reports: RR-05-13, Department of Computer Science, Queen Mary, University of London
Mygdalis V, Iosifidis A, Tefas A et al (2018) Semi-supervised subclass support vector data description for image and video classification. Neurocomputing 278:51–61
Article Google Scholar
Maronidis A, Tefas A, Pitas I (2015) Subclass graph embedding and a marginal fisher analysis paradigm. Pattern Recognit 48(12):4024–4035
Article MATH Google Scholar
Wan H, Wang H, Guo G et al (2018) Seperability-oriented subclass discriminant analysis. IEEE Trans Pattern Anal Mach Intell 40(2):409–422
Article Google Scholar
Baum EB, Haussler D (2014) What size net gives valid generalization? Neural Comput 1(1):151–160
Article Google Scholar

Download references

Acknowledgements

We would thank the anonymous editors and reviewers for their valuable comments and suggestions. We would thank Dr. Liyong Fu, the professor of Chinese Academy of Forestry, for his academic advice about deep networks in our revisions. This research was supported in part by the Central Public-interest Scientific Institution Basal Research Fund (Grant No. CAFYBB2019QD003), Natural Science Foundation of China under Grant 31670554 and 61871444, the Jiangsu Science Foundation under Grant BK20161527 and BK20171453, and Postgraduate Research and Practice Innovation Program of Jiangsu Province (SJKY19_0907).

Author information

Authors and Affiliations

College of Information Science and Technology, Nanjing Forestry University, Nanjing, 210037, People’s Republic of China
Xubing Yang, Hongxin Yang, Fuquan Zhang, Xijian Fan, Qiaolin Ye & Zhe Feng

Authors

Xubing Yang
View author publications
You can also search for this author in PubMed Google Scholar
Hongxin Yang
View author publications
You can also search for this author in PubMed Google Scholar
Fuquan Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Xijian Fan
View author publications
You can also search for this author in PubMed Google Scholar
Qiaolin Ye
View author publications
You can also search for this author in PubMed Google Scholar
Zhe Feng
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

XY proposed learning method and wrote manuscript. HY and ZF designed experiments. XF, FZ, and QY analyzed experimental results and gave some advice for manuscript.

Corresponding author

Correspondence to Xubing Yang.

Ethics declarations

Conflict of interest

The authors declared that they have no conflicts of interest to this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Theorem 2

It is given that a non-constant, bounded, and monotone-increasing continuous function$ G $described in Eq. (4) is dense in$ C(\varvec{I}_{d} ) $. That is, for any$ f \in C(\varvec{I}_{d} ) $and$ \varepsilon > 0 $, there exists a set of$ (\varvec{w}_{i} ,\gamma_{i} ) $, such that

$$ \left| {\sum\limits_{i = 1}^{c} {\varvec{u}_{i} G(\varvec{w}_{i} ,\gamma_{i} ,\varvec{x}) - f(\varvec{x})} } \right| \le \varepsilon $$

(6)

Proof for theorem 2

Since $ f \in C(\varvec{I}_{d} ) $, it can be described by a limit-integral representation

$$ f = \mathop {\lim }\limits_{L \to r} \int_{V} {T[f(\lambda )]} G_{\lambda ,L} (x){\text{d}}\lambda $$

(10)

where $ L $ and $ \lambda $ are one- and d-dimensional (corresponding to foresaid input space) parameters of the activation function $ G $, respectively. $ T $ is an operator defined on $ C(\varvec{I}_{d} ) $, $ r $ is a finite or infinite real number, and $ V $ is the integral domain of the $ \lambda $.

Two stages of approximation can be used to approximate the function $ f $, as described below. The first stage is to approximate the limit value by the integral

$$ f \approx \int_{V} {T[f(\lambda )]} G_{\lambda ,l} (x){\text{d}}\lambda $$

(11)

where $ l \in N(r,\varepsilon ) $($ \varepsilon $ is a fully small positive, and $ N(r,\varepsilon ) $ denotes an $ \varepsilon $ neighbor domain of $ r $. For convenience, noted it by $ l \approx r $).

The second stage is to obtain an estimate of multivariable integral with random method, typically, Monte Carlo. The given set of c random values $ \lambda = (\lambda_{1} ,\lambda_{2} , \cdots ,\lambda_{c} ) $, i.i.d, (independently and identically distributed), is drawn from the uniform distribution on $ V $. That is,

$$ f \approx \sum\limits_{k = 1}^{c} {a_{k} G_{{\lambda_{k} ,l}} (x)} $$

(12)

where $ a_{k} = (|V|/n)T[f(\lambda_{k} )] $, and $ |V| $ denotes the volume of integral domain.When n tends to infinite, the approximation error of Monte Carlo method is bounded by $ C/\sqrt n $, where the constant C is, not independent of d, determined by the variance of the integral [7, 27].

Considering random terms of activation function $ G $, parameter pairs $ (\varvec{w}_{i} ,\gamma_{i} ) $, we redefine the foresaid random term $ \lambda $ in the form of components $ \lambda_{i} = (w_{i1} , \ldots ,w_{id} ,b_{i} ) \in S_{c} (\varOmega ,\alpha ) $, where $ S_{c} (\varOmega ,\alpha ) $ denotes probabilistic space, and c denotes the number of hidden neurons, $ i = 1,2, \ldots ,c $. Then, Eq. (12) becomes

$$ f \approx \sum\limits_{k = 1}^{c} {a_{k} G(\varvec{w}_{k} ,b_{k} ,\varvec{x}} ) $$

(13)

Next, we expect the following expression holds, when c tends to infinite.

$$ \rho_{K} = \sqrt {E\int_{K} {[f(\varvec{x}) - \sum\limits_{k = 1}^{c} {a_{k} G(\varvec{w}_{k} ,b_{k} ,\varvec{x})} ]^{2} {\text{d}}\varvec{x}} \mathop \to \limits_{c \to \infty } 0} $$

(14)

where $ K( \subset I^{d} ) $ denotes a dense set (compact set), and $ E $ is the expectation w.r.t. $ S_{c} (\varOmega ,\alpha ) $.

From the definition of the PGF function defined in Eq. (3), the absolute value term $ \left| {\varvec{w}^{T} \varvec{x} - \gamma } \right|^{2} $ is not only used for interpreting geometrical meaning under the constraint of $ ||\varvec{w}|| = 1 $, but making the activation function bounded, even when $ \varvec{w} $ tends to infinite. (When $ \varvec{w} \to \infty $, the value of PGF tends to zero.) Additionally, here it equals to $ (\varvec{w}^{T} \varvec{x} - \gamma )^{2} $. Thus, in fact the PGF, $ \exp ( - (\varvec{w}^{T} \varvec{x} - \gamma )^{2} /2\sigma^{2} ) $, is a continuous and high-order differentiable. Conveniently, noted it by $ g $ as

$$ g(\varvec{w},\gamma ,\varvec{x}) = \exp ( - (\varvec{w}^{T} \varvec{x} - \gamma )^{2} /2\sigma^{2} ) $$

(15)

Since the parameter pairs $ (\varvec{w}_{i} ,\gamma_{i} ) $ in $ g $ is i.i.d, the expression in (14) can be taken in the form

$$ \begin{aligned} \sum\limits_{k = 1}^{c} {a_{k} g(\varvec{w}_{k} ,b_{k} ,\varvec{x})} = & \sum\limits_{k = 1}^{c} {a_{k} \prod\limits_{i = 1}^{d} {g(\varvec{w}_{ki} ,\gamma_{k} ,x_{i} )} } \\ = & \sum\limits_{k = 1}^{c} {a_{k} \prod\limits_{i = 1}^{d} {\exp ( - (w_{ki} x_{i} - \gamma_{k} )^{2} /2\sigma^{2} )} } \\ \end{aligned} $$

(16)

where $ w_{ki} $ and $ x_{i} $ denote the ith components of vector $ \varvec{w}_{k} $ and $ \varvec{x} $, respectively. So, for any dense set $ K $, it is easy to know that the PGF $ g $ satisfies

$$ \begin{aligned} \int_{R} {g^{2} (w_{ki} ,\gamma_{k} ,x_{i} ){\text{d}}x_{i} } \hfill \\ \quad = \int_{R} {[\exp ( - (w_{ki} x_{i} - \gamma_{k} )^{2} /2\sigma^{2} ,)]^{2} {\text{d}}x_{i} } < \infty \hfill \\ \end{aligned} $$

(17)

According to Theorem 1 in [27], there exist a sequence of $ (\varvec{w},\gamma ) $ in the probability measure space $ S_{c} (\varOmega ,\alpha ) $, such that (14) holds. That is, $ \rho_{K} $ will converge to zero, when $ c $ tends to infinite.

Here, we do not detail the rest proof, the similar description can be found in Appendix section of Ref [27].$ \square $

Rights and permissions

Reprints and permissions

About this article

Cite this article

Yang, X., Yang, H., Zhang, F. et al. A random-weighted plane-Gaussian artificial neural network. Neural Comput & Applic 31, 8681–8692 (2019). https://doi.org/10.1007/s00521-019-04457-6

Download citation

Received: 10 December 2018
Accepted: 23 August 2019
Published: 30 August 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s00521-019-04457-6

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A random-weighted plane-Gaussian artificial neural network

Abstract

Access this article

Similar content being viewed by others

Development and Application of Artificial Neural Network

A comparative analysis of gradient boosting algorithms

Supervised Classification Algorithms in Machine Learning: A Survey and Review

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Appendix

Theorem 2

Proof for theorem 2

Rights and permissions

About this article

Cite this article

Keywords

Navigation

A random-weighted plane-Gaussian artificial neural network

Abstract

Access this article

Similar content being viewed by others

Development and Application of Artificial Neural Network

A comparative analysis of gradient boosting algorithms

Supervised Classification Algorithms in Machine Learning: A Survey and Review

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Appendix

Appendix

Theorem 2

Proof for theorem 2

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation