Abstract
In this paper, we propose a method to design Neural Networks with Random Weights in the presence of incomplete data. Under the general assumption that the data are missing at random, the method estimates the weights of the output layer as a function of the uncertainty of the missing data estimates. The proposed method uses the Unscented Transform to approximate the expected values and the variances of the training examples after the hidden layer. The input data are modeled as a Gaussian Mixture Model whose parameters are estimated via a maximum likelihood approach. The validity of the proposed method is empirically assessed under a range of conditions on simulated and real problems. We conduct numerical experiments to compare the performance of the proposed method against popular parametric and non-parametric imputation methods. Based on the results observed in the experiments, we conclude that the proposed method consistently outperforms its counterparts.
References
Abdella M, Marwala T (2005) The use of genetic algorithms and neural networks to approximate missing data in database. In: IEEE 3rd international conference on computational cybernetics ICCC 2005, pp 207–212
Braake HAT, Straten GV (1995) Random activation weight neural net (RAWN) for fast non-iterative training. Eng Appl Artif Intell 8(1):71–80. https://doi.org/10.1016/0952-1976(94)00056-S
Broomhead DS, Lowe D (1988) Multivariable functional interpolation and adaptive networks. Complex Syst 2:321–355
Cai J, Candès E, Shen Z (2010) A singular value thresholding algorithm for matrix completion. SIAM J Optim 20(4):1956–1982. https://doi.org/10.1137/080738970
Cox D, Pinto N (2011) Beyond simple features: a large-scale feature search approach to unconstrained face recognition. Face Gesture 2011:8–15. https://doi.org/10.1109/FG.2011.5771385
Cybenko G (1989) Approximation by superpositions of a sigmoidal function. Math Control Signals Syst 2(4):303–314
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Ding Y, Simonoff JS (2010) An investigation of missing data methods for classification trees applied to binary response data. J Mach Learn Res 11:131–170
Eirola E, Lendasse A, Vandewalle V, Biernacki C (2014) Mixture of Gaussians for distance estimation with missing data. Neurocomputing 131:32–42
Friedman M (1940) A comparison of alternative tests of significance for the problem of m rankings. Ann Math Stat 11(1):86–92
Funahashi KI (1989) On the approximate realization of continuous mappings by neural networks. Neural Netw 2(3):183–192
Garcia-Laencina PJ, Sancho-Gomez JL, Figueiras-Vidal AR (2010) Pattern classification with missing data: a review. Neural Comput Appl 19(2):263–282
Giryes R, Sapiro G, Bronstein AM (2016) Deep neural networks with random gaussian weights: a universal classification strategy? IEEE Trans Signal Process 64:3444–3457
Guo P (2018) A vest of the pseudoinverse learning algorithm. CoRR arXiv:1805.07828
Guo P, Chen PC, Sun Y (1995) An exact supervised learning for a three-layer supervised neural network. In: International conference on neural information processing (ICONIP), Beijing, pp 1041–1044
Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366
Hulse JV, Khoshgoftaar TM (2014) Incomplete-case nearest neighbor imputation in software measurement data. Inf Sci 259:596–610
Hunt L, Jorgensen M (2003) Mixture model clustering for mixed data with missing information. Comput Stat Data Anal 41(3–4):429–440
Gheyas IA, Smith LS (2010) A neural network-based framework for the reconstruction of incomplete data sets. Neurocomputing 73(16–18):3039–3065
Igelnik B, Pao YH (1995) Stochastic choice of basis functions in adaptive function approximation and the functional-link net. IEEE Trans Neural Netw 6(6):1320–1329. https://doi.org/10.1109/72.471375
Julier SJ, Uhlmann JK (1997) A new extension of the Kalman filter to nonlinear systems. In: SPIE aerosense symposium, pp 182–193
Julier SJ, Uhlmann JK (2004) Unscented filtering and nonlinear estimation. Proc IEEE 92(3):401–422
Kang P (2013) Locally linear reconstruction based missing value imputation for supervised learning. Neurocomputing 118:65–78
Leão BP, Yoneyama T (2011) On the use of the unscented transform for failure prognostics. In: IEEE aerospace conference. IEEE, Big Sky
Li C, Zhou H (2017) svt: Singular value thresholding in MATLAB. J Stat Softw, Code Snippets 81(2):1–13. https://doi.org/10.18637/jss.v081.c02
Li M, Wang D (2017) Insights into randomized algorithms for neural networks: practical issues and common pitfalls. Inf Sci 382–383:170–178. https://doi.org/10.1016/j.ins.2016.12.007
Li Y, Yu W (2017) A fast implementation of singular value thresholding algorithm using recycling rank revealing randomized singular value decomposition. CoRR arXiv:1704.05528
Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml. Accessed 5 Jan 2018
Little RJA, Rubin DB (2002) Statistical analysis with missing data. Wiley, Hoboken
Luengo J, García S, Herrera F (2010) A study on the use of imputation methods for experimentation with radial basis function network classifiers handling missing attribute values: the good synergy between RBFNs and eventcovering method. Neural Netw 23(3):406–418
Meng XL, Rubin DB (1993) Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80(2):267–278
Mesquita DP, Gomes JP, Souza AH Jr, Nobre JS (2017) Euclidean distance estimation in incomplete datasets. Neurocomputing 248:11–18. https://doi.org/10.1016/j.neucom.2016.12.081
Mesquita DP, Gomes JP, Corona F, Souza AH, Nobre JS (2019) Gaussian kernels for incomplete data. Appl Soft Comput 77:356–365. https://doi.org/10.1016/j.asoc.2019.01.022
Mesquita DPP, Gomes JPP, Souza AH Jr (2017) Epanechnikov kernel for incomplete data. Electron Lett 53(21):1408–1410. https://doi.org/10.1049/el.2017.0507
Oliveira PG, Coelho AL (2009) Genetic versus nearest-neighbor imputation of missing attribute values for RBF networks. In: Koppen M, Kasabov N, Coghill G (eds) Advances in neuro-information processing. Springer, Berlin, pp 276–283
Pao YH, Phillips SM, Sobajic DJ (1992) Neural-net computing and the intelligent control of systems. Int J Control 56(2):263–289
Pao YH, Park GH, Sobajic DJ (1994) Learning and generalization characteristics of the random vector functional-link net. Neurocomputing 6(2):163–180. https://doi.org/10.1016/0925-2312(94)90053-1
Pelckmans K, Brabanter JD, Suykens J, Moor BD (2005) Handling missing values in support vector machine classifiers. Neural Netw 18(5–6):684–692
Pinto N, Doukhan D, DiCarlo JJ, Cox DD (2009) A high-throughput screening approach to discovering good forms of biologically inspired visual representation. PLOS Comput Biol 5(11):1–12. https://doi.org/10.1371/journal.pcbi.1000579
Rudi A, Rosasco L (2017) Generalization properties of learning with random features. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems, vol 30. Curran Associates, Inc., pp 3215–3225. http://papers.nips.cc/paper/6914-generalization-properties-of-learning-with-random-features.pdf
Saxe AM, Koh PW, Chen Z, Bhand M, Suresh B, Ng AY (2011) On random weights and unsupervised feature learning. In: Proceedings of the 28th international conference on machine learning ICML’11. Omnipress, Madison, pp 1089–1096
Scardapane S, Wang D (2017) Randomness in neural networks: an overview. Wiley Interdisc Rev: Data Min Knowl Discov 7:e1200
Schmidt WF, Kraaijveld MA, Duin RPW (1992) Feedforward neural networks with random weights. In: Proceedings, 11th IAPR international conference on pattern recognition, conference B: pattern recognition methodology and systems, vol 2, pp 1–4
Smola AJ, Vishwanathan SVN, Hofmann T (2005) Kernel methods for missing variables. In: Proceedings of the tenth international workshop on artificial intelligence and statistics, pp 325–332
Stosic D, Stosic D, Zanchettin C, Ludermir T, Stosic B (2017) QRNN: \(q\)-generalized random neural network. IEEE Trans Neural Netw Learn Syst 28(2):383–390
Suganthan PN (2018) Letter: on non-iterative learning algorithms with closed-form solution. Appl Soft Comput 70:1078–1082. https://doi.org/10.1016/j.asoc.2018.07.013
Vidya L, Vivekanand V, Shyamkumar U, Mishra D (2015) RBF-network based sparse signal recovery algorithm for compressed sensing reconstruction. Neural Netw 63:66–78
Wang D, Li M (2017) Deep stochastic configuration networks: universal approximation and learning representation. CoRR arXiv:1702.05639
Wang D, Li M (2017) Stochastic configuration networks: fundamentals and algorithms. IEEE Trans Cybern 47(10):3466–3479. https://doi.org/10.1109/TCYB.2017.2734043
Yu Q, Miche Y, Eirola E, van Heeswijk M, Séverin E, Lendasse A (2013) Regularized extreme learning machine for regression with missing data. Neurocomputing 102:45–51
Ding Z, Fu Y (2018) Deep domain generalization with structured low-rank constraint. IEEE Trans Image Process 27(1):304–313. https://doi.org/10.1109/TIP.2017.2758199
Zhang L, Suganthan P (2016) A survey of randomized algorithms for training neural networks. Inf Sci 364–365:146–155. https://doi.org/10.1016/j.ins.2016.01.039
Acknowledgements
The authors would like to thank the Brazilian National Council for Scientific and Technological Development (CNPq) for the financial support (Grant No. 305048/2016-3).
Appendix: Unscented Transform (UT)
Given a \(D\)-dimensional random variable \(X\), we are interested in estimating the statistical moments of \(\psi \), the random variable that results from applying a non-linear function \(h(\cdot )\) to \(X\). These moments could be obtained via standard sampling procedures or numerical integration methods. However, such procedures can be computationally intensive and depend on many factors, such as proper initialization and stopping criteria. The Unscented Transform (UT) provides a scheme to estimate the moments of \(\psi \) using a small set of deterministically chosen samples from the space of \(X\), referred to as sigma points (SPs).
There are different possible ways to choose the SPs. A common approach is to use a symmetric set of \(S = 2D + 1\) SPs, as described in Eqs. (32) to (34):
$$ \mathcal {X}^{(1)} = \mu , \quad (32) $$
$$ \mathcal {X}^{(s+1)} = \mu + \left[ \sqrt{D \, \varSigma }\right] _s, \quad s = 1, \ldots , D, \quad (33) $$
$$ \mathcal {X}^{(s+D+1)} = \mu - \left[ \sqrt{D \, \varSigma }\right] _s, \quad s = 1, \ldots , D, \quad (34) $$
where \(\mu \) denotes the mean of \(X\) and \(\left[ \sqrt{D \, \varSigma }\right] _s\) denotes the \(s\)-th row of the matrix square root of \(D \, \varSigma \), with \(\varSigma \) being the covariance matrix of \(X\).
Given the SPs and a set of weights \(\{k_s\}^S_{s=1} \subset {\mathbb {R}}\), we can approximate the moments of \(\psi \) using a simple set of rules. For instance, \({\mathbb {E}}[\psi ]\) and \(\mathrm {Cov}(\psi )\) can then be approximated using the following equations:
$$ {\mathbb {E}}[\psi ] \approx \sum _{s=1}^{S} k_s \, h\big (\mathcal {X}^{(s)}\big ), $$
$$ \mathrm {Cov}(\psi ) \approx \sum _{s=1}^{S} k_s \big (h(\mathcal {X}^{(s)}) - {\mathbb {E}}[\psi ]\big ) \big (h(\mathcal {X}^{(s)}) - {\mathbb {E}}[\psi ]\big )^{\top }. $$
Although there is no restriction on their sign, the weights \(k_1, \ldots , k_S\) must respect the convexity constraint
$$ \sum _{s=1}^{S} k_s = 1 $$
to provide an unbiased estimate [22]. In this paper, we set \(k_1 = k_2 = \cdots = k_S = 1/S\).
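To make the procedure above concrete, the following is a minimal NumPy sketch of the symmetric sigma-point construction in Eqs. (32) to (34) and the equal-weight moment approximations; it is an illustration under our stated conventions, not the implementation used in the experiments. The function names `sigma_points` and `ut_moments` and the example nonlinearity (an elementwise \(\tanh \)) are our own illustrative choices, and the Cholesky factor is used as one valid matrix square root of \(D \, \varSigma \).

```python
import numpy as np

def sigma_points(mu, Sigma):
    """Symmetric set of S = 2D + 1 sigma points, following Eqs. (32)-(34)."""
    D = mu.shape[0]
    # A is one valid matrix square root of D * Sigma (A^T A = D * Sigma);
    # its s-th row plays the role of [sqrt(D * Sigma)]_s.
    A = np.linalg.cholesky(D * Sigma).T
    pts = [mu] + [mu + A[s] for s in range(D)] + [mu - A[s] for s in range(D)]
    return np.array(pts)  # shape (2D + 1, D)

def ut_moments(mu, Sigma, h):
    """Approximate E[psi] and Cov(psi) for psi = h(X), with k_s = 1/S."""
    Y = np.array([h(x) for x in sigma_points(mu, Sigma)])  # propagate SPs
    mean = Y.mean(axis=0)             # sum_s k_s h(X^(s)), k_s = 1/S
    diff = Y - mean
    cov = diff.T @ diff / Y.shape[0]  # sum_s k_s (h(X^(s)) - E)(h(X^(s)) - E)^T
    return mean, cov

# Usage: a 2-D Gaussian pushed through an elementwise tanh nonlinearity.
mu = np.array([0.5, -1.0])
Sigma = np.array([[0.20, 0.05],
                  [0.05, 0.10]])
mean, cov = ut_moments(mu, Sigma, np.tanh)
print("E[psi]   ~", mean)
print("Cov(psi) ~\n", cov)
```

Note that any factor \(A\) with \(A^{\top } A = D \, \varSigma \) yields a valid symmetric set; the Cholesky factor is merely a convenient and numerically stable choice.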
Cite this article
Mesquita, D.P.P., Gomes, J.P.P. & Rodrigues, L.R. Artificial Neural Networks with Random Weights for Incomplete Datasets. Neural Process Lett 50, 2345–2372 (2019). https://doi.org/10.1007/s11063-019-10012-0