
Artificial Neural Networks with Random Weights for Incomplete Datasets

Neural Processing Letters

Abstract

In this paper, we propose a method to design Neural Networks with Random Weights in the presence of incomplete data. Under the general assumption that the data is missing-at-random, the method estimates the weights of the output layer as a function of the uncertainty of the missing data estimates. The proposed method uses the Unscented Transform to approximate the expected values and the variances of the training examples after the hidden layer. We model the input data as a Gaussian Mixture Model whose parameters are estimated via a maximum likelihood approach. The validity of the proposed method is empirically assessed under a range of conditions on simulated and real problems. We conduct numerical experiments to compare its performance to that of popular parametric and non-parametric imputation methods. Based on the results observed in the experiments, we conclude that our proposed method consistently outperforms its counterparts.

References

  1. Abdella M, Marwala T (2005) The use of genetic algorithms and neural networks to approximate missing data in database. In: IEEE 3rd international conference on computational cybernetics ICCC 2005, pp 207–212

  2. Braake HAT, Straten GV (1995) Random activation weight neural net (RAWN) for fast non-iterative training. Eng Appl Artif Intell 8(1):71–80. https://doi.org/10.1016/0952-1976(94)00056-S

  3. Broomhead DS, Lowe D (1988) Multivariable functional interpolation and adaptive networks. Complex Syst 2:321–355

  4. Cai J, Candès E, Shen Z (2010) A singular value thresholding algorithm for matrix completion. SIAM J Optim 20(4):1956–1982. https://doi.org/10.1137/080738970

  5. Cox D, Pinto N (2011) Beyond simple features: a large-scale feature search approach to unconstrained face recognition. Face Gesture 2011:8–15. https://doi.org/10.1109/FG.2011.5771385

  6. Cybenko G (1989) Approximation by superpositions of a sigmoidal function. Math Control Signals Syst 2(4):303–314

  7. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  8. Ding Y, Simonoff JS (2010) An investigation of missing data methods for classification trees applied to binary response data. J Mach Learn Res 11:131–170

  9. Eirola E, Lendasse A, Vandewalle V, Biernacki C (2014) Mixture of Gaussians for distance estimation with missing data. Neurocomputing 131:32–42

  10. Friedman M (1940) A comparison of alternative tests of significance for the problem of m rankings. Ann Math Stat 11(1):86–92

  11. Funahashi KI (1989) On the approximate realization of continuous mappings by neural networks. Neural Netw 2(3):183–192

  12. Garcia-Laencina PJ, Sancho-Gomez JL, Figueiras-Vidal AR (2010) Pattern classification with missing data: a review. Neural Comput Appl 19(2):263–282

  13. Giryes R, Sapiro G, Bronstein AM (2016) Deep neural networks with random Gaussian weights: a universal classification strategy? IEEE Trans Signal Process 64:3444–3457

  14. Guo P (2018) A vest of the pseudoinverse learning algorithm. CoRR arXiv:1805.07828

  15. Guo P, Chen PC, Sun Y (1995) An exact supervised learning for a three-layer supervised neural network. In: International conference on neural information processing (ICONIP), Beijing, pp 1041–1044

  16. Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366

  17. Hulse JV, Khoshgoftaar TM (2014) Incomplete-case nearest neighbor imputation in software measurement data. Inf Sci 259:596–610

  18. Hunt L, Jorgensen M (2003) Mixture model clustering for mixed data with missing information. Comput Stat Data Anal 41(3–4):429–440

  19. Gheyas IA, Smith LS (2010) A neural network-based framework for the reconstruction of incomplete data sets. Neurocomputing 73(16–18):3039–3065

  20. Igelnik B, Pao YH (1995) Stochastic choice of basis functions in adaptive function approximation and the functional-link net. IEEE Trans Neural Netw 6(6):1320–1329. https://doi.org/10.1109/72.471375

  21. Julier SJ, Uhlmann JK (1997) A new extension of the Kalman filter to nonlinear systems. In: SPIE aerosense symposium, pp 182–193

  22. Julier SJ, Uhlmann JK (2004) Unscented filtering and nonlinear estimation. Proc IEEE 92(3):401–422

  23. Kang P (2013) Locally linear reconstruction based missing value imputation for supervised learning. Neurocomputing 118:65–78

  24. Leão BP, Yoneyama T (2011) On the use of the unscented transform for failure prognostics. In: IEEE aerospace conference. IEEE, Big Sky

  25. Li C, Zhou H (2017) svt: Singular value thresholding in MATLAB. J Stat Softw, Code Snippets 81(2):1–13. https://doi.org/10.18637/jss.v081.c02

  26. Li M, Wang D (2017) Insights into randomized algorithms for neural networks: practical issues and common pitfalls. Inf Sci 382–383:170–178. https://doi.org/10.1016/j.ins.2016.12.007

  27. Li Y, Yu W (2017) A fast implementation of singular value thresholding algorithm using recycling rank revealing randomized singular value decomposition. CoRR arXiv:1704.05528

  28. Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml. Accessed 5 Jan 2018

  29. Little RJA, Rubin DB (2002) Statistical analysis with missing data. Wiley, Hoboken

  30. Luengo J, García S, Herrera F (2010) A study on the use of imputation methods for experimentation with radial basis function network classifiers handling missing attribute values: the good synergy between RBFNs and the event-covering method. Neural Netw 23(3):406–418

  31. Meng XL, Rubin DB (1993) Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80(2):267–278

  32. Mesquita DP, Gomes JP, Souza AH Jr, Nobre JS (2017) Euclidean distance estimation in incomplete datasets. Neurocomputing 248:11–18. https://doi.org/10.1016/j.neucom.2016.12.081

  33. Mesquita DP, Gomes JP, Corona F, Souza AH, Nobre JS (2019) Gaussian kernels for incomplete data. Appl Soft Comput 77:356–365. https://doi.org/10.1016/j.asoc.2019.01.022

  34. Mesquita DPP, Gomes JPP, Souza AH Jr (2017) Epanechnikov kernel for incomplete data. Electron Lett 53(21):1408–1410. https://doi.org/10.1049/el.2017.0507

  35. Oliveira PG, Coelho AL (2009) Genetic versus nearest-neighbor imputation of missing attribute values for RBF networks. In: Koppen M, Kasabov N, Coghill G (eds) Advances in neuro-information processing. Springer, Berlin, pp 276–283

  36. Pao YH, Phillips SM, Sobajic DJ (1992) Neural-net computing and the intelligent control of systems. Int J Control 56(2):263–289

  37. Pao YH, Park GH, Sobajic DJ (1994) Learning and generalization characteristics of the random vector functional-link net. Neurocomputing 6(2):163–180. https://doi.org/10.1016/0925-2312(94)90053-1

  38. Pelckmans K, Brabanter JD, Suykens J, Moor BD (2005) Handling missing values in support vector machine classifiers. Neural Netw 18(5–6):684–692

  39. Pinto N, Doukhan D, DiCarlo JJ, Cox DD (2009) A high-throughput screening approach to discovering good forms of biologically inspired visual representation. PLOS Comput Biol 5(11):1–12. https://doi.org/10.1371/journal.pcbi.1000579

  40. Rudi A, Rosasco L (2017) Generalization properties of learning with random features. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems, vol 30. Curran Associates, Inc., pp 3215–3225. http://papers.nips.cc/paper/6914-generalization-properties-of-learning-with-random-features.pdf

  41. Saxe AM, Koh PW, Chen Z, Bhand M, Suresh B, Ng AY (2011) On random weights and unsupervised feature learning. In: Proceedings of the 28th international conference on machine learning ICML’11. Omnipress, Madison, pp 1089–1096

  42. Scardapane S, Wang D (2017) Randomness in neural networks: an overview. Wiley Interdisc Rev: Data Min Knowl Discov 7:e1200

  43. Schmidt WF, Kraaijveld MA, Duin RPW (1992) Feedforward neural networks with random weights. In: Proceedings, 11th IAPR international conference on pattern recognition, conference B: pattern recognition methodology and systems, vol 2, pp 1–4

  44. Smola AJ, Vishwanathan SVN, Hofmann T (2005) Kernel methods for missing variables. In: Proceedings of the tenth international workshop on artificial intelligence and statistics, pp 325–332

  45. Stosic D, Stosic D, Zanchettin C, Ludermir T, Stosic B (2017) QRNN: \(q\)-generalized random neural network. IEEE Trans Neural Netw Learn Syst 28(2):383–390

  46. Suganthan PN (2018) Letter: on non-iterative learning algorithms with closed-form solution. Appl Soft Comput 70:1078–1082. https://doi.org/10.1016/j.asoc.2018.07.013

  47. Vidya L, Vivekanand V, Shyamkumar U, Mishra D (2015) RBF-network based sparse signal recovery algorithm for compressed sensing reconstruction. Neural Netw 63:66–78

  48. Wang D, Li M (2017) Deep stochastic configuration networks: universal approximation and learning representation. CoRR arXiv:1702.05639

  49. Wang D, Li M (2017) Stochastic configuration networks: fundamentals and algorithms. IEEE Trans Cybern 47(10):3466–3479. https://doi.org/10.1109/TCYB.2017.2734043

  50. Yu Q, Miche Y, Eirola E, van Heeswijk M, Séverin E, Lendasse A (2013) Regularized extreme learning machine for regression with missing data. Neurocomputing 102:45–51

  51. Ding Z, Fu Y (2018) Deep domain generalization with structured low-rank constraint. IEEE Trans Image Process 27(1):304–313. https://doi.org/10.1109/TIP.2017.2758199

  52. Zhang L, Suganthan P (2016) A survey of randomized algorithms for training neural networks. Inf Sci 364–365:146–155. https://doi.org/10.1016/j.ins.2016.01.039

Acknowledgements

The authors would like to thank the Brazilian National Council for Scientific and Technological Development (CNPq) for the financial support (Grant No. 305048/2016-3).

Author information

Corresponding author

Correspondence to João Paulo P. Gomes.

Appendix: Unscented Transform (UT)

Given a D-dimensional random variable X, we are interested in estimating the statistical moments of \(\psi \), the random variable that results from applying a non-linear function \(h(\cdot )\) to X. These moments could be obtained via standard sampling procedures or numerical integration methods. However, such procedures can be computationally intensive and depend on many factors, such as proper initialization and stopping criteria. The Unscented Transform (UT) provides a scheme to estimate the moments of \(\psi \) using a small set of deterministically chosen samples from the space of X, referred to as sigma points (SPs).

There are different ways to choose the SPs. A common approach is to use a symmetric set of \(S = 2D + 1\) SPs, as described in Eqs. (32)–(34).

$$\begin{aligned} \gamma _1&= {\mathbb {E}}[X] \end{aligned}$$
(32)
$$\begin{aligned} \gamma _s&= \gamma _1 + \left[ \sqrt{D \, \varSigma }\right] _{s-1}&\forall \, 1 < s \le D + 1 \end{aligned}$$
(33)
$$\begin{aligned} \gamma _s&= \gamma _1 - \left[ \sqrt{D \, \varSigma }\right] _{s - (D + 1)}&\forall \, D+1 < s \le 2D + 1 \end{aligned}$$
(34)

where \(\varSigma \) denotes the covariance matrix of X and \(\left[ \sqrt{D \, \varSigma }\right] _s\) denotes the s-th row of the matrix square root of \(D \, \varSigma \).
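
As a concrete illustration (not code from the paper), the following is a minimal sketch of the symmetric sigma-point construction in Eqs. (32)–(34), assuming NumPy and SciPy are available; the function name `sigma_points` and its signature are ours, not the authors'.

```python
import numpy as np
from scipy.linalg import sqrtm

def sigma_points(mean, cov):
    """Symmetric set of S = 2D + 1 sigma points, Eqs. (32)-(34)."""
    D = mean.shape[0]
    root = np.real(sqrtm(D * cov))     # matrix square root of D * Sigma
    points = [mean]                    # gamma_1 = E[X], Eq. (32)
    for s in range(D):
        points.append(mean + root[s])  # Eq. (33): add the s-th row
    for s in range(D):
        points.append(mean - root[s])  # Eq. (34): subtract the s-th row
    return np.array(points)            # shape (2D + 1, D)
```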

Given the SPs and a set of weights \(\{k_s\}^S_{s=1} \subset {\mathbb {R}}\), we can approximate the moments of \(\psi \) using a simple set of rules. For instance, \({\mathbb {E}}[\psi ]\) and \(\mathrm {Cov}(\psi )\) can be approximated using the following equations:

$$\begin{aligned}&\delta _s \leftarrow h \left( \gamma _s\right) \quad \forall \, 1 \le s \le S, \end{aligned}$$
(35)
$$\begin{aligned}&{\mathbb {E}}[\psi ] \approx \sum \limits _{s=1}^S k_s \delta _s \end{aligned}$$
(36)
$$\begin{aligned}&\mathrm {Cov}({\psi }) \approx \sum \limits _{s=1}^S k_s \left( \delta _s - {\mathbb {E}}[\psi ]\right) \left( \delta _s - {\mathbb {E}}[{\psi }]\right) ^T. \end{aligned}$$
(37)

Although there is no restriction on their sign, the weights \(k_1, \ldots , k_S\) must satisfy the normalization constraint

$$\begin{aligned} \sum \limits _{s=1}^S k_s = 1, \end{aligned}$$
(38)

to provide an unbiased estimate [22]. In this paper, we set \(k_1 = k_2 = \cdots = k_S = 1/S\).
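
A minimal sketch of the moment approximations in Eqs. (35)–(37), using the uniform weights \(k_s = 1/S\) adopted in the paper; the name `unscented_moments` and the tanh usage example are illustrative, not taken from the paper.

```python
def unscented_moments(h, mean, cov):
    """Approximate E[psi] and Cov(psi) for psi = h(X), Eqs. (35)-(37)."""
    gammas = sigma_points(mean, cov)            # from the sketch above
    S = len(gammas)
    k = np.full(S, 1.0 / S)                     # uniform weights, satisfy Eq. (38)
    deltas = np.array([h(g) for g in gammas])   # Eq. (35): propagate each SP
    psi_mean = k @ deltas                       # Eq. (36): weighted average
    centered = deltas - psi_mean
    psi_cov = (k[:, None] * centered).T @ centered  # Eq. (37)
    return psi_mean, psi_cov

# Example: propagate a 2-D Gaussian through an elementwise tanh non-linearity
mu = np.array([0.5, -1.0])
Sigma = np.array([[0.2, 0.05],
                  [0.05, 0.1]])
m, C = unscented_moments(np.tanh, mu, Sigma)
```

Note that only \(S = 2D + 1\) deterministic evaluations of \(h\) are required, which is what makes the UT considerably cheaper than Monte Carlo sampling for inputs of moderate dimension.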

Cite this article

Mesquita, D.P.P., Gomes, J.P.P. & Rodrigues, L.R. Artificial Neural Networks with Random Weights for Incomplete Datasets. Neural Process Lett 50, 2345–2372 (2019). https://doi.org/10.1007/s11063-019-10012-0
