Balanced Gradient Training of Feed Forward Networks

Nguyen, Son; Manry, Michael T.

doi:10.1007/s11063-021-10474-1

Published: 05 March 2021

Volume 53, pages 1823–1844, (2021)

Neural Processing Letters

Abstract

We show that there are infinitely many valid scaled gradients that can be used to train a neural network. A novel training method is proposed that finds the best scaled gradients in each training iteration. The method's implementation uses only first-order derivatives, which makes it scalable and suitable for deep learning and big data. In simulations, the proposed method has similar or lower testing error than conjugate gradient and Levenberg–Marquardt. The method reaches the final network using fewer multiplies than the other two algorithms. It also works better than conjugate gradient in convolutional neural networks.
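The idea of a scaled gradient can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not describe. It assumes a hypothetical toy network y = w_out * tanh(w_in * x) with scalar weights, and it assumes that "scaled gradient" means multiplying each block of the gradient by its own positive factor. Under that assumption, every positive pair of factors (a, b) gives a descent direction, because the slope of the error along d = -(a * g_in, b * g_out) is -(a * g_in^2 + b * g_out^2), which is negative whenever the gradient is nonzero. The grid search over (a, b) is a crude stand-in for "finding the best scaled gradient in each iteration"; all names and data below are illustrative.

```python
import math

# Toy data for a hypothetical 1-input, 1-hidden-unit, 1-output network.
# Targets come from the same model with w_in = 1.5 and w_out = 0.8.
XS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
TS = [0.8 * math.tanh(1.5 * x) for x in XS]


def mse(w_in, w_out):
    """Mean squared error of y = w_out * tanh(w_in * x) on (XS, TS)."""
    return sum((w_out * math.tanh(w_in * x) - t) ** 2
               for x, t in zip(XS, TS)) / len(XS)


def gradient(w_in, w_out):
    """First-order derivatives of the MSE with respect to (w_in, w_out)."""
    g_in = g_out = 0.0
    for x, t in zip(XS, TS):
        h = math.tanh(w_in * x)
        err = w_out * h - t
        g_in += 2.0 * err * w_out * (1.0 - h * h) * x
        g_out += 2.0 * err * h
    n = len(XS)
    return g_in / n, g_out / n


def directional_derivative(g, scales):
    """Slope of the error along d = -(a * g_in, b * g_out)."""
    return -sum(s * gi * gi for s, gi in zip(scales, g))


def best_scaled_step(w_in, w_out, candidates):
    """Grid search over scale pairs (a, b); a stand-in, not the paper's method."""
    g_in, g_out = gradient(w_in, w_out)
    best = (mse(w_in, w_out), w_in, w_out, (0.0, 0.0))
    for a in candidates:
        for b in candidates:
            trial_in, trial_out = w_in - a * g_in, w_out - b * g_out
            e = mse(trial_in, trial_out)
            if e < best[0]:  # accept only strict improvements
                best = (e, trial_in, trial_out, (a, b))
    return best


w_in, w_out = 0.3, 0.2
g = gradient(w_in, w_out)

# Every positive scale pair gives a negative slope, i.e. a descent direction.
slopes = [directional_derivative(g, (a, b))
          for a in (0.1, 1.0, 10.0) for b in (0.1, 1.0, 10.0)]

initial_error = mse(w_in, w_out)
errors = [initial_error]
for _ in range(20):
    e, w_in, w_out, scales = best_scaled_step(
        w_in, w_out, [0.0, 0.1, 0.5, 1.0, 2.0, 5.0])
    errors.append(e)
final_error = errors[-1]
```

Because a step is accepted only when it lowers the error, the recorded errors never increase. The sketch uses only first-order derivatives plus extra error evaluations; a practical method would need a cheaper way to choose the scale factors than an exhaustive grid.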

Author information

Authors and Affiliations

Department of Electrical Engineering, The University of Texas at Arlington, Arlington, TX, USA
Son Nguyen & Michael T. Manry

Corresponding author

Correspondence to Son Nguyen.

About this article

Cite this article

Nguyen, S., Manry, M.T. Balanced Gradient Training of Feed Forward Networks. Neural Process Lett 53, 1823–1844 (2021). https://doi.org/10.1007/s11063-021-10474-1

Accepted: 24 February 2021
Published: 05 March 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s11063-021-10474-1
