DOI: 10.1145/3152494.3152498

Are saddles good enough for neural networks

Published: 11 January 2018

Abstract

Recent years have seen growing interest in understanding neural networks from an optimization perspective. It is now understood that converging to low-cost local minima is sufficient for such models to be effective in practice. In this work, however, we propose a new hypothesis, based on recent theoretical findings and empirical studies: that neural network models actually converge to saddle points with high degeneracy. Our findings are new and can have a significant impact on the development of gradient-descent-based methods for training neural networks. We validated our hypothesis through an extensive experimental evaluation on standard datasets such as MNIST and CIFAR-10, and also showed that recent efforts to escape saddles ultimately converge to saddles with high degeneracy, which we define as 'good saddles'. We also verified Wigner's semicircle law in our experimental results.
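
The abstract appeals to two quantities that are easy to illustrate even without the paper's code. The sketch below is a minimal illustration, not the authors' implementation: it samples a random symmetric (GOE) matrix as a stand-in for a loss Hessian, compares its eigenvalue histogram with Wigner's semicircle density, and measures degeneracy as the fraction of near-zero eigenvalues. The matrix size, the tolerance `tol`, and the GOE stand-in are all illustrative assumptions rather than details taken from the paper.

```python
# Minimal illustration (not the authors' code): Wigner's semicircle law and a
# simple degeneracy measure for a symmetric matrix standing in for a Hessian.
import numpy as np

def goe_matrix(n, rng):
    """Sample an n x n matrix from the Gaussian Orthogonal Ensemble,
    scaled so that the limiting spectrum lies in [-2, 2]."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / np.sqrt(2.0 * n)

def semicircle_density(x, radius=2.0):
    """Wigner's semicircle density: 2/(pi r^2) * sqrt(r^2 - x^2) on [-r, r]."""
    inside = np.clip(radius ** 2 - np.asarray(x) ** 2, 0.0, None)
    return (2.0 / (np.pi * radius ** 2)) * np.sqrt(inside)

def degeneracy(eigvals, tol=1e-2):
    """Fraction of (numerically) zero eigenvalues -- one way to quantify how
    degenerate a critical point is (the tolerance is an assumption)."""
    return float(np.mean(np.abs(eigvals) < tol))

rng = np.random.default_rng(0)
H = goe_matrix(1000, rng)          # stand-in for a Hessian at a critical point
eigvals = np.linalg.eigvalsh(H)

# Compare the empirical spectrum with the semicircle prediction on a coarse grid.
hist, edges = np.histogram(eigvals, bins=40, range=(-2.0, 2.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
gap = np.max(np.abs(hist - semicircle_density(centers)))

print(f"max density gap to semicircle law: {gap:.3f}")
print(f"fraction of near-zero eigenvalues:  {degeneracy(eigvals):.4f}")
```

For a trained network one would instead apply `degeneracy` to the eigenvalues of the loss Hessian at the converged parameters (computed, for example, with an automatic-differentiation library); a saddle whose Hessian has a large fraction of near-zero eigenvalues is what the abstract terms a 'good saddle'. Note that for a pure GOE matrix the near-zero fraction is small, so a large value at convergence is what would distinguish a highly degenerate saddle from a generic random spectrum.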

Cited By

  • Combined methods for solving degenerate unconstrained optimization problems. Ukrains’kyi Matematychnyi Zhurnal 76:5 (2024), 695-718. DOI: 10.3842/umzh.v76i5.7395
  • Exploring nonlinear correlations among transition metal nanocluster properties using deep learning: a comparative analysis with LOO-CV method and cosine similarity. Nanotechnology 36:4 (2024), 045701. DOI: 10.1088/1361-6528/ad892c
  • Combined Methods for Solving Degenerate Unconstrained Optimization Problems. Ukrainian Mathematical Journal 76:5 (2024), 777-804. DOI: 10.1007/s11253-024-02353-4

Published In

CODS-COMAD '18: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data
January 2018
379 pages
ISBN:9781450363419
DOI:10.1145/3152494

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deep learning
  2. neural networks
  3. saddle points

Qualifiers

  • Research-article

Funding Sources

  • Intel Technology India Pvt Ltd
  • Ministry of Human Resource Development, Govt of India

Conference

CoDS-COMAD '18

Acceptance Rates

CODS-COMAD '18 Paper Acceptance Rate 50 of 150 submissions, 33%;
Overall Acceptance Rate 197 of 680 submissions, 29%
