
How Shrinking Gradient Noise Helps the Performance of Neural Networks


Abstract:

In the big data era, neural networks are the workhorse behind many applications. As increasing amounts of training data become available, training complex architectures such as neural networks becomes feasible. Adding gradient noise is widely used when training models with gradient descent, for example in non-convex optimization and robust training. Previous works have shown experimentally that adding suitable step-wise shrinking noise improves the test performance of highly over-parameterized neural networks. However, it is not well understood why such shrinking noise helps test performance. In this paper, we study the role of gradient noise toward understanding this phenomenon. Motivated by the observation that there is an inherent tension between training accuracy and generalization ability, governed by the size and shrinkage rate of the added noise, we study the training dynamics and generalization gaps of noisy gradient descent for shallow and wide neural networks. By introducing a time-dependent kernel and theoretically characterizing this inherent tension, our analysis sheds light on why gradient Langevin dynamics cannot reach good test accuracy compared with noiseless training, while suitably shrinking gradient noise can yield comparable or even better test accuracy.
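To make the setting concrete, here is a minimal sketch of noisy gradient descent with a step-wise shrinking noise scale on a shallow, wide one-hidden-layer network. The toy data, network size, decay schedule, and hyperparameters are illustrative assumptions, not the paper's exact setup; gradient Langevin dynamics would correspond to holding `sigma_t` fixed instead of decaying it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (illustrative; not the paper's setup).
X = rng.normal(size=(200, 5))
y = np.tanh(X @ rng.normal(size=5))

# Shallow, wide one-hidden-layer network: f(x) = a^T relu(W x).
width = 256
W = rng.normal(size=(width, 5)) / np.sqrt(5)
a = rng.normal(size=width) / np.sqrt(width)

def loss_and_grads(W, a):
    h = np.maximum(X @ W.T, 0.0)            # (n, width) hidden activations
    err = h @ a - y
    loss = 0.5 * np.mean(err ** 2)
    ga = h.T @ err / len(y)                 # gradient w.r.t. output weights
    gh = np.outer(err, a) * (h > 0)         # backprop through the ReLU
    gW = gh.T @ X / len(y)
    return loss, gW, ga

lr, sigma0, decay = 0.01, 0.1, 0.997       # assumed hyperparameters
init_loss, _, _ = loss_and_grads(W, a)
for t in range(2000):
    _, gW, ga = loss_and_grads(W, a)
    sigma_t = sigma0 * decay ** t           # step-wise shrinking noise scale
    W -= lr * (gW + sigma_t * rng.normal(size=W.shape))
    a -= lr * (ga + sigma_t * rng.normal(size=a.shape))

final_loss, _, _ = loss_and_grads(W, a)
print(init_loss, final_loss)
```

Because the noise scale decays geometrically, the early iterations explore the loss landscape while the late iterations behave almost like noiseless gradient descent, which is the tension between training accuracy and generalization that the abstract refers to.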
Date of Conference: 15-18 December 2021
Date Added to IEEE Xplore: 13 January 2022
Conference Location: Orlando, FL, USA
