Abstract:
Softmax is a basic function that normalizes a vector into a probability distribution and is widely used in machine learning, most notably in the cross-entropy loss function and in dot-product attention operations. However, the optimization of softmax-based models is sensitive to changes in the input statistics. We observe that the input to softmax changes significantly during the initial training stage, causing slow and unstable convergence when training the model from scratch. To remedy this optimization difficulty, we propose a simple yet effective substitution, named NormSoftmax, in which the input vector is first normalized to unit variance and then fed to the standard softmax function. Like other normalization layers in machine learning models, NormSoftmax stabilizes and accelerates the training process, and it also increases the robustness of the training procedure against hyperparameters. Experiments on Transformer-based models and convolutional neural networks validate that our proposed NormSoftmax is an effective plug-and-play module for stabilizing and speeding up the optimization of neural networks with cross-entropy loss or dot-product attention operations.
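Based on the description in the abstract, a minimal PyTorch-style sketch of the substitution might look like the following; the function name norm_softmax, the epsilon term, and the choice of normalization axis are assumptions for illustration, not the authors' exact formulation:

```python
import torch

def norm_softmax(x: torch.Tensor, dim: int = -1, eps: float = 1e-5) -> torch.Tensor:
    """Sketch of NormSoftmax: scale the input to unit variance, then apply softmax."""
    # Assumption: variance is computed over the same dimension as the softmax;
    # the paper's exact formulation (e.g. epsilon, mean handling) may differ.
    std = x.std(dim=dim, keepdim=True, unbiased=False)
    x_normed = x / (std + eps)
    return torch.softmax(x_normed, dim=dim)

# Usage: drop-in replacement for the standard softmax, e.g. on attention scores:
# scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
# attn = norm_softmax(scores, dim=-1)
```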
Date of Conference: 23-25 July 2023
Date Added to IEEE Xplore: 27 July 2023