Impact Statement:
With the advent of large pretrained models, low-resource tasks have gained significant industry and research relevance, as acquiring and annotating large datasets at scale poses challenges. Numerous approaches have been explored to leverage large pretrained models for various tasks, such as fine-tuning, linear probing, and prompt tuning. Normalization is a fundamental technique in deep learning, known for benefits such as faster convergence and the ability to train with larger learning rates. Batch normalization (Ioffe et al., 2015), layer normalization (Ba et al., 2016), instance normalization (Ulyanov et al., 2016), adaptive instance normalization (Huang et al., 2017), and group normalization (Wu et al., 2018) are among the popular normalization techniques that have demonstrated success in different domains. However, despite their success in high-resource settings, traditional normalization methods like batch normalization, layer normalization...
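For reference, the normalization variants cited above are all available as standard layers; a minimal PyTorch illustration follows (tensor shapes are chosen only for the example and are not from the paper):

```python
import torch
import torch.nn as nn

x_seq = torch.randn(8, 16, 32)      # (batch, sequence length, features), e.g. text/speech features
x_img = torch.randn(8, 16, 28, 28)  # (batch, channels, height, width), e.g. images

# Batch normalization: statistics over the batch (and length) dimensions, per channel.
bn = nn.BatchNorm1d(32)
y_bn = bn(x_seq.transpose(1, 2))    # BatchNorm1d expects (batch, channels, length)

# Layer normalization: statistics over the feature dimension, per sample.
ln = nn.LayerNorm(32)
y_ln = ln(x_seq)

# Instance normalization: statistics per sample and per channel (common in style transfer).
inorm = nn.InstanceNorm2d(16)
y_in = inorm(x_img)

# Group normalization: statistics over groups of channels, independent of batch size.
gn = nn.GroupNorm(num_groups=4, num_channels=16)
y_gn = gn(x_img)
```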
Abstract:
Large pretrained models, like BERT, GPT, and Wav2Vec, have demonstrated their ability to learn transferable representations for various downstream tasks. However, obtaining a substantial amount of supervised data remains a challenge due to resource and time limitations. As a solution, researchers have turned their attention to adapting large pretrained models via techniques like fine-tuning, linear probing, or prompt tuning in low-resource settings. Normalization techniques play a crucial role in speeding up training and improving the generalization of deep neural networks, with applications ranging from style transfer and object detection to recurrent neural networks. Despite their success in these domains, their effectiveness in low-resource NLP and speech tasks has been limited. A notable reason for this limitation is the difficulty of capturing expressiveness through the affine parameters of normalization. To address this issue, we propose a novel approach called Kullback–Leibler (KL) regularized normalization, or KL-Norm...
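The abstract is truncated before the full definition of KL-Norm, but the stated idea of attaching a KL regularizer to the normalization layer can be sketched as below. This is only an illustrative sketch, not the paper's exact formulation: the Gaussian posterior parameterized from the normalized output, the standard-normal prior, and the `kl_weight` hyperparameter are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class KLRegularizedNorm(nn.Module):
    """Illustrative sketch (assumption, not the paper's method): layer normalization whose
    affine output parameterizes a Gaussian posterior, regularized toward N(0, I) via a KL term."""

    def __init__(self, dim: int, kl_weight: float = 1e-3):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.mu = nn.Linear(dim, dim)       # posterior mean (plays the role of the learned affine map)
        self.logvar = nn.Linear(dim, dim)   # posterior log-variance
        self.kl_weight = kl_weight

    def forward(self, x):
        h = self.norm(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterized sample during training, deterministic mean at eval time.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar) if self.training else mu
        # KL( N(mu, sigma^2) || N(0, I) ), summed over features and averaged over the batch.
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(-1).mean()
        return z, self.kl_weight * kl

# Usage: the returned KL term would be added to the downstream task loss.
layer = KLRegularizedNorm(dim=32)
x = torch.randn(8, 16, 32)
z, kl_loss = layer(x)
# total_loss = task_loss + kl_loss
```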
Published in: IEEE Transactions on Artificial Intelligence ( Volume: 5, Issue: 6, June 2024)