
Sparsing and Smoothing for the seq2seq Models


Impact Statement:
Softmax is a popular activation function in deep learning and has proven useful for various NLP tasks; one unsatisfying aspect is its dense output, which is wasteful. This article constructs a novel deep learning algorithm, T-softmax, that alleviates this dense output. T-softmax significantly improves model performance without sacrificing speed. The effectiveness and flexibility of the T-softmax approach have been demonstrated in extensive evaluations over several NLP tasks, and all experimental results indicate that T-softmax can serve as an alternative to softmax. The algorithm is ready to support a wide variety of NLP applications, including summary generation, question answering, and math word problems.

Abstract:

Current neural language models are trained to minimize cross-entropy and use softmax to compute the locally normalized probabilities over the target. While this setup provides solid results in several natural language processing (NLP) tasks, one unsatisfying aspect is its dense output. This density is wasteful, making models hard to interpret and assigning probability mass to many implausible outputs. To overcome this problem, we propose T-softmax, a simple but effective method that draws considerably sparser probability distributions from neural language models than softmax. Our method avoids dense output by truncating the unreliable tail of the probability distribution, which improves the model's performance. In addition, we generalize temperature scaling of the logits, a critical regularization technique, from softmax to T-softmax. To show that our approach is a drop-in replacement for softmax, we evaluate it on three NLP tasks: summary generation, question answering, and math word problems. Experimental results show that our proposed model significantly improves performance without sacrificing speed; notably, in all experiments, our method outperforms softmax.
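
The abstract describes T-softmax as (i) truncating the unreliable, low-probability tail of the output distribution and (ii) applying temperature scaling to the logits. The sketch below illustrates how such a truncated, temperature-scaled softmax could look in PyTorch; the nucleus-style cumulative-mass cutoff, the top_p parameter, and the function name t_softmax are illustrative assumptions, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def t_softmax(logits, temperature=1.0, top_p=0.9):
        """Illustrative truncated softmax: temperature-scaled probabilities
        with the low-probability tail removed and the rest renormalized.
        (Hypothetical sketch; not the authors' exact T-softmax rule.)"""
        probs = F.softmax(logits / temperature, dim=-1)
        # Sort probabilities, keep the smallest set of tokens whose cumulative
        # mass reaches top_p, and zero out the remaining (unreliable) tail.
        sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
        cumulative = torch.cumsum(sorted_probs, dim=-1)
        keep = (cumulative - sorted_probs) < top_p  # always keeps the top token
        sorted_probs = sorted_probs * keep
        truncated = torch.zeros_like(probs).scatter(-1, sorted_idx, sorted_probs)
        # Renormalize so the surviving probabilities sum to one.
        return truncated / truncated.sum(dim=-1, keepdim=True)

    # Example over a toy vocabulary of five tokens.
    logits = torch.tensor([2.0, 1.0, 0.5, -1.0, -3.0])
    print(t_softmax(logits, temperature=0.8, top_p=0.9))

In this sketch, lowering the temperature sharpens the distribution before truncation, while the cutoff discards the implausible tail so that probability mass is concentrated on a sparse set of candidates, mirroring the behavior the abstract attributes to T-softmax.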
Published in: IEEE Transactions on Artificial Intelligence (Volume: 4, Issue: 3, June 2023)
Page(s): 464 - 472
Date of Publication: 20 September 2022
Electronic ISSN: 2691-4581
