
Sparsing and Smoothing for the seq2seq Models


Impact Statement:
Softmax is a popular activation function in deep learning and has proven useful for various NLP tasks; one unsatisfying aspect is its dense output, which is wasteful. This article constructs a novel deep learning algorithm, T-softmax, that alleviates this dense output. T-softmax significantly improves model performance without sacrificing speed. The effectiveness and flexibility of the T-softmax approach have been demonstrated in extensive evaluations over several NLP tasks, and all experimental results indicate that T-softmax can serve as an alternative to softmax. The algorithm is ready to support a wide variety of NLP applications, including summary generation, question answering, and math word problems.

Abstract:

Current neural language models are trained to minimize cross-entropy and use softmax to compute the locally normalized probabilities over the target. While this setup provides solid results in several natural language processing (NLP) tasks, one unsatisfying aspect is its dense output. This density is wasteful, making models hard to interpret and assigning probability mass to many implausible outputs. To overcome this problem, we propose T-softmax, a simple but effective method that draws considerably sparser probability distributions from neural language models than softmax. Our method avoids dense output by truncating the unreliable tail of the probability distribution, which improves the model's performance. In addition, we generalize temperature scaling of the logits, a critical regularization technique, from softmax to T-softmax. To show that our approach is a drop-in replacement for softmax, we evaluate it on three NLP tasks: summary generation, question answering, and math word problems. Experimental results show that our proposed model significantly improves performance without sacrificing speed; notably, in all experiments, our method outperforms softmax.
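
The abstract describes T-softmax as (i) truncating the unreliable, low-probability tail of the output distribution and (ii) applying temperature scaling to the logits. The sketch below illustrates how such a truncated, temperature-scaled softmax could look in PyTorch; the nucleus-style cumulative-mass cutoff, the top_p parameter, and the function name t_softmax are illustrative assumptions, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def t_softmax(logits, temperature=1.0, top_p=0.9):
        """Illustrative truncated softmax: temperature-scaled probabilities
        with the low-probability tail removed and the rest renormalized.
        (Hypothetical sketch; not the authors' exact T-softmax rule.)"""
        probs = F.softmax(logits / temperature, dim=-1)
        # Sort probabilities, keep the smallest set of tokens whose cumulative
        # mass reaches top_p, and zero out the remaining (unreliable) tail.
        sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
        cumulative = torch.cumsum(sorted_probs, dim=-1)
        keep = (cumulative - sorted_probs) < top_p  # always keeps the top token
        sorted_probs = sorted_probs * keep
        truncated = torch.zeros_like(probs).scatter(-1, sorted_idx, sorted_probs)
        # Renormalize so the surviving probabilities sum to one.
        return truncated / truncated.sum(dim=-1, keepdim=True)

    # Example over a toy vocabulary of five tokens.
    logits = torch.tensor([2.0, 1.0, 0.5, -1.0, -3.0])
    print(t_softmax(logits, temperature=0.8, top_p=0.9))

In this sketch, lowering the temperature sharpens the distribution before truncation, while the cutoff discards the implausible tail so that probability mass is concentrated on a sparse set of candidates, mirroring the behavior the abstract attributes to T-softmax.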
Published in: IEEE Transactions on Artificial Intelligence (Volume: 4, Issue: 3, June 2023)
Page(s): 464 - 472
Date of Publication: 20 September 2022
Electronic ISSN: 2691-4581
