
ToEx: Accelerating Generation Stage of Transformer-Based Language Models via Token-Adaptive Early Exit



Abstract:

Transformer-based language models have recently gained popularity in numerous natural language processing (NLP) applications due to their superior performance compared to traditional algorithms. These models involve two execution stages: summarization and generation. The generation stage accounts for a significant portion of the total execution time due to its auto-regressive property, which necessitates considerable and repetitive off-chip accesses. Consequently, our objective is to minimize off-chip accesses during the generation stage to expedite transformer execution. To achieve this goal, we propose a token-adaptive early exit (ToEx) that generates output tokens using fewer decoders, thereby reducing off-chip accesses for loading weight parameters. Although our approach has the potential to minimize data communication, it introduces two challenges: 1) inaccurate self-attention computation, and 2) significant overhead for exit decision. To overcome these challenges, we introduce a methodology that facilitates accurate self-attention by lazily performing computations for previously exited tokens. Moreover, we mitigate the overhead of exit decision by incorporating a lightweight output embedding layer. We also present a hardware design to efficiently support the proposed work. Evaluation results demonstrate that our work can reduce the number of executed decoders by 2.6× on average. Accordingly, it achieves a 3.2× speedup on average compared to transformer execution without our work.
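For readers unfamiliar with the control flow the abstract describes, the following minimal Python/NumPy sketch illustrates the general shape of token-adaptive early exit with lazy completion of exited tokens' per-layer states. It is not the authors' implementation: the "decoder layers" are toy linear maps, the mean over context states stands in for self-attention, the max-softmax-probability threshold is an assumed exit criterion, and random vectors stand in for token embeddings.

```python
import numpy as np

# Toy dimensions; the real ToEx targets full transformer decoders.
NUM_LAYERS, HIDDEN, VOCAB = 6, 16, 32
rng = np.random.default_rng(0)

# Stand-in "decoder layers": each mixes a token's state with the states of
# earlier tokens at the same depth (a crude surrogate for self-attention).
W = [rng.standard_normal((HIDDEN, HIDDEN)) / np.sqrt(HIDDEN) for _ in range(NUM_LAYERS)]
W_out = rng.standard_normal((HIDDEN, VOCAB))  # stand-in for the lightweight output embedding

def decoder_layer(layer, x, context):
    mixed = x if not context else x + np.mean(context, axis=0)
    return np.tanh(mixed @ W[layer])

def confident(logits, threshold):
    # Hypothetical exit criterion (max softmax probability), not the paper's.
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return p.max() > threshold

def generate(first_state, steps=4, threshold=0.15):
    # states[t][l]: token t's hidden state after layer l (None if skipped by early exit).
    states, tokens = [], []
    x = first_state
    for _ in range(steps):
        visited = []
        for l in range(NUM_LAYERS):
            # Lazy completion: previously exited tokens are pushed through this
            # layer only now, when the current token actually needs their states.
            for idx, s in enumerate(states):
                if l > 0 and s[l] is None:
                    ctx = [states[j][l] for j in range(idx)]
                    s[l] = decoder_layer(l, s[l - 1], ctx)
            x = decoder_layer(l, x, [s[l] for s in states])
            visited.append(x)
            # Cheap exit check through the output embedding after every layer;
            # whether it fires in this toy run depends on the random weights.
            if confident(x @ W_out, threshold):
                break
        visited += [None] * (NUM_LAYERS - len(visited))  # layers skipped by early exit
        states.append(visited)
        tokens.append(int(np.argmax(x @ W_out)))
        x = rng.standard_normal(HIDDEN) / np.sqrt(HIDDEN)  # stand-in for next-token embedding
    return tokens

print(generate(rng.standard_normal(HIDDEN) / np.sqrt(HIDDEN)))
```

The point of the sketch is the bookkeeping: when a token exits early, its deeper-layer states are left unfilled and are computed lazily only if a later token reaches that depth and needs them for attention, which is the idea the abstract refers to as lazily performing computations for previously exited tokens.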
Published in: IEEE Transactions on Computers ( Volume: 73, Issue: 9, September 2024)
Page(s): 2248 - 2261
Date of Publication: 21 May 2024

