
ESAformer: Enhanced Self-Attention for Automatic Speech Recognition


Abstract:

In this letter, an Enhanced Self-Attention (ESA) module is proposed for feature extraction. The ESA integrates recursive gated convolution with the self-attention mechanism: the former captures multi-order feature interactions, while the latter extracts global features. In addition, the most suitable location for inserting the ESA module is also explored. Here, the ESA is embedded into the encoder layers of the Transformer network for automatic speech recognition (ASR), and the resulting model is named ESAformer. The effectiveness of the ESAformer is validated on three datasets: Aishell-1, HKUST, and WSJ. Experimental results show that, compared with the Transformer baseline, the ESAformer achieves improvements of 0.8% CER, 1.2% CER, and 0.7%/0.4% WER on these three datasets, respectively.
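
The following is a minimal PyTorch sketch of the ESA idea as described in the abstract: a recursive gated convolution branch for multi-order feature interactions combined with a self-attention branch for global feature extraction. The class names (RecursiveGatedConv1d, ESA), the hyper-parameters (order, n_heads, kernel_size), and the residual-sum fusion of the two branches are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class RecursiveGatedConv1d(nn.Module):
    """Simplified recursive gated convolution (gnConv-style) over time.

    An order-n module multiplies n depthwise-convolved gates into the
    running feature, yielding n-th order feature interactions.
    """

    def __init__(self, dim: int, order: int = 3, kernel_size: int = 7):
        super().__init__()
        self.dim = dim
        self.order = order
        # One projection yields the initial feature p0 plus `order` gates.
        self.proj_in = nn.Linear(dim, (order + 1) * dim)
        # Depthwise conv over the time axis gives the gates local context.
        self.dwconv = nn.Conv1d(order * dim, order * dim, kernel_size,
                                padding=kernel_size // 2, groups=order * dim)
        self.pw = nn.ModuleList([nn.Linear(dim, dim) for _ in range(order - 1)])
        self.proj_out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, C)
        p, q = self.proj_in(x).split([self.dim, self.order * self.dim], dim=-1)
        q = self.dwconv(q.transpose(1, 2)).transpose(1, 2)  # conv over time
        gates = q.chunk(self.order, dim=-1)
        p = p * gates[0]                    # first-order interaction
        for i in range(1, self.order):      # raise the interaction order
            p = self.pw[i - 1](p) * gates[i]
        return self.proj_out(p)


class ESA(nn.Module):
    """Enhanced Self-Attention: a self-attention branch (global features)
    plus a recursive gated convolution branch (multi-order interactions).

    How the two branches are fused is not specified in the abstract; a
    residual sum is assumed here purely for illustration.
    """

    def __init__(self, dim: int, n_heads: int = 4, order: int = 3):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.rgconv = RecursiveGatedConv1d(dim, order)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, C)
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        return x + attn_out + self.rgconv(h)


# Example: a batch of 2 utterances, 100 frames, 256-dim features.
x = torch.randn(2, 100, 256)
print(ESA(dim=256)(x).shape)  # torch.Size([2, 100, 256])
```

Per the abstract, such a module would be embedded into each Transformer encoder layer; the sketch above shows only the module itself.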
Published in: IEEE Signal Processing Letters (Volume 31)
Pages: 471-475
Date of Publication: 25 January 2024

