
ESAformer: Enhanced Self-Attention for Automatic Speech Recognition


Abstract:

In this letter, an Enhanced Self-Attention (ESA) module is proposed for feature extraction. The ESA integrates recursive gated convolution with the self-attention mechanism: the former captures multi-order feature interactions, while the latter extracts global features. In addition, the most suitable location for inserting the ESA module is also explored. Here, the ESA is embedded into the encoder layers of the Transformer network for automatic speech recognition (ASR), and the resulting model is named ESAformer. The effectiveness of the ESAformer is validated on three datasets: Aishell-1, HKUST, and WSJ. Experimental results show that, compared with the Transformer baseline, the ESAformer achieves improvements of 0.8% CER, 1.2% CER, and 0.7%/0.4% WER on these three datasets, respectively.
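
The following is a minimal PyTorch sketch of the ESA idea as described in the abstract: a recursive gated convolution branch for multi-order feature interactions combined with a self-attention branch for global feature extraction. The class names (RecursiveGatedConv1d, ESA), the hyper-parameters (order, n_heads, kernel_size), and the residual-sum fusion of the two branches are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class RecursiveGatedConv1d(nn.Module):
    """Simplified recursive gated convolution (gnConv-style) over time.

    An order-n module multiplies n depthwise-convolved gates into the
    running feature, yielding n-th order feature interactions.
    """

    def __init__(self, dim: int, order: int = 3, kernel_size: int = 7):
        super().__init__()
        self.dim = dim
        self.order = order
        # One projection yields the initial feature p0 plus `order` gates.
        self.proj_in = nn.Linear(dim, (order + 1) * dim)
        # Depthwise conv over the time axis gives the gates local context.
        self.dwconv = nn.Conv1d(order * dim, order * dim, kernel_size,
                                padding=kernel_size // 2, groups=order * dim)
        self.pw = nn.ModuleList([nn.Linear(dim, dim) for _ in range(order - 1)])
        self.proj_out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, C)
        p, q = self.proj_in(x).split([self.dim, self.order * self.dim], dim=-1)
        q = self.dwconv(q.transpose(1, 2)).transpose(1, 2)  # conv over time
        gates = q.chunk(self.order, dim=-1)
        p = p * gates[0]                    # first-order interaction
        for i in range(1, self.order):      # raise the interaction order
            p = self.pw[i - 1](p) * gates[i]
        return self.proj_out(p)


class ESA(nn.Module):
    """Enhanced Self-Attention: a self-attention branch (global features)
    plus a recursive gated convolution branch (multi-order interactions).

    How the two branches are fused is not specified in the abstract; a
    residual sum is assumed here purely for illustration.
    """

    def __init__(self, dim: int, n_heads: int = 4, order: int = 3):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.rgconv = RecursiveGatedConv1d(dim, order)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, C)
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        return x + attn_out + self.rgconv(h)


# Example: a batch of 2 utterances, 100 frames, 256-dim features.
x = torch.randn(2, 100, 256)
print(ESA(dim=256)(x).shape)  # torch.Size([2, 100, 256])
```

Per the abstract, such a module would be embedded into each Transformer encoder layer; the sketch above shows only the module itself.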
Published in: IEEE Signal Processing Letters (Volume 31)
Pages: 471-475
Date of Publication: 25 January 2024

