
End-To-End Multi-Modal Speech Recognition with Air and Bone Conducted Speech


Abstract:

Improving the performance of automatic speech recognition (ASR) in adverse acoustic environments is a long-standing and challenging task. Although many robust ASR systems based on conventional microphones have been developed, their performance on air-conducted (AC) speech remains far from satisfactory in low signal-to-noise-ratio (SNR) environments. Bone-conducted (BC) speech is relatively insensitive to ambient noise and therefore has the potential, as an auxiliary source, to improve ASR performance in such low-SNR environments. In this paper, we propose a conformer-based multi-modal speech recognition system. It uses a conformer encoder and a transformer-based truncated decoder to extract semantic information from the AC and BC channels, respectively. The semantic information from the two channels is re-weighted and integrated by a novel multi-modal transducer. Experimental results show the effectiveness of the proposed method. For example, in a 0 dB SNR environment, it yields a character error rate over 59.0% lower than that of a noise-robust baseline using the AC channel only, and over 12.7% lower than that of a multi-modal baseline that takes the concatenated features of AC and BC speech as input.
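
The abstract describes the multi-modal fusion only at a high level. As an illustration, the sketch below shows one plausible way the per-frame re-weighting and integration of AC and BC channel representations could be implemented in PyTorch; the class name TwoChannelFusion, the gating network, and all dimensions are assumptions made for this example, not the authors' actual implementation.

# Hypothetical sketch (not the paper's code): gated re-weighting and fusion
# of air-conducted (AC) and bone-conducted (BC) channel representations.
import torch
import torch.nn as nn

class TwoChannelFusion(nn.Module):
    """Re-weights and integrates per-frame AC and BC encoder outputs."""
    def __init__(self, dim: int):
        super().__init__()
        # Per-frame scalar gate computed from both channels (assumed design).
        self.gate = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.Tanh(),
            nn.Linear(dim, 1), nn.Sigmoid(),
        )
        self.proj = nn.Linear(dim, dim)

    def forward(self, h_ac: torch.Tensor, h_bc: torch.Tensor) -> torch.Tensor:
        # h_ac, h_bc: (batch, time, dim) encoder outputs of the two channels.
        w = self.gate(torch.cat([h_ac, h_bc], dim=-1))  # (batch, time, 1)
        fused = w * h_ac + (1.0 - w) * h_bc             # convex re-weighting
        return self.proj(fused)

# Usage with dummy tensors:
fusion = TwoChannelFusion(dim=256)
h_ac = torch.randn(4, 100, 256)   # AC-channel encoder output
h_bc = torch.randn(4, 100, 256)   # BC-channel encoder output
out = fusion(h_ac, h_bc)          # (4, 100, 256) fused representation

In a noisy scene the gate can lean toward the noise-insensitive BC channel, while in clean conditions it can favor the richer AC channel, which is the intuition behind re-weighting the two sources before integration.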
Date of Conference: 23-27 May 2022
Date Added to IEEE Xplore: 27 April 2022
Conference Location: Singapore, Singapore


