
Multi-accent speech recognition with hierarchical grapheme based models


Abstract:

We train grapheme-based acoustic models for speech recognition using a hierarchical recurrent neural network architecture with connectionist temporal classification (CTC) loss. In a multi-task learning setting, the models learn to align utterances with phonetic transcriptions in a lower layer and graphemic transcriptions in the final layer. Using the grapheme predictions from a hierarchical model trained on 3 million US English utterances results in a 6.7% relative word error rate (WER) increase compared to a phoneme-based acoustic model trained on the same data. However, we show that hierarchical grapheme-based models trained jointly on the grapheme and phoneme prediction tasks with larger acoustic data (12 million utterances) outperform the phoneme-only model by 6.9% relative WER. We train a single multi-dialect model on a combined US, British, Indian, and Australian English data set and then adapt the model using US English data only. This adapted multi-accent model outperforms a model trained exclusively on US English. Repeating this process for phoneme-based and grapheme-based acoustic models across all four dialects yields larger improvements with grapheme models. Additionally, using a multi-accent grapheme model, we observe large recognition accuracy improvements for Indian-accented utterances in Google VoiceSearch US traffic, with a 40% relative WER reduction.
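The training objective named in the abstract is the CTC loss. As a rough, self-contained illustration only (not the paper's hierarchical multi-task RNN implementation), the sketch below computes the CTC negative log-likelihood of a target label sequence given per-frame log-probabilities, using the standard forward (alpha) recursion over the blank-extended target:

```python
import math

def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """CTC forward algorithm on a toy example.

    log_probs: T x V nested lists of per-frame log-probabilities.
    target:    list of label indices (must not contain `blank`).
    Returns the negative log-likelihood of `target` summed over all
    frame-level alignments that collapse to it.
    """
    # Blank-extended target: blank, t1, blank, t2, ..., blank
    ext = [blank]
    for lab in target:
        ext += [lab, blank]
    S, T = len(ext), len(log_probs)
    NEG_INF = float("-inf")

    def logsumexp(*xs):
        m = max(xs)
        if m == NEG_INF:
            return NEG_INF
        return m + math.log(sum(math.exp(x - m) for x in xs))

    # alpha[s]: log-prob of all partial alignments ending at ext[s]
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            cands = [alpha[s]]            # stay on the same symbol
            if s > 0:
                cands.append(alpha[s - 1])  # advance by one
            # Skip over a blank only between distinct non-blank labels.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or the trailing blank.
    tail = alpha[S - 2] if S > 1 else NEG_INF
    return -logsumexp(alpha[S - 1], tail)

# Toy check: 2 frames, vocab {blank=0, 'a'=1}, uniform probabilities.
# Alignments collapsing to "a": "aa", "-a", "a-", each with prob 0.25,
# so the loss is -log(0.75).
lp = [[math.log(0.5)] * 2, [math.log(0.5)] * 2]
print(ctc_neg_log_likelihood(lp, [1]))  # → ~0.2877 (= -log 0.75)
```

The paper's hierarchical variant would apply one such CTC loss to phoneme targets at an intermediate layer and another to grapheme targets at the output layer, summing the two losses during training; the names and structure above are purely illustrative.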
Date of Conference: 05-09 March 2017
Date Added to IEEE Xplore: 19 June 2017
Electronic ISSN: 2379-190X
Conference Location: New Orleans, LA, USA