
Multi-accent speech recognition with hierarchical grapheme based models


Abstract:

We train grapheme-based acoustic models for speech recognition using a hierarchical recurrent neural network architecture with connectionist temporal classification (CTC) loss. In a multi-task learning setting, the models learn to align utterances with phonetic transcriptions in a lower layer and graphemic transcriptions in the final layer. Using the grapheme predictions from a hierarchical model trained on 3 million US English utterances results in a 6.7% relative word error rate (WER) increase compared to a phoneme-based acoustic model trained on the same data. However, we show that hierarchical grapheme-based models trained jointly on the grapheme and phoneme prediction tasks with larger acoustic data (12 million utterances) outperform the phoneme-only model by 6.9% relative WER. We train a single multi-dialect model on a combined US, British, Indian, and Australian English data set and then adapt the model using US English data only. This adapted multi-accent model outperforms a model trained exclusively on US English. Repeating this process for phoneme-based and grapheme-based acoustic models across all four dialects yields larger improvements with grapheme models. Additionally, using a multi-accent grapheme model, we observe large recognition accuracy improvements for Indian-accented utterances in Google VoiceSearch US traffic, with a 40% relative WER reduction.
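The training objective named in the abstract is the CTC loss. As a rough, self-contained illustration only (not the paper's hierarchical multi-task RNN implementation), the sketch below computes the CTC negative log-likelihood of a target label sequence given per-frame log-probabilities, using the standard forward (alpha) recursion over the blank-extended target:

```python
import math

def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """CTC forward algorithm on a toy example.

    log_probs: T x V nested lists of per-frame log-probabilities.
    target:    list of label indices (must not contain `blank`).
    Returns the negative log-likelihood of `target` summed over all
    frame-level alignments that collapse to it.
    """
    # Blank-extended target: blank, t1, blank, t2, ..., blank
    ext = [blank]
    for lab in target:
        ext += [lab, blank]
    S, T = len(ext), len(log_probs)
    NEG_INF = float("-inf")

    def logsumexp(*xs):
        m = max(xs)
        if m == NEG_INF:
            return NEG_INF
        return m + math.log(sum(math.exp(x - m) for x in xs))

    # alpha[s]: log-prob of all partial alignments ending at ext[s]
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            cands = [alpha[s]]            # stay on the same symbol
            if s > 0:
                cands.append(alpha[s - 1])  # advance by one
            # Skip over a blank only between distinct non-blank labels.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or the trailing blank.
    tail = alpha[S - 2] if S > 1 else NEG_INF
    return -logsumexp(alpha[S - 1], tail)

# Toy check: 2 frames, vocab {blank=0, 'a'=1}, uniform probabilities.
# Alignments collapsing to "a": "aa", "-a", "a-", each with prob 0.25,
# so the loss is -log(0.75).
lp = [[math.log(0.5)] * 2, [math.log(0.5)] * 2]
print(ctc_neg_log_likelihood(lp, [1]))  # → ~0.2877 (= -log 0.75)
```

The paper's hierarchical variant would apply one such CTC loss to phoneme targets at an intermediate layer and another to grapheme targets at the output layer, summing the two losses during training; the names and structure above are purely illustrative.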
Date of Conference: 05-09 March 2017
Date Added to IEEE Xplore: 19 June 2017
Electronic ISSN: 2379-190X
Conference Location: New Orleans, LA, USA