Improving spontaneous English ASR using a joint-sequence pronunciation model


Abstract:

The performance of English automatic speech recognition (ASR) systems degrades on spontaneous speech, mainly because multiple pronunciation variants occur in the utterances. Previous approaches address this problem by modeling pronunciation alterations at the phoneme-to-phoneme level. However, they do not consider the phonetic transformations induced by the pronunciation of the whole sentence. In this paper, we attempt to recover the original word sequence from the spontaneous phoneme sequence by applying a joint-sequence pronunciation model. In this way, the whole word sequence and its effect on the alteration of the phonemes are taken into consideration. Moreover, the system learns not only the phoneme transformations but also the direct mapping from phonemes to words. In this preliminary study, the phonemes are first recognized with the existing recognition system; the pronunciation variation model based on the joint-sequence approach then maps from the phoneme level to the word level. Our experiments use the Buckeye corpus of spontaneous speech. The results show that the proposed method consistently improves word accuracy over the conventional recognition system. The best system achieves up to 12.1% relative improvement over the baseline speech recognizer.
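
To make the second stage more concrete, the following is a minimal sketch of mapping a recognized phoneme sequence to words with joint (phoneme chunk, word) units. It is not the authors' implementation: the unit inventory `JOINT_UNITS`, its scores, and the function `decode` are hypothetical and invented for illustration, and the units are scored independently of each other, whereas the model described in the abstract takes the whole word sequence into account.

```python
import math

# Hypothetical joint units: (phoneme chunk, word) -> illustrative score.
# The scores are invented for this sketch; they are not estimated from
# Buckeye and are not normalized probabilities.
JOINT_UNITS = {
    (("g", "ow", "ih", "ng"), "going"): 0.10,  # canonical pronunciation
    (("g", "ow", "ih", "n"), "going"): 0.15,   # reduced spontaneous variant
    (("g", "ow"), "go"): 0.10,
    (("ih", "n"), "in"): 0.10,
    (("t", "uw"), "to"): 0.20,                 # canonical pronunciation
    (("t", "ah"), "to"): 0.20,                 # reduced spontaneous variant
    (("ah",), "a"): 0.10,
}


def decode(phonemes, units=JOINT_UNITS):
    """Return the best-scoring word sequence for `phonemes`, or None.

    Dynamic programming over phoneme positions: best[i] holds the best
    log score of any segmentation of phonemes[:i] into joint units.
    """
    n = len(phonemes)
    best = [-math.inf] * (n + 1)
    back = [None] * (n + 1)  # back[i] = (start position, word) of last unit
    best[0] = 0.0
    for end in range(1, n + 1):
        for (chunk, word), score in units.items():
            start = end - len(chunk)
            if start < 0 or best[start] == -math.inf:
                continue
            if tuple(phonemes[start:end]) != chunk:
                continue
            candidate = best[start] + math.log(score)
            if candidate > best[end]:
                best[end] = candidate
                back[end] = (start, word)
    if n and back[n] is None:
        return None  # no segmentation covers the whole phoneme sequence
    words = []
    pos = n
    while pos > 0:
        start, word = back[pos]
        words.append(word)
        pos = start
    return words[::-1]


if __name__ == "__main__":
    print(decode("g ow ih n t ah".split()))
```

In this toy inventory the reduced sequence `g ow ih n t ah` can be segmented as "going to" or as "go in to"; the sketch keeps the higher-scoring segmentation. A context-dependent variant would replace the independent unit scores with an n-gram model over the joint units.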
Date of Conference: 18-19 October 2010
Date Added to IEEE Xplore: 13 December 2010
Conference Location: Beijing, China
