Abstract:
The performance of English automatic speech recognition systems degrades on spontaneous speech, mainly because the utterances contain multiple pronunciation variants. Previous approaches address the multiple-pronunciation problem by modeling pronunciation alterations at the phoneme-to-phoneme level. However, they have not yet considered the phonetic transformation effects induced by the pronunciation of the whole sentence. In this paper, we attempt to recover the original word sequence from the spontaneous phoneme sequence by applying a joint-sequence pronunciation model. In this way, the whole word sequence and its effect on the alteration of the phonemes are taken into consideration. Moreover, the system learns not only the phoneme transformations but also the direct mapping from phonemes to words. In this preliminary study, the phonemes are first recognized with the existing recognition system, and the pronunciation variation model based on the joint-sequence approach then maps from the phoneme level to the word level. Our experiments use the Buckeye corpus of spontaneous speech. The results show that the proposed method consistently improves word accuracy over the conventional recognition system. The best system achieves a relative improvement of up to 12.1% over the baseline speech recognizer.
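The abstract does not specify the model's internals, so the following is only a minimal sketch of the phoneme-to-word step, assuming a simplified joint-sequence model: each joint unit pairs a phoneme segment with one or more words, a bigram model scores sequences of joint units, and a Viterbi-style dynamic program picks the best segmentation of the recognized phoneme string. The unit inventory, the ARPAbet-like phoneme symbols, the probabilities, and the function name `decode` are all hypothetical illustrations, not the paper's trained model; there is also no smoothing for unseen bigrams and no end-of-sentence term.

```python
import math
from typing import Dict, List, Optional, Tuple

# A joint unit pairs a phoneme segment with the word(s) it realizes (hypothetical inventory).
Unit = Tuple[Tuple[str, ...], Tuple[str, ...]]
START: Unit = ((), ("<s>",))  # sentence-start context for the bigram model


def decode(phones: List[str], units: List[Unit],
           bigram: Dict[Tuple[Unit, Unit], float]) -> Optional[List[str]]:
    """Return the most probable word sequence for `phones`, or None if no parse exists."""
    n = len(phones)
    # best[i][u] = (log-prob of best parse of phones[:i] ending in unit u, backpointer)
    best: List[Dict[Unit, Tuple[float, Optional[Tuple[int, Unit]]]]] = [{} for _ in range(n + 1)]
    best[0][START] = (0.0, None)
    for i in range(n):
        for prev, (score, _) in best[i].items():
            for unit in units:
                seg = unit[0]
                j = i + len(seg)
                # Units must consume at least one phoneme and match the input exactly.
                if not seg or j > n or tuple(phones[i:j]) != seg:
                    continue
                p = bigram.get((prev, unit), 0.0)
                if p <= 0.0:  # unseen bigram: no smoothing in this sketch
                    continue
                new = score + math.log(p)
                if unit not in best[j] or new > best[j][unit][0]:
                    best[j][unit] = (new, (i, prev))
    if not best[n]:
        return None
    # Backtrack from the best final unit to the start symbol.
    unit = max(best[n], key=lambda u: best[n][u][0])
    i, path = n, []
    while unit != START:
        path.append(unit)
        i, unit = best[i][unit][1]
    return [w for u in reversed(path) for w in u[1]]


# Hypothetical units: a cross-word reduction ("wanna") competes with word-by-word variants.
I_ = (("ay",), ("I",))
WANNA = (("w", "aa", "n", "ah"), ("want", "to"))
WAN = (("w", "aa", "n"), ("want",))
AH = (("ah",), ("a",))
GO = (("g", "ow"), ("go",))
UNITS = [I_, WANNA, WAN, AH, GO]
BIGRAM = {(START, I_): 0.5, (I_, WANNA): 0.3, (I_, WAN): 0.1,
          (WAN, AH): 0.2, (AH, GO): 0.01, (WANNA, GO): 0.4}

print(decode(["ay", "w", "aa", "n", "ah", "g", "ow"], UNITS, BIGRAM))
# ['I', 'want', 'to', 'go']
```

Because a single joint unit can cover a reduced pronunciation spanning two words, the decoder can prefer "want to" over a word-by-word reading such as "want a", which loosely mirrors the sentence-level effects the abstract says phoneme-to-phoneme models miss.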
Published in: 2010 4th International Universal Communication Symposium
Date of Conference: 18-19 October 2010
Date Added to IEEE Xplore: 13 December 2010