Tibetan-Mandarin bilingual speech recognition based on end-to-end framework


Abstract:

This paper addresses Tibetan-Mandarin bilingual speech recognition. Because the phoneme sets of the two languages differ greatly, it is difficult to find a universal phoneme set for the bilingual acoustic model (AM) in the conventional hidden Markov model (HMM) framework. An end-to-end framework based on the connectionist temporal classification (CTC) loss function is proposed to solve this problem by using the character, rather than the phoneme, as the modeling unit. However, sparseness of the modeling units is a hard and unavoidable problem in CTC model training, particularly under low-resource conditions. This paper explores two methods to address it. First, different modeling units are selected: Tibetan characters and Mandarin non-tonal syllables are used as the CTC output units. Second, a noise-addition algorithm is applied to the bilingual part of the training corpus to augment the Mandarin speech. Experiments are carried out on the hybrid IFLYTEK Tibetan-Mandarin corpus, and the proposed methods yield clear improvements.
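
The two ideas in the abstract, non-tonal syllable units and noise-based augmentation, can be illustrated with a minimal sketch. The abstract does not give the transcription format, the noise type, or the signal-to-noise ratio, so everything here is an assumption: numbered pinyin with a trailing tone digit, white Gaussian noise, and a caller-chosen SNR. The function names `strip_tone`, `add_noise`, and `measured_snr_db` are hypothetical and are not taken from the paper.

```python
import math
import random


def strip_tone(syllable: str) -> str:
    """Map a numbered-pinyin syllable to a non-tonal unit.

    Assumption: tones are written as a trailing digit 1-5 (e.g. "zhong1").
    """
    if syllable and syllable[-1] in "12345":
        return syllable[:-1]
    return syllable


def add_noise(samples, snr_db, seed=0):
    """Return a noisy copy of `samples` at roughly `snr_db` decibels.

    Assumption: white Gaussian noise; the paper's noise source is not stated.
    """
    rng = random.Random(seed)
    signal_power = sum(x * x for x in samples) / len(samples)
    # SNR(dB) = 10 * log10(P_signal / P_noise), solved for P_noise.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in samples]


def measured_snr_db(clean, noisy):
    """Estimate the SNR in dB from a clean signal and its noisy copy."""
    p_signal = sum(c * c for c in clean) / len(clean)
    p_noise = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_signal / p_noise)


# One second of a 440 Hz tone at 16 kHz stands in for a Mandarin utterance.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy = add_noise(clean, snr_db=10)
units = [strip_tone(s) for s in ["zhong1", "guo2", "ma"]]
```

Merging the tones shrinks the Mandarin output inventory, so each remaining unit is seen more often in training, which is one plausible reading of how unit selection eases sparseness. The noisy copies would be added alongside the clean utterances with unchanged transcripts.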
Date of Conference: 12-15 December 2017
Date Added to IEEE Xplore: 08 February 2018
Conference Location: Kuala Lumpur, Malaysia