ATR Parallel Decoding Based Speech Recognition System Robust to Noise and Speaking Styles

Shigeki MATSUDA
Takatoshi JITSUHIRO
Konstantin MARKOV
Satoshi NAKAMURA

Publication
IEICE TRANSACTIONS on Information and Systems Vol.E89-D No.3 pp.989-997
Publication Date: 2006/03/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e89-d.3.989
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Statistical Modeling for Speech Processing)
Category: Speech Recognition
Keyword:
automatic speech recognition, parallel decoding, multiple acoustic models, fast noise adaptation, speaking style, hyper-articulated speech

Summary:
In this paper, we describe a parallel decoding-based ASR system developed at ATR that is robust to noise type, SNR, and speaking style. Speech affected by such varied factors is difficult to recognize, especially when an ASR system contains only a single acoustic model. One solution is to employ multiple acoustic models, one for each condition. Even though the robustness of each individual acoustic model is limited, the ASR system as a whole can handle various conditions appropriately. Our system has two recognition sub-systems that use different features, such as MFCC and Differential MFCC (DMFCC). Each sub-system has several acoustic models that depend on SNR, speaker gender, and speaking style, and during recognition each acoustic model is adapted by fast noise adaptation. From each sub-system, one hypothesis is selected based on posterior probability. The final recognition result is obtained by combining the best hypotheses from the two sub-systems. On the AURORA-2J task, which is widely used for evaluating noise robustness, our system achieved higher recognition performance than a system containing only a single model. Our system was also tested on normal and hyper-articulated speech contaminated by several background noises, and it exhibited high robustness to noise and speaking styles.
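
The selection step described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Hypothesis` type, its fields, and the function names are hypothetical, and the summary does not say how the two best hypotheses are combined, so the sketch assumes the simplest possible rule: keep whichever sub-system winner has the higher posterior probability. Decoding and fast noise adaptation are outside its scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One decoding result (hypothetical structure, for illustration only)."""
    text: str         # recognized word sequence
    posterior: float  # posterior probability attached to the hypothesis
    model: str        # label of the acoustic model that produced it


def select_best(hypotheses):
    """Return the hypothesis with the highest posterior probability."""
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    return max(hypotheses, key=lambda h: h.posterior)


def recognize(subsystems):
    """Pick one winner per sub-system, then combine the winners.

    `subsystems` maps a feature name (e.g. "MFCC", "DMFCC") to the list of
    hypotheses produced by that sub-system's acoustic models.
    ASSUMPTION: the combination rule is "highest posterior wins"; the
    summary does not specify the actual rule.
    """
    winners = [select_best(hyps) for hyps in subsystems.values()]
    return select_best(winners)


if __name__ == "__main__":
    # Made-up values, not results from the paper.
    example = {
        "MFCC": [
            Hypothesis("one two three", 0.62, "high-SNR/male/normal"),
            Hypothesis("one too three", 0.48, "low-SNR/male/normal"),
        ],
        "DMFCC": [
            Hypothesis("one two three", 0.71, "low-SNR/male/hyper"),
            Hypothesis("won two tree", 0.33, "high-SNR/female/normal"),
        ],
    }
    print(recognize(example))
```

Because every acoustic model decodes the same utterance independently, the per-model decoding passes could run in parallel and only the final selection needs all results. A real system might instead combine hypotheses at the word level, which this sketch does not attempt.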
