Robust Speech Recognition Using Discrete-Mixture HMMs

Tetsuo KOSAKA
Masaharu KATOH
Masaki KOHDA

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E88-D    No.12    pp.2811-2818
Publication Date: 2005/12/01
Online ISSN: 
DOI: 10.1093/ietisy/e88-d.12.2811
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Speech and Hearing
Keyword: 
robust speech recognition,  HMM,  MAP estimation,  impulsive noise,  

Full Text: PDF(452.7KB)>>
Buy this Article



Summary: 
This paper introduces new methods of robust speech recognition using discrete-mixture HMMs (DMHMMs). The aim of this work is to develop robust speech recognition for adverse conditions that contain both stationary and non-stationary noise. In particular, we focus on the issue of impulsive noise, which is a major problem in practical speech recognition system. In this paper, two strategies were utilized to solve the problem. In the first strategy, adverse conditions are represented by an acoustic model. In this case, a large amount of training data and accurate acoustic models are required to present a variety of acoustic environments. This strategy is suitable for recognition in stationary or slow-varying noise conditions. The second is based on the idea that the corrupted frames are treated to reduce the adverse effect by compensation method. Since impulsive noise has a wide variety of features and its modeling is difficult, the second strategy is employed. In order to achieve those strategies, we propose two methods. Those methods are based on DMHMM framework which is one type of discrete HMM (DHMM). First, an estimation method of DMHMM parameters based on MAP is proposed aiming to improve trainability. The second is a method of compensating the observation probabilities of DMHMMs by threshold to reduce adverse effect of outlier values. Observation probabilities of impulsive noise tend to be much smaller than those of normal speech. The motivation in this approach is that flooring the observation probability reduces the adverse effect caused by impulsive noise. Experimental evaluations on Japanese LVCSR for read newspaper speech showed that the proposed method achieved the average error rate reduction of 48.5% in impulsive noise conditions. Also the experimental results in adverse conditions that contain both stationary and impulsive noises showed that the proposed method achieved the average error rate reduction of 28.1%.


open access publishing via