IEICE Trans - Robust Speech Recognition by Using Compensated Acoustic Scores

Robust Speech Recognition by Using Compensated Acoustic Scores

Shoei SATO
Kazuo ONOE
Akio KOBAYASHI
Toru IMAI

Publication
IEICE TRANSACTIONS on Information and Systems Vol.E89-D No.3 pp.915-921
Publication Date: 2006/03/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e89-d.3.915
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Statistical Modeling for Speech Processing)
Category: Speech Recognition
Keyword:
speech recognition, noisy environment, acoustic score,

Full Text: PDF(307.5KB)>>

Summary:
This paper proposes a new compensation method of acoustic scores in the Viterbi search for robust speech recognition. This method introduces noise models to represent a wide variety of noises and realizes robust decoding together with conventional techniques of subtraction and adaptation. This method uses likelihoods of noise models in two ways. One is to calculate a confidence factor for each input frame by comparing likelihoods of speech models and noise models. Then the weight of the acoustic score for a noisy frame is reduced according to the value of the confidence factor for compensation. The other is to use the likelihood of noise model as an alternative that of a silence model when given noisy input. Since a lower confidence factor compresses acoustic scores, the decoder rather relies on language scores and keeps more hypotheses within a fixed search depth for a noisy frame. An experiment using commentary transcriptions of a broadcast sports program (MLB: Major League Baseball) showed that the proposed method obtained a 6.7% relative word error reduction. The method also reduced the relative error rate of key words by 17.9%, and this is expected lead to an improvement metadata extraction accuracy.

open access publishing via