Abstract:
This paper examines the contribution of energy normalization to automatic speech recognition in babble noise, where the machine assumes that speech and noise have the same energy level and therefore the same loudness. Loudness of the target speech and of the noise is likewise an important factor in human speech recognition in everyday conditions: humans recognize louder speech better than quieter speech, even when both reach the listener at the same signal-to-noise ratio (SNR). This phenomenon was tested on machines, whose recognition performance varies roughly from 75% to 90% across a wide range of SNRs. In contrast, human recognition performance is more SNR-dependent, varying from 30% to 95%. With energy normalization, machines have a lower average recognition rate than humans in less noisy conditions (positive SNR) but tend to outperform humans in highly noisy conditions (negative SNRs such as -4 dB and -6 dB). The study also confirms that formant processing has no significant effect on recognizing speech in noise, which implies that formant-based vocal tract length normalization cannot improve machine performance in noise.
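As an illustration of the terms used in the abstract, the following is a minimal sketch of energy normalization and SNR-controlled mixing. It assumes that "energy normalization" means scaling each signal to a common root-mean-square (RMS) level; the abstract does not spell out the exact procedure, so the function names (`rms`, `normalize_energy`, `snr_db`, `mix_at_snr`) and the choice of RMS as the energy measure are illustrative assumptions, not the authors' implementation.

```python
import math


def rms(signal):
    """Root-mean-square level of a non-empty sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def normalize_energy(signal, target_rms=1.0):
    """Scale a signal so its RMS level equals target_rms (assumed definition)."""
    level = rms(signal)
    if level == 0.0:
        return list(signal)  # silent input: nothing to scale
    gain = target_rms / level
    return [x * gain for x in signal]


def snr_db(speech, noise):
    """Signal-to-noise ratio in dB, computed from the two RMS levels."""
    return 20.0 * math.log10(rms(speech) / rms(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Rescale the noise to reach target_snr_db, then add it to the speech.

    Returns the mixture and the rescaled noise.
    """
    gain = rms(speech) / (rms(noise) * 10.0 ** (target_snr_db / 20.0))
    scaled_noise = [n * gain for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


# Toy samples standing in for speech and babble noise (hypothetical data).
speech = normalize_energy([0.5, -0.5, 0.5, -0.5])
noise = normalize_energy([0.1, -0.2, 0.1, -0.2])
mixture, scaled_noise = mix_at_snr(speech, noise, -6.0)
```

After normalization both toy signals have the same RMS level, which corresponds to an SNR of 0 dB; `mix_at_snr` then rescales only the noise to reach a negative SNR such as the -6 dB condition mentioned above.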
Published in: 2011 IEEE 7th International Conference on Intelligent Computer Communication and Processing
Date of Conference: 25-27 August 2011
Date Added to IEEE Xplore: 20 October 2011