Abstract
This paper analyses features suitable for recognizing distorted speech. The aim is to explore the use of a command ASR system when speech is recorded by far-distance microphones and may be corrupted by strong additive and convolutional noise. The paper examines how much basic spectral subtraction, coupled with cepstral mean normalization, can reduce the influence of the distortion present in such a far-talk channel. The results are compared with a reference close-talk speech recognition system and show an improvement in word error rate (WER) for channels with low or medium SNR. Combining these basic techniques yielded a WERR of 55.6% for the medium-distance channel and a WERR of 22.5% for the far-distance channel.
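The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes magnitude spectra and cepstral vectors have already been computed frame by frame, that the first few frames contain no speech (so they can serve as the noise estimate), and that WERR denotes the relative reduction of WER against a baseline. The function names and the parameters `noise_frames`, `alpha` and `floor` are hypothetical choices made for illustration.

```python
def spectral_subtraction(mag_frames, noise_frames=5, alpha=1.0, floor=0.01):
    """Basic magnitude spectral subtraction (targets additive noise).

    mag_frames: list of frames, each a list of magnitude-spectrum bins.
    The noise spectrum is estimated as the mean of the first
    `noise_frames` frames, which are ASSUMED to be speech-free.
    """
    if not mag_frames:
        raise ValueError("mag_frames must contain at least one frame")
    n = min(noise_frames, len(mag_frames))
    n_bins = len(mag_frames[0])
    noise = [sum(f[k] for f in mag_frames[:n]) / n for k in range(n_bins)]
    # Subtract the scaled noise estimate; a spectral floor keeps every bin
    # positive so that a later logarithm stays defined.
    return [
        [max(f[k] - alpha * noise[k], floor * f[k]) for k in range(n_bins)]
        for f in mag_frames
    ]


def cepstral_mean_normalization(cep_frames):
    """Subtract the per-coefficient mean over the utterance.

    A time-invariant channel (convolutional distortion) appears as an
    additive constant in the cepstral domain, so removing the mean
    compensates for it.
    """
    if not cep_frames:
        raise ValueError("cep_frames must contain at least one frame")
    n = len(cep_frames)
    dim = len(cep_frames[0])
    mean = [sum(f[i] for f in cep_frames) / n for i in range(dim)]
    return [[f[i] - mean[i] for i in range(dim)] for f in cep_frames]


def werr(wer_baseline, wer_new):
    """Relative WER reduction in percent (ASSUMED meaning of WERR)."""
    return 100.0 * (wer_baseline - wer_new) / wer_baseline
```

In a complete front end, spectral subtraction would be applied to the short-time spectrum before the filter bank and cepstral transform, and cepstral mean normalization to the resulting cepstral vectors. Under the assumed definition, a drop in WER from 20% to 10% corresponds to `werr(20.0, 10.0)`, that is, a relative reduction of 50%.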
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Borsky, M., Mizera, P., Pollak, P. (2013). Noise and Channel Normalized Cepstral Features for Far-speech Recognition. In: Železný, M., Habernal, I., Ronzhin, A. (eds) Speech and Computer. SPECOM 2013. Lecture Notes in Computer Science, vol 8113. Springer, Cham. https://doi.org/10.1007/978-3-319-01931-4_32
DOI: https://doi.org/10.1007/978-3-319-01931-4_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01930-7
Online ISBN: 978-3-319-01931-4
eBook Packages: Computer Science, Computer Science (R0)