Abstract
Noise estimation and detection algorithms must adapt quickly to a changing environment, so they use a least mean square (LMS) filter. However, the LMS filter has a drawback: its convergence speed is very low, which consequently lowers speech recognition rates. To overcome this weakness, we propose a method for building a robust speech recognition clustering model for noisy environments. Because the proposed method cancels noise with an average estimator least mean square (AELMS) filter, a robust speech recognition clustering model can be established even in a noisy environment. The AELMS filter cancels the noise in a contaminated speech signal while preserving the source features of the speech and reducing the degradation of speech information, and a Gaussian state model is then clustered to make recognition more robust to noise. Recognition performance was evaluated in a noisy environment by constructing a Gaussian clustering model, the proposed robust speech recognition clustering model. The results show that canceling continuously changing environmental noise improved the signal-to-noise ratio of speech by 2.8 dB on average and improved the recognition rate by 4.1 %.
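To make the LMS baseline and the dB figure concrete, here is a minimal Python/NumPy sketch of a conventional LMS adaptive noise canceller, together with the SNR measure used to express an improvement in dB. It is not the authors' AELMS filter, whose average estimator is not specified here. The tap count, the step size `mu`, the synthetic sine used as a stand-in for speech, and the three-tap noise path are all illustrative assumptions.

```python
import numpy as np


def lms_noise_canceller(primary, reference, num_taps=8, mu=0.005):
    """Conventional LMS adaptive noise canceller (baseline, not AELMS).

    primary   -- speech plus noise from the main sensor
    reference -- noise-only signal correlated with the noise in `primary`
    Returns (speech_estimate, final_weights). The error signal e[n] is the
    speech estimate, because the filter can only predict the part of
    `primary` that is correlated with `reference`.
    """
    w = np.zeros(num_taps)              # adaptive FIR weights
    taps = np.zeros(num_taps)           # reference history, newest sample first
    out = np.zeros(len(primary))
    for n in range(len(primary)):
        taps = np.roll(taps, 1)
        taps[0] = reference[n]
        noise_est = w @ taps            # y[n] = w^T x[n]
        e = primary[n] - noise_est      # e[n] = d[n] - y[n]
        w = w + 2 * mu * e * taps       # Widrow-Hoff update: w += 2*mu*e[n]*x[n]
        out[n] = e
    return out, w


def snr_db(clean, signal):
    """SNR of `signal` relative to the known clean signal, in dB."""
    residual = signal - clean
    return 10 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2))


# Illustrative setup (all values are assumptions for the demo).
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(2 * fs) / fs
speech = 0.5 * np.sin(2 * np.pi * 440 * t)      # stand-in for a speech signal
reference = rng.standard_normal(t.size)          # measured noise source
path = np.array([0.6, -0.3, 0.1])                # hypothetical noise path
noise = np.convolve(reference, path)[: t.size]   # noise reaching the main sensor
primary = speech + noise

cleaned, w_hat = lms_noise_canceller(primary, reference)
print(f"SNR before: {snr_db(speech, primary):.1f} dB")
print(f"SNR after:  {snr_db(speech, cleaned):.1f} dB")
```

A smaller `mu` lowers the steady-state residual noise but slows adaptation. This is the speed-versus-accuracy trade-off behind the convergence drawback the abstract attributes to LMS. A common rule of thumb is to keep `mu` well below `1 / (num_taps * reference_power)`.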
Acknowledgments
This work was supported by the Gachon University research fund of 2013 (GCU-2013-R235).
About this article
Cite this article
Ahn, CS., Oh, SY. Robust vocabulary recognition clustering model using an average estimator least mean square filter in noisy environments. Pers Ubiquit Comput 18, 1295–1301 (2014). https://doi.org/10.1007/s00779-013-0732-5
DOI: https://doi.org/10.1007/s00779-013-0732-5