Robust vocabulary recognition clustering model using an average estimator least mean square filter in noisy environments

  • Original Article
  • Personal and Ubiquitous Computing

Abstract

Noise estimation and detection algorithms must adapt quickly to a changing environment, so they commonly use a least mean square (LMS) filter. The LMS filter has a drawback, however: its performance is low, which in turn lowers speech recognition rates. To overcome this weakness, we propose a method for building a robust speech recognition clustering model for noisy environments. The method cancels noise with an average estimator least mean square (AELMS) filter, which preserves the source features of speech and reduces the degradation of speech information. After the noise in a contaminated speech signal is canceled, a Gaussian state model is clustered to make recognition more robust to noise. Recognition performance was evaluated in a noisy environment using the resulting Gaussian clustering model. The results show that canceling the continuously changing environmental noise improved the signal-to-noise ratio of speech by 2.8 dB on average and improved the recognition rate by 4.1 %.
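As background for the baseline the abstract refers to, the following is a minimal sketch of a standard LMS adaptive noise canceller, together with a signal-to-noise ratio (SNR) measurement in decibels. It is not the AELMS filter proposed in the article, whose details the abstract does not give. The function names, tap count, step size, noise path and synthetic signals are all illustrative assumptions.

```python
import math
import random


def lms_noise_canceller(primary, reference, num_taps=8, step_size=0.01):
    """Standard LMS noise canceller (illustrative; not the article's AELMS).

    primary:   noisy speech samples (speech plus noise)
    reference: samples correlated with the noise only
    Returns the error signal, which serves as the speech estimate.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - noise_estimate
        # LMS weight update: step along the instantaneous gradient estimate.
        weights = [w + 2.0 * step_size * error * h
                   for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def snr_db(clean, estimate):
    """SNR in dB of an estimate, measured against the known clean signal."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((e - s) ** 2 for s, e in zip(clean, estimate))
    return 10.0 * math.log10(signal_power / noise_power)


def demo(num_samples=4000, seed=0):
    """Synthetic test: a sinusoid stands in for speech (an assumption)."""
    rng = random.Random(seed)
    speech = [math.sin(2.0 * math.pi * 0.01 * k) for k in range(num_samples)]
    reference = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    # Hypothetical two-tap noise path from the reference to the primary input.
    noise = [0.8 * reference[k] + (0.3 * reference[k - 1] if k > 0 else 0.0)
             for k in range(num_samples)]
    primary = [s + n for s, n in zip(speech, noise)]
    cleaned = lms_noise_canceller(primary, reference)
    return snr_db(speech, primary), snr_db(speech, cleaned)


if __name__ == "__main__":
    before, after = demo()
    print(f"SNR before: {before:.1f} dB, after: {after:.1f} dB")
```

The step size trades adaptation speed against residual error: a larger value tracks a changing noise environment faster but leaves more residual noise in the output, while too large a value makes the filter unstable.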

Acknowledgments

This work was supported by the Gachon University research fund of 2013 (GCU-2013-R235).

Author information

Corresponding author

Correspondence to Sang-Yeob Oh.

About this article

Cite this article

Ahn, CS., Oh, SY. Robust vocabulary recognition clustering model using an average estimator least mean square filter in noisy environments. Pers Ubiquit Comput 18, 1295–1301 (2014). https://doi.org/10.1007/s00779-013-0732-5

  • DOI: https://doi.org/10.1007/s00779-013-0732-5
