
A New Biologically Inspired Fuzzy Expert System-Based Voiced/Unvoiced Decision Algorithm for Speech Enhancement

Cognitive Computation

Abstract

In this paper, we propose a speech enhancement approach for a single-microphone system. The main idea is to apply a transformation to the speech signal that depends on its voicing state. The voiced/unvoiced decision algorithm is based on multi-scale product analysis and uses fuzzy logic to make more cognitively inspired use of speech information. Comb filtering is applied to the voiced frames of the noisy speech signal, and spectral subtraction is applied to the unvoiced frames of the same signal. The harmonics are further enhanced by a comb filter designed with an adjustable bandwidth. The comb filter is tuned by an accurate fundamental frequency estimation method, which is based on the multi-scale product analysis of the noisy speech. Experimental results show that the proposed approach reduces noise in adverse noise environments with little speech degradation and outperforms several competitive methods.
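
The frame-wise switching described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the voicing flag and the fundamental frequency `f0` are assumed to be supplied by some external detector (the fuzzy expert system and the multi-scale product analysis are not reproduced here), the comb filter is a simple frequency-domain mask around the harmonics of `f0`, and the spectral subtraction is the basic magnitude form. All function names, the `bandwidth_hz` parameter, and the `floor` parameter are hypothetical choices made for illustration.

```python
import numpy as np


def spectral_subtraction(frame, noise_mag, floor=0.01):
    """Basic magnitude spectral subtraction on one frame.

    noise_mag is an assumed estimate of the noise magnitude spectrum
    (same length as np.fft.rfft(frame)); floor limits over-subtraction.
    """
    spec = np.fft.rfft(frame)
    mag = np.abs(spec)
    phase = np.angle(spec)
    # Subtract the noise estimate, but never go below a fraction of the input.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    # Resynthesize with the noisy phase.
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(frame))


def comb_filter(frame, f0, fs, bandwidth_hz=40.0):
    """Frequency-domain comb: keep only bins near harmonics of f0.

    bandwidth_hz is the (adjustable) total width of each passband.
    """
    spec = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    # Nearest harmonic of f0 for every bin, and the distance to it.
    nearest = np.round(freqs / f0) * f0
    dist = np.abs(freqs - nearest)
    # Exclude the "zeroth harmonic" (DC region) from the passbands.
    mask = (dist <= bandwidth_hz / 2.0) & (nearest > 0)
    return np.fft.irfft(spec * mask, n=len(frame))


def enhance_frame(frame, voiced, f0, fs, noise_mag):
    """Route a frame to the comb filter (voiced) or spectral subtraction (unvoiced)."""
    if voiced:
        return comb_filter(frame, f0, fs)
    return spectral_subtraction(frame, noise_mag)


if __name__ == "__main__":
    fs, n = 8000, 400
    t = np.arange(n) / fs
    rng = np.random.default_rng(0)
    harmonic = np.sin(2 * np.pi * 200 * t)
    noisy = harmonic + 0.3 * rng.standard_normal(n)
    noise_mag = np.zeros(n // 2 + 1)  # placeholder noise estimate
    enhanced = enhance_frame(noisy, voiced=True, f0=200.0, fs=fs, noise_mag=noise_mag)
    print("error energy before:", np.sum((noisy - harmonic) ** 2))
    print("error energy after: ", np.sum((enhanced - harmonic) ** 2))
```

In this sketch a wider `bandwidth_hz` passes more of each harmonic's neighbourhood (and more noise), while a narrower one relies more heavily on the accuracy of the `f0` estimate, which is why the abstract stresses tuning the comb filter with an accurate fundamental frequency estimator.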



Acknowledgments

The authors would like to thank Dr. A. Abel, whose thesis helped throughout the revision of the paper.

Author information


Corresponding author

Correspondence to M. A. Ben Messaoud.

Ethics declarations

Conflict of Interest

M. A. Ben Messaoud, A. Bouzid and N. Ellouze declare that they have no conflict of interest.

Informed Consent

All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2008 (5). Additional informed consent was obtained from all patients for which identifying information is included in this article.

Human and Animal Rights

This article does not contain any studies with human or animal subjects performed by any of the authors.


About this article


Cite this article

Ben Messaoud, M.A., Bouzid, A. & Ellouze, N. A New Biologically Inspired Fuzzy Expert System-Based Voiced/Unvoiced Decision Algorithm for Speech Enhancement. Cogn Comput 8, 478–493 (2016). https://doi.org/10.1007/s12559-015-9376-2


  • DOI: https://doi.org/10.1007/s12559-015-9376-2
