A Classification-Based Non-local Means Adaptive Filtering for Speech Enhancement and Its FPGA Prototype

Circuits, Systems, and Signal Processing

Abstract

Non-local means (NLM) adaptive filtering is a well-explored technique for denoising images and electrocardiogram signals. In NLM filtering, the signal value at a given sample point is estimated as a weighted average of the sample points in a search neighborhood. The NLM filter removes noise effectively when the samples of the signal within the search neighborhood are similar to one another. Because the vocal-tract system and the excitation source are time-varying, the magnitude and frequency content of a speech signal vary over time. Consequently, NLM filtering applied directly is not effective in removing noise from speech. The similarity among sample points can be improved by classifying the speech signal into categories according to its magnitude and frequency components. Vowels, semivowels and diphthongs are collectively termed vowel-like speech (VLS); in a given speech signal, VLS regions have higher magnitude than the non-VLS regions. In this work, the noisy speech signal is first classified into VLS and non-VLS to improve similarity. The non-local similarity present within the VLS and within the non-VLS is then exploited separately for effective speech enhancement through NLM filtering. The experimental results presented in this study show that the proposed approach provides better denoising performance than both NLM filtering without speech classification and recently reported speech enhancement methods. The hardware architecture of the proposed approach is also designed and prototyped on an FPGA.
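The weighted-average estimate and the class-restricted search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the VLS detector, so `classify_by_energy` is a hypothetical short-time-energy threshold used only as a stand-in, and the function names, the Gaussian weight on patch distance, and all parameter defaults (`patch_half`, `search_half`, `bandwidth`, `frame`, `ratio`) are assumptions chosen for readability.

```python
import numpy as np


def nlm_1d(x, labels=None, patch_half=3, search_half=50, bandwidth=0.3):
    """1-D non-local means sketch.

    Each sample is replaced by a weighted average of the samples in its
    search neighborhood; a weight is large when the patch around the
    candidate resembles the patch around the target sample. If `labels`
    is given, only candidates sharing the target's label are averaged
    (an assumed reading of "exploited separately" in the abstract).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    padded = np.pad(x, patch_half, mode="reflect")
    # patches[i] holds the 2*patch_half + 1 samples centred on sample i
    patches = np.lib.stride_tricks.sliding_window_view(
        padded, 2 * patch_half + 1
    )
    if labels is not None:
        labels = np.asarray(labels)
    out = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - search_half), min(n, i + search_half + 1)
        # mean squared difference between candidate patches and target patch
        d2 = np.mean((patches[lo:hi] - patches[i]) ** 2, axis=1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        if labels is not None:
            # zero the weight of candidates from the other class; the
            # target itself always keeps weight 1, so w.sum() stays > 0
            w = w * (labels[lo:hi] == labels[i])
        out[i] = np.dot(w, x[lo:hi]) / w.sum()
    return out


def classify_by_energy(x, frame=160, ratio=0.5):
    """Hypothetical stand-in for a VLS detector (assumption, not the paper's).

    A sample is labelled True (VLS-like) when its smoothed short-time
    energy exceeds `ratio` times the mean energy. Requires len(x) >= frame.
    """
    x = np.asarray(x, dtype=float)
    energy = np.convolve(x ** 2, np.ones(frame) / frame, mode="same")
    return energy > ratio * energy.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(1600)
    # toy signal: a low-magnitude segment followed by a high-magnitude one
    clean = np.where(t < 800, 0.05, 1.0) * np.sin(2 * np.pi * t / 40)
    noisy = clean + 0.1 * rng.standard_normal(t.size)
    labels = classify_by_energy(noisy)
    plain = nlm_1d(noisy)
    classified = nlm_1d(noisy, labels=labels)
    print("plain MSE:     ", np.mean((plain - clean) ** 2))
    print("classified MSE:", np.mean((classified - clean) ** 2))
```

Passing `labels=None` gives plain NLM, so the two variants can be compared on the same noisy input. Restricting the candidates by label is only one way to use the classification; separate bandwidths or search ranges per class would be another.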

Acknowledgements

This research work is a sub-module of the project “Development of Speech Based Person Authentication System on FPGA” under the SMDP-C2SD (9(I)/2014-MDD) program and is supported by the Ministry of Electronics and Information Technology (MeitY), Government of India.

Author information

Corresponding author

Correspondence to Nagapuri Srinivas.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Srinivas, N., Pradhan, G. & Kumar, P.K. A Classification-Based Non-local Means Adaptive Filtering for Speech Enhancement and Its FPGA Prototype. Circuits Syst Signal Process 39, 2489–2506 (2020). https://doi.org/10.1007/s00034-019-01267-y

  • DOI: https://doi.org/10.1007/s00034-019-01267-y
