Single channel speech enhancement using iterative constrained NMF based adaptive wiener gain

Multimedia Tools and Applications

Abstract

We propose a novel single-channel speech enhancement algorithm for non-stationary noise that uses an iterative constrained non-negative matrix factorization (NMF) based adaptive Wiener gain. NMF-based Wiener filtering methods have recently been used for speech enhancement, and their performance depends on the value of the adaptive gain factor (\(\alpha \)). In existing methods this value is held constant regardless of noise type and signal-to-noise ratio (SNR), which degrades enhancement performance. To overcome this, we calculate the adaptive factor with a genetic algorithm (GA), which adjusts the adaptive Wiener gain according to the noise type and SNR level. The GA-based adaptive Wiener gain minimizes Wiener filter estimation errors and improves speech quality by adjusting the base vector weights of noise and speech. Additionally, we use the iterative constrained NMF (IC-NMF) method to calculate the priors from noisy speech magnitudes. We select the Erlang, inverse Gamma, Student's t, and inverse Nakagami distributions for the speech priors and Gaussian distributions for the noise priors, because noise and speech samples are well correlated with those distributions. This allows accurate estimation of the statistics of these distributions that are needed to regularize the NMF criterion. The proposed method thus combines IC-NMF with GA-based adaptive Wiener filtering. It outperforms other benchmark algorithms in terms of source-to-distortion ratio (SDR), short-time objective intelligibility (STOI), and perceptual evaluation of speech quality (PESQ).
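
The idea of tuning the gain factor \(\alpha \) with a GA can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, since the abstract does not state the gain formula or the GA configuration: the gain is taken to be the common parametric form S / (S + \(\alpha \)N), where S and N stand for the NMF reconstructions of the speech and noise magnitudes; the fitness is a mean squared error against a clean reference that would only be available during training; and the GA (truncation selection, blend crossover, Gaussian mutation, elitism) is a toy, not the authors' configuration. The names `adaptive_wiener_gain`, `fitness`, and `ga_search_alpha` are hypothetical.

```python
import numpy as np


def adaptive_wiener_gain(speech_est, noise_est, alpha, eps=1e-12):
    """Parametric Wiener gain (assumed form); alpha = 1 is the ordinary Wiener gain."""
    return speech_est / (speech_est + alpha * noise_est + eps)


def fitness(alpha, noisy_mag, speech_est, noise_est, clean_mag):
    """Negative mean squared error between enhanced and clean magnitudes."""
    enhanced = adaptive_wiener_gain(speech_est, noise_est, alpha) * noisy_mag
    return -np.mean((enhanced - clean_mag) ** 2)


def ga_search_alpha(noisy_mag, speech_est, noise_est, clean_mag,
                    bounds=(0.1, 5.0), pop_size=20, generations=30, seed=0):
    """Toy real-coded GA over a single scalar alpha (illustrative settings only)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds

    def score(population):
        return np.array([
            fitness(a, noisy_mag, speech_est, noise_est, clean_mag)
            for a in population
        ])

    pop = rng.uniform(lo, hi, size=pop_size)
    for _ in range(generations):
        order = np.argsort(score(pop))[::-1]          # best candidates first
        parents = pop[order[: pop_size // 2]]         # truncation selection
        # Blend crossover between randomly paired parents.
        p1 = rng.choice(parents, size=pop_size - 1)
        p2 = rng.choice(parents, size=pop_size - 1)
        w = rng.random(pop_size - 1)
        children = w * p1 + (1.0 - w) * p2
        # Gaussian mutation, then clip back into the search interval.
        children += rng.normal(0.0, 0.05 * (hi - lo), size=children.shape)
        children = np.clip(children, lo, hi)
        # Elitism: the best candidate survives unchanged.
        pop = np.concatenate(([parents[0]], children))
    return float(pop[np.argmax(score(pop))])


# Toy data standing in for magnitude spectrograms (frequency bins x frames).
rng = np.random.default_rng(42)
clean = rng.random((16, 32))
noise = rng.random((16, 32))
noisy = clean + noise                                  # additive-magnitude assumption
speech_est = clean * (1.0 + 0.1 * rng.standard_normal(clean.shape)).clip(0.5, 1.5)
noise_est = noise * (1.0 + 0.1 * rng.standard_normal(noise.shape)).clip(0.5, 1.5)

best_alpha = ga_search_alpha(noisy, speech_est, noise_est, clean)
enhanced = adaptive_wiener_gain(speech_est, noise_est, best_alpha) * noisy
```

In a setting like the one the abstract describes, such a search would presumably be run separately for each noise type and SNR level, so that a different \(\alpha \) is selected per condition rather than one fixed value.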

Availability of data

The data that support the findings of this study are available in NOIZEUS, a noisy speech corpus for evaluation of speech enhancement algorithms: http://ecs.utdallas.edu/loizou/speech/noizeus/

Author information

Corresponding author

Correspondence to Sivaramakrishna Yechuri.

Ethics declarations

Conflicts of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yechuri, S., Vanambathina, S. Single channel speech enhancement using iterative constrained NMF based adaptive wiener gain. Multimed Tools Appl 83, 26233–26254 (2024). https://doi.org/10.1007/s11042-023-16480-w

  • DOI: https://doi.org/10.1007/s11042-023-16480-w
