Single channel speech enhancement using iterative constrained NMF based adaptive wiener gain

Yechuri, Sivaramakrishna; Vanambathina, Sunnydayal

doi:10.1007/s11042-023-16480-w

Published: 29 August 2023

Volume 83, pages 26233–26254 (2024)

Multimedia Tools and Applications

Abstract

We propose a novel single-channel speech enhancement algorithm for non-stationary noise that uses iterative constrained non-negative matrix factorization (NMF) with an adaptive Wiener gain. NMF-based Wiener filtering methods have recently been used for speech enhancement, and the performance of the Wiener filter depends on the value of the adaptive gain factor (\(\alpha \)). In these methods, \(\alpha \) is held constant regardless of noise type and signal-to-noise ratio (SNR), which affects speech enhancement performance. To overcome this, we calculate the adaptive factor with a genetic algorithm (GA), which adjusts the adaptive Wiener gain according to the noise type and SNR level. The GA-based adaptive Wiener gain minimizes Wiener filter estimation errors and improves speech quality by adjusting the base vector weights of noise and speech. Additionally, we use the iterative constrained NMF (IC-NMF) method to calculate the priors from noisy speech magnitudes. We select the Erlang, inverse gamma, Student's t, and inverse Nakagami distributions for the speech priors and Gaussian distributions for the noise priors, because noise and speech samples are well correlated with these distributions. This allows accurate estimation of the statistics of these distributions that are needed to regularize the NMF criterion. The proposed method thus combines iterative constrained NMF with GA-based adaptive Wiener filtering for speech enhancement. It outperforms other benchmark algorithms in terms of source-to-distortion ratio (SDR), short-time objective intelligibility (STOI), and perceptual evaluation of speech quality (PESQ).
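
To make the role of \(\alpha \) concrete, the following is a minimal sketch of NMF-based Wiener-style filtering with a tunable gain factor. It is not the authors' implementation. The gain form G = S / (S + alpha * N), the plain KL multiplicative updates (without the priors or constraints of IC-NMF), and the names `nmf_kl`, `adaptive_wiener_gain`, and `enhance` are all assumptions made for illustration. The GA search over alpha is not shown; here alpha is simply passed in.

```python
import numpy as np

def nmf_kl(V, rank, n_iter=200, W_fixed=None, seed=0):
    """Approximate V ~= W @ H with multiplicative updates (KL divergence).

    If W_fixed is given, the bases are kept fixed and only the
    activations H are estimated (rank is then taken from W_fixed).
    """
    rng = np.random.default_rng(seed)
    eps = 1e-12
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps if W_fixed is None else W_fixed
    H = rng.random((W.shape[1], n_frames)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + eps)
        if W_fixed is None:
            WH = W @ H + eps
            W *= ((V / WH) @ H.T) / (ones @ H.T + eps)
    return W, H

def adaptive_wiener_gain(S_hat, N_hat, alpha=1.0):
    """Assumed gain form: alpha > 1 suppresses noise more, alpha < 1 less."""
    eps = 1e-12
    return S_hat / (S_hat + alpha * N_hat + eps)

def enhance(V_noisy, W_speech, W_noise, alpha=1.0, n_iter=200):
    """Estimate speech magnitudes from noisy magnitudes V_noisy.

    W_speech and W_noise are pre-trained non-negative basis matrices
    (hypothetical inputs; training them is outside this sketch).
    """
    W = np.hstack([W_speech, W_noise])
    _, H = nmf_kl(V_noisy, W.shape[1], n_iter=n_iter, W_fixed=W)
    k = W_speech.shape[1]
    S_hat = W_speech @ H[:k]   # speech part of the reconstruction
    N_hat = W_noise @ H[k:]    # noise part of the reconstruction
    return adaptive_wiener_gain(S_hat, N_hat, alpha) * V_noisy
```

With alpha = 1 the gain reduces to the usual ratio mask S / (S + N). A fixed alpha applies the same trade-off between noise suppression and speech distortion to every noise type and SNR, which is the limitation the abstract says the GA-selected value is meant to address.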

Availability of data

The data that support the findings of this study are available in NOIZEUS, a noisy speech corpus for evaluation of speech enhancement algorithms: http://ecs.utdallas.edu/loizou/speech/noizeus/

Author information

Authors and Affiliations

SENSE, VIT-AP University, Amaravati, India
Sivaramakrishna Yechuri & Sunnydayal Vanambathina

Corresponding author

Correspondence to Sivaramakrishna Yechuri.

Ethics declarations

Conflicts of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yechuri, S., Vanambathina, S. Single channel speech enhancement using iterative constrained NMF based adaptive wiener gain. Multimed Tools Appl 83, 26233–26254 (2024). https://doi.org/10.1007/s11042-023-16480-w

Received: 15 November 2021
Revised: 03 June 2023
Accepted: 08 August 2023
Published: 29 August 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s11042-023-16480-w
