Sparse NMF based speech enhancement with bases update

International Journal of Speech Technology

Abstract

This paper proposes a speech enhancement method that combines statistical modeling with non-negative matrix factorization (NMF), using speech and noise bases that are updated on-line. Template-based approaches are known to be more robust to non-stationary noise than methods based on statistical modeling, but they depend on a priori information. Combining the two avoids the drawbacks of each. In the NMF approach, the speech and noise bases are adapted simultaneously to further improve performance. The proposed method outperforms other benchmark algorithms in terms of perceptual evaluation of speech quality (PESQ) and source-to-distortion ratio (SDR) in stationary and non-stationary noise conditions, with both matched and mismatched noise bases.
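
As a rough illustration of the kind of decomposition the abstract describes, the sketch below factorizes a magnitude spectrogram over concatenated speech and noise bases, using the generic KL-divergence multiplicative updates of Lee and Seung (2001) with an added L1 penalty on the activations, and then applies a Wiener-like mask. It is not the update rule of the proposed method, which this page does not give. The function names (`sparse_nmf`, `enhance`), the `sparsity` weight, and the optional `update_bases` step are all assumptions made for illustration.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def sparse_nmf(V, W, n_iter=100, sparsity=0.1, update_bases=False, seed=0):
    """Approximate V ~= W @ H with KL multiplicative updates.

    An L1 penalty (weight `sparsity`) is applied to the activations H.
    If `update_bases` is True, W is also adapted (illustrative only).
    """
    rng = np.random.default_rng(seed)
    W = W.astype(float).copy()
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    ones = np.ones_like(V, dtype=float)
    for _ in range(n_iter):
        WH = W @ H + EPS
        # Activation update; the sparsity term enlarges the denominator.
        H *= (W.T @ (V / WH)) / (W.T @ ones + sparsity)
        if update_bases:
            WH = W @ H + EPS
            W *= ((V / WH) @ H.T) / (ones @ H.T + EPS)
            # Fix the scale of each basis and move it into H,
            # so that the product W @ H is unchanged.
            scale = W.sum(axis=0) + EPS
            W /= scale
            H *= scale[:, None]
    return W, H


def enhance(V, W_speech, W_noise, **kwargs):
    """Estimate the speech magnitude spectrogram with a Wiener-like mask."""
    k = W_speech.shape[1]
    W, H = sparse_nmf(V, np.hstack([W_speech, W_noise]), **kwargs)
    S = W[:, :k] @ H[:k]   # speech part of the reconstruction
    N = W[:, k:] @ H[k:]   # noise part of the reconstruction
    return (S / (S + N + EPS)) * V
```

Here `V` is assumed to be a non-negative array of shape (frequency bins, frames), and `W_speech` and `W_noise` are assumed to be bases trained beforehand on clean speech and on noise. Passing `update_bases=True` lets both sets of bases adapt to the noisy input, which loosely mirrors the idea of a simultaneous bases update.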

Author information

Corresponding author

Correspondence to V. Sunnydayal.

About this article

Cite this article

Sunnydayal, V., Siva Prasad, N., Ravishankar, S. et al. Sparse NMF based speech enhancement with bases update. Int J Speech Technol 20, 443–454 (2017). https://doi.org/10.1007/s10772-017-9418-0

  • DOI: https://doi.org/10.1007/s10772-017-9418-0
