Speech Enhancement Using Non-negative Low-Rank Modeling with Temporal Continuity and Sparseness Constraints

  • Conference paper
Advances in Multimedia Information Processing - PCM 2016 (PCM 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9917)

Abstract

Conventional speech enhancement algorithms based on sparse and low-rank decomposition seldom consider the non-negativity and the continuity of the enhanced speech spectrum simultaneously. In this paper, an unsupervised algorithm for enhancing noisy speech in a single-channel recording is presented. The algorithm can be viewed as an extension of non-negative matrix factorization (NMF) that approximates the magnitude spectrum of noisy speech by the superposition of a low-rank non-negative matrix and a sparse non-negative matrix. The temporal continuity of speech is also taken into account by adding the sum of squared differences between adjacent frames to the cost function. We prove that iteratively updating the parameters with the derived multiplicative update rules makes the cost function converge to a local minimum. Simulation experiments on the NOIZEUS database with various noise types demonstrate that the proposed algorithm outperforms recently proposed state-of-the-art methods under low signal-to-noise ratio (SNR) conditions.
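
The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm: the paper's exact cost function and derived update rules are not given here, so the sketch assumes a Euclidean fitting term, an L1 penalty on the sparse part, and a squared-difference penalty between adjacent frames of the sparse part (assumed, as is common in this line of work, to carry the speech while the low-rank part models the noise). The updates are heuristic multiplicative rules obtained by splitting the gradient into positive and negative parts, so no convergence guarantee is claimed for them. The names `cost`, `lowrank_sparse_nmf`, `lam`, and `mu` are hypothetical.

```python
import numpy as np


def cost(V, W, H, S, lam, mu):
    """Assumed cost: Euclidean fit + L1 sparsity on S + temporal smoothness of S."""
    fit = 0.5 * np.sum((V - W @ H - S) ** 2)
    sparsity = lam * np.sum(S)  # S is non-negative, so its L1 norm is its sum
    smoothness = 0.5 * mu * np.sum(np.diff(S, axis=1) ** 2)
    return fit + sparsity + smoothness


def lowrank_sparse_nmf(V, rank=2, lam=0.1, mu=0.1, n_iter=200, seed=0, eps=1e-12):
    """Approximate a non-negative spectrogram V (freq x frames, frames >= 2)
    as W @ H + S with W, H, S >= 0. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    S = rng.random((n_freq, n_frames)) + eps

    # Number of temporal neighbours of each frame: 1 at the edges, 2 inside.
    n_nb = np.full(n_frames, 2.0)
    n_nb[0] = n_nb[-1] = 1.0

    costs = [cost(V, W, H, S, lam, mu)]
    for _ in range(n_iter):
        # Low-rank factors: Lee-Seung style updates with S held fixed.
        R = W @ H + S
        W *= (V @ H.T) / (R @ H.T + eps)
        R = W @ H + S
        H *= (W.T @ V) / (W.T @ R + eps)

        # Sparse part: the gradient of the assumed cost with respect to S is
        # (W @ H + S - V) + lam + mu * (n_nb * S - neighbour_sum).
        R = W @ H + S
        neighbour_sum = np.zeros_like(S)
        neighbour_sum[:, 1:] += S[:, :-1]   # previous frame
        neighbour_sum[:, :-1] += S[:, 1:]   # next frame
        S *= (V + mu * neighbour_sum) / (R + lam + mu * n_nb * S + eps)

        costs.append(cost(V, W, H, S, lam, mu))
    return W, H, S, costs


if __name__ == "__main__":
    # Synthetic example: a rank-2 background plus a short localized burst.
    rng = np.random.default_rng(1)
    V = rng.random((6, 2)) @ rng.random((2, 20))
    V[2, 5:9] += 3.0
    W, H, S, costs = lowrank_sparse_nmf(V)
    print("initial cost:", costs[0], "final cost:", costs[-1])
```

In this sketch `lam` trades off how sparse `S` is and `mu` how strongly adjacent frames of `S` are tied together; both values are placeholders rather than settings taken from the paper.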

Acknowledgments

This work is supported by the NSF of China (Grant Nos. 61471394 and 61402519) and the NSF of Jiangsu Province (Grant Nos. BK20140071 and BK20140074).

Author information

Corresponding author

Correspondence to Yinan Li.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Li, Y., Zhang, X., Sun, M., Chen, X., Qiao, L. (2016). Speech Enhancement Using Non-negative Low-Rank Modeling with Temporal Continuity and Sparseness Constraints. In: Chen, E., Gong, Y., Tie, Y. (eds) Advances in Multimedia Information Processing - PCM 2016. PCM 2016. Lecture Notes in Computer Science, vol 9917. Springer, Cham. https://doi.org/10.1007/978-3-319-48896-7_3

  • DOI: https://doi.org/10.1007/978-3-319-48896-7_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48895-0

  • Online ISBN: 978-3-319-48896-7

  • eBook Packages: Computer Science, Computer Science (R0)
