Abstract
Conventional speech enhancement algorithms based on sparse and low-rank decomposition seldom consider the non-negativity and the continuity of the enhanced speech spectrum at the same time. This paper presents an unsupervised algorithm for enhancing noisy speech in a single-channel recording. The algorithm can be viewed as an extension of non-negative matrix factorization (NMF) that approximates the magnitude spectrum of noisy speech as the superposition of a low-rank non-negative matrix and a sparse non-negative matrix. The temporal continuity of speech is also taken into account by adding the sum of squared differences between adjacent frames to the cost function. We prove that iteratively updating the parameters with the derived multiplicative update rules makes the cost function converge to a local minimum. Simulation experiments on the NOIZEUS database with various noise types demonstrate that the proposed algorithm outperforms recently proposed state-of-the-art methods under low signal-to-noise ratio (SNR) conditions.
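The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's algorithm: the abstract does not state the divergence, the component to which the continuity term applies, or the update rules themselves. The sketch assumes a Euclidean cost, places both the sparsity and the temporal-continuity penalties on the sparse matrix `S`, and uses multiplicative updates obtained from a standard majorization argument. The names `decompose`, `cost`, `lam`, and `mu` are hypothetical.

```python
import numpy as np


def cost(V, W, H, S, lam, mu):
    """Assumed cost: squared error + L1 sparsity on S + temporal smoothness of S."""
    resid = V - W @ H - S
    smooth = np.sum(np.diff(S, axis=1) ** 2)  # squared differences of adjacent frames
    return 0.5 * np.sum(resid ** 2) + lam * np.sum(S) + 0.5 * mu * smooth


def decompose(V, rank=4, lam=0.1, mu=0.5, n_iter=100, seed=0, eps=1e-12):
    """Approximate a non-negative spectrogram V (freq x time) as W @ H + S.

    W @ H is the low-rank non-negative part and S the sparse non-negative part.
    All hyperparameters are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    # Strictly positive initialisation keeps multiplicative updates non-negative.
    W = rng.random((n_freq, rank)) + 0.1
    H = rng.random((rank, n_frames)) + 0.1
    S = rng.random((n_freq, n_frames)) + 0.1

    # Number of temporal neighbours of each frame (1 at the edges, 2 inside).
    deg = np.zeros(n_frames)
    deg[1:] += 1.0
    deg[:-1] += 1.0

    history = [cost(V, W, H, S, lam, mu)]
    for _ in range(n_iter):
        # Low-rank factors: Lee-Seung style updates with S as a fixed offset.
        R = W @ H + S
        W *= (V @ H.T) / (R @ H.T + eps)
        R = W @ H + S
        H *= (W.T @ V) / (W.T @ R + eps)

        # Sparse part: neighbour sums implement the continuity term.
        low_rank = W @ H
        nb = np.zeros_like(S)
        nb[:, 1:] += S[:, :-1]   # previous frame
        nb[:, :-1] += S[:, 1:]   # next frame
        numer = V + mu * (deg * S + nb)
        denom = low_rank + S + lam + 2.0 * mu * deg * S + eps
        S *= numer / denom

        history.append(cost(V, W, H, S, lam, mu))
    return W, H, S, history


if __name__ == "__main__":
    # Synthetic stand-in for a magnitude spectrogram: low-rank plus sparse.
    rng = np.random.default_rng(1)
    low = rng.random((20, 3)) @ rng.random((3, 30))
    sparse = (rng.random((20, 30)) > 0.9) * rng.random((20, 30))
    W, H, S, history = decompose(low + sparse, rank=3, n_iter=50)
    print(f"cost: {history[0]:.3f} -> {history[-1]:.3f}")
```

In practice `V` would be the magnitude of a short-time Fourier transform of the noisy signal. Which of the two components is taken as the enhanced speech, and how the signal is resynthesised from it, depends on modelling choices the abstract leaves open.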
Acknowledgments
This work is supported by the NSF of China (Grant Nos. 61471394, 61402519) and the NSF of Jiangsu Province (Grant Nos. BK20140071, BK20140074).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Li, Y., Zhang, X., Sun, M., Chen, X., Qiao, L. (2016). Speech Enhancement Using Non-negative Low-Rank Modeling with Temporal Continuity and Sparseness Constraints. In: Chen, E., Gong, Y., Tie, Y. (eds) Advances in Multimedia Information Processing - PCM 2016. PCM 2016. Lecture Notes in Computer Science, vol 9917. Springer, Cham. https://doi.org/10.1007/978-3-319-48896-7_3
DOI: https://doi.org/10.1007/978-3-319-48896-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48895-0
Online ISBN: 978-3-319-48896-7
eBook Packages: Computer Science, Computer Science (R0)