Dual-Transform Source Separation Using Sparse Nonnegative Matrix Factorization

Hossain, Md. Imran; Islam, Md. Shohidul; Khatun, Mst. Titasa; Ullah, Rizwan; Masood, Asim; Ye, Zhongfu

doi:10.1007/s00034-020-01564-x

Dual-Transform Source Separation Using Sparse Nonnegative Matrix Factorization

Published: 23 October 2020

Volume 40, pages 1868–1891, (2021)
Cite this article

Circuits, Systems, and Signal Processing Aims and scope Submit manuscript

Md. Imran Hossain¹,
Md. Shohidul Islam²,
Mst. Titasa Khatun³,
Rizwan Ullah¹,
Asim Masood¹ &
…
Zhongfu Ye ORCID: orcid.org/0000-0002-1383-7673¹

330 Accesses
4 Citations
Explore all metrics

Abstract

In this article, we propose a new source separation method in which the dual-tree complex wavelet transform (DTCWT) and short-time Fourier transform (STFT) algorithms are used sequentially as dual transforms and sparse nonnegative matrix factorization (SNMF) is used to factorize the magnitude spectrum. STFT-based source separation faces issues related to time and frequency resolution because it cannot exactly determine which frequencies exist at what time. Discrete wavelet transform (DWT)-based source separation faces a time-variation-related problem (i.e., a small shift in the time-domain signal causes significant variation in the energy of the wavelet coefficients). To address these issues, we utilize the DTCWT, which comprises two-level trees with different sets of filters and provides additional information for analysis and approximate shift invariance; these properties enable the perfect reconstruction of the time-domain signal. Thus, the time-domain signal is transformed into a set of subband signals in which low- and high-frequency components are isolated. Next, each subband is passed through the STFT and a complex spectrogram is constructed. Then, SNMF is applied to decompose the magnitude part into a weighted linear combination of the trained basis vectors for both sources. Finally, the estimated signals can be obtained through a subband binary ratio mask by applying the inverse STFT (ISTFT) and the inverse DTCWT (IDTCWT). The proposed method is examined on speech separation tasks utilizing the GRID audiovisual and TIMIT corpora. The experimental findings indicate that the proposed approach outperforms the existing methods.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Itakura-Saito Nonnegative Matrix Two-Dimensional Factorizations for Blind Single Channel Audio Separation

Single-Channel Audio Source Separation with NMF: Divergences, Constraints and Algorithms

Underdetermined blind source separation of speech mixtures unifying dictionary learning and sparse representation

Article 03 August 2021

Data availability

The datasets generated or analyzed during the current study are not publicly available because they are the subject of ongoing research but are available from the first author upon reasonable request.

References

G. Bao, Y. Xu, Z. Ye, Learning a discriminative dictionary for single-channel speech separation. IEEE Trans. Audio Speech Lang. Process. 22(7), 1130–1138 (2014)
Article Google Scholar
M. Cooke, J. Barker, S. Cunningham, X. Shao, An audio-visual corpus for speech perception and automatic speech recognition. J. Acoust. Soc. Am. 120(5), 2421 (2006)
Article Google Scholar
D.L. Daniel, H.S. Seung, Learning the pans of objects with non-negative matrix factorization. Nature 401, 788–791 (1999)
Article Google Scholar
M.G. Emad, E. Hakan, Single channel speech music separation using nonnegative matrix factorization with sliding windows and spectral masks. Digital Signal Processing (DSP), in 17th International Conference in August (2011)
G.G. Francois, J.M. Gautham, Stopping criteria for non-negative matrix factorization based supervised and semi-supervised source separation. IEEE Signal Processing Letters, November (2014)
J. Garofolo, et al., TIMIT acoustic-phonetic continuous speech corpus. LDC93S1 (1993)
E.M. Grais, H. Erdogan, Discriminative non-negative dictionary learning using cross-coherence penalties for single channel source separation, in Proceedings of the International Conference on Spoken Language Processing (INTERSPEECH). Lyon, France, 25–29 August (2013)
R. Hidayat, A. Bejo, S. Sumaryono, A. Winursito, Denoising speech for MFCC feature extraction using wavelet transformation in speech recognition system, in 10th International Conference on Information Technology and Electrical Engineering (2018)
P.O. Hoyer, Non-negative matrix factorization with sparseness constraint. J. Mach. Learn. Res. 1457–1469, November (2004)
Y. Hu, P.C. Loizou, Evaluation of objective quality measures for speech enhancement. IEEE Trans. Audio Speech Lang. Process. 16(1), 229–238 (2008)
Article Google Scholar
M.S. Islam, T.H. Al Mahmud, W.U. Khan, Z. Ye, Supervised single-channel speech enhancement based on stationary wavelet transforms and non-negative matrix factorization with concatenated framing process and subband smooth ratio mask. J. Sig. Process. Syst. Signal. Image Video Technol. 1–14 (2019)
M.S. Islam, T.H. Al Mahmud, W.U. Khan, Z. Ye, Supervised single-channel speech enhancement based on dual-tree complex wavelet transforms and nonnegative matrix factorization using the joint learning process and subband smooth ratio mask. Electronics 8, 353 (2019)
Article Google Scholar
G.J. Jang, T.W. Lee, A maximum likelihood approach to single-channel source separation. J. Mach. Learn. Res. 4, 1365–1392 (2003)
MathSciNet MATH Google Scholar
D.S. Kapoor, A.K. Kohli, Gain adapted optimum mixture estimation scheme for single-channel speech separation. Circuits Syst. Signal Process. 32(5), 2335–2351 (2013)
Article MathSciNet Google Scholar
J.M. Kates, K.H. Arehart, The hearing-aid speech perception index (HASPI). Speech Commun. 65, 75–93 (2014)
Article Google Scholar
J.M. Kates, K.H. Arehart, The hearing-aid speech quality index (HASQI). J. Audio Eng. Soc. 58, 5363–5381 (2010)
Google Scholar
N.G. Kingsbury, The dual-tree complex wavelet transforms: a new efficient tool for image restoration and enhancement, in Proceedings of the 9th European Signal Process Conference. EUSIPCO, Rhodes, Greece. 8–11 Sept (1998)
R.J. Le, F.J. Weninger, J.R. Hershey, Sparse NMF half-baked or well done? technical report TR2015–023, Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA, USA, March (2015)
D. Lee, H.S. Seung, Algorithms for non-negative matrix factorization. Adv. Neural Inf. Process. Syst. 13, 556–562 (2001)
Google Scholar
A. Mahmoodzadeh, H.R. Abutalebi, Hybrid approach to single-channel speech separation based on coherent incoherent modulation filtering. Circuits Syst. Signal Process. 36(5), 1970–1988 (2017)
Article Google Scholar
S. Mavaddati, A novel singing voice separation method based on sparse non-negative matrix factorization and low-rank modeling. Iran. J. Electr. Electron. Eng. 15, 2 (2019)
Google Scholar
P. Mercorelli, A denoising procedure using wavelet packets for instantaneous detection of pantograph oscillations. Mech. Syst. Signal Process. 35, 137–149 (2013)
Article Google Scholar
P. Paatero, U. Tapper, Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics 5(2), 111–126 (1994)
Article Google Scholar
B.A. Pearlmutter, R.K. Olsson, Linear program differentiation for single-channel speech separation, in 16th IEEE Signal Processing Society Workshop in MLSP, Arlington, VA, USA (2006)
T. Pham, Y.S. Lee, Y.B. Lin, T.C. Tai, J.C. Wang, Single channel source separation using sparse nmf and graph regularization. ASE Big Data Soc. Inform. 55, 1–7 (2015)
Google Scholar
B. Premanode, J. Vongprasert, C. Toumazou, Noise reduction for nonlinear nonstationary time series data using averaging intrinsic mode function. Algorithms 6(3), 407–429 (2013)
Article MathSciNet Google Scholar
A.W. Rix, J.G. Beerends, M.P. Hollier, A.P. Hekstra, Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs, in IEEE International Conference on Acoustics, Speech, Signal Processing. 6, 7–11 May (2001)
S.T. Roweis, One microphone source separation. Advances in Neural Information Processing Systems. 793–799 (2001).
S.T. Roweis, Factorial models and refiltering for speech separation and denoising, in Eurospeech, Geneva, 1009–1012 (2003)
M.N. Schmidt, R.K. Olsson, Single-channel speech separation using sparse non-negative matrix factorization, in 9th International Conference on Spoken Language Processing. Pittsburgh, PA, USA (2006)
M.N. Schmidt, M. Morup, Sparse non-negative matrix factor 2-D deconvolution for blind single-channel source separation. Indep. Compon. Anal. Blind Signal Sep. 3889, 700–707 (2006)
Article Google Scholar
S.M. Seedahmed, A generalised wavelet packet‐based anonymization approach for ECG security application. 9, 18, 6137–6147 (2016)
L. Sun, C. Zhao, M. Su, F. Wang, Single-channel blind source separation based on joint dictionary with common sub-dictionary. Int. J. Speech Technol. 21(1), 19–27 (2018)
Article Google Scholar
L. Sun, K. Xie, T. Gu, J. Chen, Z. Yang, Joint dictionary learning using a new optimization method for single-channel blind source separation. Speech Commun. 106, 85–94 (2019)
Article Google Scholar
C.H. Tall, R.C. Hendriks, R. Heusdens, J. Jensen, An algorithm for intelligibility prediction of time-frequency weighted noisy speech. IEEE Trans. Audio Speech Lang. Process. 19(7), 2125–2136 (2011)
Article Google Scholar
P. Tianliang, C. Yang, L. Zengli, A time-frequency domain blind source separation method for underdetermined instantaneous mixtures. Circuits Syst. Signal Process. 34(12), 3883–3895 (2015)
Article MathSciNet Google Scholar
Y.V. Varshney, Z.A. Abbasi, M.R. Abidi, O. Farooq, Frequency selection based separation of speech signals with reduced computational time using sparse NMF. Arch. Acoust. 42(2), 287–295 (2017)
Article Google Scholar
E. Vincent, R. Gribonval, C. Fevote, Performance measurement in blind audio source separation. IEEE Trans. Audio Speech Lang. Process. 14(4), 1462–1469 (2006)
Article Google Scholar
S. Wang, A. Chern, Y. Tsao, J. Hung, X. Lu, Y. Lai, B. Su, Wavelet speech enhancement based on non-negative matrix factorization. IEEE Signal Process. Lett. 23, 1101–1105 (2016)
Article Google Scholar
Y. Wang, Y. Li, K.C. Ho, A. Zare, M. Skubic, Sparsity promoted non-negative matrix factorization for source separation and detection, in Proceedings of the 19th International Conference on Digital Signal Processing. IEEE. 20–23 August (2014).
Z. Wanng, F. Sha, Discriminative non-negative matrix factorization for single-channel speech separation, in IEEE International Conference on Acoustics, Speech, and Signal Processing (2014)
Y. Xu, G. Bao, X. Xu, Z. Ye, Single-channel speech separation using sequential discriminative dictionary learning. Signal Process. 106, 134–140 (2015)
Article Google Scholar
V.V. Yash, A.A. Zia, R.A. Musiur, O. Farooq, Variable sparsity regularization factor based SNMF for monaural speech separation, in 40th International Conference on Telecommunications and Signal Processing (TSP). 5–7 July (2017)

Download references

Acknowledgements

This research was supported by the National Natural Science Foundation of China (Grant No. 61671418).

Author information

Authors and Affiliations

National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei, 230026, Anhui, China
Md. Imran Hossain, Rizwan Ullah, Asim Masood & Zhongfu Ye
Department of CSE, Islamic University, Kushtia, 7003, Bangladesh
Md. Shohidul Islam
Department of ICE, Islamic University, Kushtia, 7003, Bangladesh
Mst. Titasa Khatun

Authors

Md. Imran Hossain
View author publications
You can also search for this author in PubMed Google Scholar
Md. Shohidul Islam
View author publications
You can also search for this author in PubMed Google Scholar
Mst. Titasa Khatun
View author publications
You can also search for this author in PubMed Google Scholar
Rizwan Ullah
View author publications
You can also search for this author in PubMed Google Scholar
Asim Masood
View author publications
You can also search for this author in PubMed Google Scholar
Zhongfu Ye
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Zhongfu Ye.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Hossain, M.I., Islam, M.S., Khatun, M.T. et al. Dual-Transform Source Separation Using Sparse Nonnegative Matrix Factorization. Circuits Syst Signal Process 40, 1868–1891 (2021). https://doi.org/10.1007/s00034-020-01564-x

Download citation

Received: 02 December 2019
Revised: 29 September 2020
Accepted: 06 October 2020
Published: 23 October 2020
Issue Date: April 2021
DOI: https://doi.org/10.1007/s00034-020-01564-x

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Dual-Transform Source Separation Using Sparse Nonnegative Matrix Factorization

Abstract

Access this article

Similar content being viewed by others

Itakura-Saito Nonnegative Matrix Two-Dimensional Factorizations for Blind Single Channel Audio Separation

Single-Channel Audio Source Separation with NMF: Divergences, Constraints and Algorithms

Underdetermined blind source separation of speech mixtures unifying dictionary learning and sparse representation

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Dual-Transform Source Separation Using Sparse Nonnegative Matrix Factorization

Abstract

Access this article

Similar content being viewed by others

Itakura-Saito Nonnegative Matrix Two-Dimensional Factorizations for Blind Single Channel Audio Separation

Single-Channel Audio Source Separation with NMF: Divergences, Constraints and Algorithms

Underdetermined blind source separation of speech mixtures unifying dictionary learning and sparse representation

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation