Non-local MMDenseNet with Cross-Band Features for Audio Source Separation

Huang, Yi

doi:10.1007/978-3-030-36204-1_4

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11936)

Included in the following conference series:

International Conference on Intelligent Science and Big Data Engineering

Abstract

Audio source separation is an important but challenging problem in many applications because only a single-channel mixed signal is available. This work proposes a novel Non-Local Multi-scale Multi-band DenseNet model, termed NLMMDenseNet, for audio source separation that jointly exploits long-term dependencies and recovers the information missing around band borders. Specifically, to leverage the long-term dependencies within the audio spectrogram, we propose a new non-local model that incorporates a non-local layer into MMDenseNet, which enables the model to capture the features of different audio sources. In addition, the proposed model captures cross-band features, which are used to recover the information missing around band borders. By taking advantage of both global information and band-border information, the proposed model outperforms state-of-the-art results on the widely used MIR-1K and DSD100 datasets.
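The abstract does not give the configuration of the non-local layer, so the following is only a minimal sketch of the general idea, not the chapter's implementation. It assumes the embedded-Gaussian form of the non-local operation from Wang et al. (Non-local neural networks, 2018), simplified so that all learned embeddings are the identity. The function names `softmax`, `flatten_tf`, and `non_local` are hypothetical. Each time-frequency position of a spectrogram feature map is updated with a softmax-weighted sum over all positions, which is how such a layer can relate distant frames and frequency bins.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def flatten_tf(feature_map):
    """Flatten a [freq][time][channel] feature map into a list of positions.

    Each returned position is the channel vector of one time-frequency bin.
    """
    return [bin_vec for freq_row in feature_map for bin_vec in freq_row]


def non_local(positions):
    """Simplified non-local operation with a residual connection.

    Assumption: the learned embeddings (theta, phi, g) and the output
    projection of a real non-local block are replaced by the identity,
    so z_i = x_i + sum_j softmax_j(x_i . x_j) * x_j.
    """
    outputs = []
    for x_i in positions:
        # Pairwise similarity between position i and every position j.
        scores = [sum(a * b for a, b in zip(x_i, x_j)) for x_j in positions]
        weights = softmax(scores)
        # Aggregate information from all positions, near or far.
        aggregated = [
            sum(w * x_j[c] for w, x_j in zip(weights, positions))
            for c in range(len(x_i))
        ]
        # Residual connection keeps the original local feature.
        outputs.append([a + b for a, b in zip(x_i, aggregated)])
    return outputs


if __name__ == "__main__":
    # Toy feature map: 2 frequency bins x 2 time frames x 2 channels.
    toy_map = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 0.0], [0.5, 0.5]],
    ]
    for vec in non_local(flatten_tf(toy_map)):
        print([round(v, 3) for v in vec])
```

Because every position attends to every other position, the cost grows quadratically with the number of time-frequency bins. In practice such a layer would be applied to downsampled feature maps and would use learned projections rather than the identity embeddings assumed here.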

Acknowledgement

This work was partially supported by the National Key Research and Development Program of China under Grant 2017YFC0820601 and the National Natural Science Foundation of China (Grant No. 61772275, 61720106004 and 61672304).

Author information

Authors and Affiliations

Nanjing University of Science and Technology, Nanjing, China
Yi Huang

Authors

Yi Huang

Corresponding author

Correspondence to Yi Huang.

Editor information

Editors and Affiliations

Nanjing University of Science and Technology, Nanjing, China
Zhen Cui
Nanjing University of Science and Technology, Nanjing, China
Jinshan Pan
Nanjing University of Science and Technology, Nanjing, China
Shanshan Zhang
Nanjing University of Science and Technology, Nanjing, China
Liang Xiao
Nanjing University of Science and Technology, Nanjing, China
Jian Yang

About this paper

Cite this paper

Huang, Y. (2019). Non-local MMDenseNet with Cross-Band Features for Audio Source Separation. In: Cui, Z., Pan, J., Zhang, S., Xiao, L., Yang, J. (eds) Intelligence Science and Big Data Engineering. Big Data and Machine Learning. IScIDE 2019. Lecture Notes in Computer Science, vol 11936. Springer, Cham. https://doi.org/10.1007/978-3-030-36204-1_4

DOI: https://doi.org/10.1007/978-3-030-36204-1_4
Published: 29 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36203-4
Online ISBN: 978-3-030-36204-1
eBook Packages: Computer Science, Computer Science (R0)