Abstract
Audio source separation is important for many applications but remains challenging because only a single-channel mixture is available. This work proposes a Non-Local Multi-scale Multi-band DenseNet, termed NLMMDenseNet, for audio source separation. The model jointly exploits long-term dependencies and recovers the information lost around band borders. Specifically, to leverage long-term dependencies across the audio spectrogram, we incorporate a non-local layer into MMDenseNet, which enables the model to capture the features of different audio sources. In addition, the model captures cross-band features, which are used to recover the missing information around band borders. By taking advantage of global information and band-border information, the proposed model outperforms state-of-the-art results on the widely used MIR-1K and DSD100 datasets.
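To make the idea of a non-local layer concrete, the following is a minimal NumPy sketch of a generic embedded-Gaussian non-local (self-attention) operation applied to a spectrogram-like feature map. It is an illustration only, not the NLMMDenseNet implementation: the function name `non_local_block`, the weight names, the tensor layout (channels, frequency, time), and the random weights are all assumptions introduced here. Each time-frequency position attends to every other position, which is how such a layer can relate distant parts of the spectrogram.

```python
import numpy as np

def softmax(a, axis=-1):
    # Numerically stable softmax along the given axis.
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def non_local_block(x, w_theta, w_phi, w_g, w_z):
    """Hypothetical non-local block (illustrative, not the paper's code).

    x:                   feature map of shape (C, F, T)
    w_theta, w_phi, w_g: 1x1-conv-like projections, shape (C_inner, C)
    w_z:                 output projection, shape (C, C_inner)
    """
    c, f, t = x.shape
    flat = x.reshape(c, f * t)               # (C, N) with N = F * T positions
    theta = w_theta @ flat                   # queries, (C_inner, N)
    phi = w_phi @ flat                       # keys,    (C_inner, N)
    g = w_g @ flat                           # values,  (C_inner, N)
    attn = softmax(theta.T @ phi, axis=-1)   # (N, N); each row sums to 1
    y = g @ attn.T                           # y[:, i] = sum_j attn[i, j] * g[:, j]
    z = w_z @ y + flat                       # residual connection
    return z.reshape(c, f, t), attn

# Toy usage with random data (assumed sizes, for illustration only).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6, 5))           # 4 channels, 6 freq bins, 5 frames
w_theta = rng.standard_normal((2, 4))
w_phi = rng.standard_normal((2, 4))
w_g = rng.standard_normal((2, 4))
w_z = rng.standard_normal((4, 2))
out, attn = non_local_block(x, w_theta, w_phi, w_g, w_z)
```

Because of the residual connection, setting `w_z` to zero makes the block an identity mapping, a property that lets such a layer be added to an existing network without initially changing its behavior. The attention matrix has one row per time-frequency position, so its memory cost grows quadratically with the spectrogram size.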
Acknowledgement
This work was partially supported by the National Key Research and Development Program of China under Grant 2017YFC0820601 and the National Natural Science Foundation of China (Grant No. 61772275, 61720106004 and 61672304).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Huang, Y. (2019). Non-local MMDenseNet with Cross-Band Features for Audio Source Separation. In: Cui, Z., Pan, J., Zhang, S., Xiao, L., Yang, J. (eds) Intelligence Science and Big Data Engineering. Big Data and Machine Learning. IScIDE 2019. Lecture Notes in Computer Science, vol 11936. Springer, Cham. https://doi.org/10.1007/978-3-030-36204-1_4
DOI: https://doi.org/10.1007/978-3-030-36204-1_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36203-4
Online ISBN: 978-3-030-36204-1
eBook Packages: Computer Science, Computer Science (R0)