
Non-local MMDenseNet with Cross-Band Features for Audio Source Separation

  • Conference paper
  • First Online:
Intelligence Science and Big Data Engineering. Big Data and Machine Learning (IScIDE 2019)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11936)


Abstract

Audio source separation is important for many applications, but it is challenging because only a single-channel mixed signal is available. This work proposes a novel Non-Local Multi-scale Multi-band DenseNet, termed NLMMDenseNet, for audio source separation; it jointly exploits long-term dependencies and recovers the missing information around band borders. Specifically, to leverage the long-term dependencies within the audio spectrogram, we propose a new non-local model that incorporates a non-local layer into MMDenseNet, enabling the proposed model to capture the features of different audio sources. In addition, the proposed model captures cross-band features, which are used to recover the missing information around band borders. By taking advantage of both global information and band-border information, the proposed model outperforms state-of-the-art results on the widely used MIR-1K and DSD100 datasets.
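To make the role of the non-local layer concrete, the following is a minimal PyTorch sketch of a generic non-local (self-attention) block applied to a spectrogram feature map, such as the output of a dense block. It is an illustration only: the class name, channel sizes, and the embedded-Gaussian pairwise function are assumptions, not the authors' exact architecture.

```python
# Hypothetical sketch of a non-local (self-attention) block over a
# spectrogram feature map; all names and sizes here are illustrative.
import torch
import torch.nn as nn


class NonLocalBlock2D(nn.Module):
    def __init__(self, in_channels: int, reduction: int = 2):
        super().__init__()
        inter = max(in_channels // reduction, 1)
        # 1x1 convolutions produce query/key/value embeddings.
        self.theta = nn.Conv2d(in_channels, inter, kernel_size=1)
        self.phi = nn.Conv2d(in_channels, inter, kernel_size=1)
        self.g = nn.Conv2d(in_channels, inter, kernel_size=1)
        # Project back to the original channel count for the residual sum.
        self.out = nn.Conv2d(inter, in_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, freq, time) spectrogram feature map.
        b, c, f, t = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (b, f*t, inter)
        k = self.phi(x).flatten(2)                     # (b, inter, f*t)
        v = self.g(x).flatten(2).transpose(1, 2)       # (b, f*t, inter)
        # Affinities between all time-frequency positions let every
        # position attend to the whole spectrogram (long-term dependencies).
        attn = torch.softmax(q @ k, dim=-1)            # (b, f*t, f*t)
        y = (attn @ v).transpose(1, 2).reshape(b, -1, f, t)
        return x + self.out(y)                         # residual connection


if __name__ == "__main__":
    feats = torch.randn(1, 32, 64, 128)      # toy dense-block output
    print(NonLocalBlock2D(32)(feats).shape)  # torch.Size([1, 32, 64, 128])
```

In an MMDenseNet-style model, such a block would sit inside a band-wise branch, letting the branch aggregate context from distant time-frequency positions rather than only from its local convolutional receptive field.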



Acknowledgement

This work was partially supported by the National Key Research and Development Program of China under Grant 2017YFC0820601 and the National Natural Science Foundation of China (Grant No. 61772275, 61720106004 and 61672304).

Author information


Corresponding author

Correspondence to Yi Huang.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Huang, Y. (2019). Non-local MMDenseNet with Cross-Band Features for Audio Source Separation. In: Cui, Z., Pan, J., Zhang, S., Xiao, L., Yang, J. (eds) Intelligence Science and Big Data Engineering. Big Data and Machine Learning. IScIDE 2019. Lecture Notes in Computer Science, vol 11936. Springer, Cham. https://doi.org/10.1007/978-3-030-36204-1_4

  • DOI: https://doi.org/10.1007/978-3-030-36204-1_4

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-36203-4

  • Online ISBN: 978-3-030-36204-1

  • eBook Packages: Computer Science (R0)
