Monophonic constrained non-negative sparse coding using instrument models for audio separation and transcription of monophonic source-based polyphonic mixtures

Multimedia Tools and Applications

Abstract

In this paper we propose a monophonic-constrained signal decomposition model for polyphonic signals composed of several monophonic sources from different musical instruments. The model combines a harmonic constraint with a monophonic constraint. The harmonic constraint is particularly effective for tonal instruments because each note is associated with a unique basis. The monophonic constraint is implemented by enforcing a single non-zero gain per instrument in the factorization process. The proposed method uses instrument models previously trained with a supervised procedure. Both constraints (harmonic and monophonic) are implemented in a deterministic manner. The method has been tested in two audio signal applications, sound source separation and automatic music transcription. Compared with other state-of-the-art methods on a dataset of polyphonic mixtures composed of monophonic sources, it has produced competitive and promising results.
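
The monophonic constraint described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, whose details are not available in this preview. It assumes a fixed, pre-trained basis matrix `W` with one column per (instrument, note) pair, estimates gains with the standard multiplicative update for the Kullback-Leibler divergence, and then keeps only the largest gain per instrument in each frame. The function name `monophonic_gains`, the parameter `instrument_of_basis`, and the choice of divergence are all assumptions made for illustration.

```python
import numpy as np

def monophonic_gains(X, W, instrument_of_basis, n_iter=50, eps=1e-12):
    """Estimate gains for fixed bases, then keep one gain per instrument per frame.

    X: (F, T) non-negative magnitude spectrogram of the mixture.
    W: (F, K) fixed, pre-trained bases (hypothetical: one per instrument and note).
    instrument_of_basis: (K,) integer array mapping each basis to its instrument.
    """
    rng = np.random.default_rng(0)
    K, T = W.shape[1], X.shape[1]
    H = rng.random((K, T)) + eps                 # strictly positive initial gains
    col_sums = W.sum(axis=0)[:, None] + eps      # W^T 1, shape (K, 1)

    # Supervised factorization: W stays fixed, only the gains H are updated
    # (multiplicative update for the Kullback-Leibler divergence).
    for _ in range(n_iter):
        V = W @ H + eps
        H *= (W.T @ (X / V)) / col_sums

    # Monophonic constraint (illustrative): for each instrument and frame,
    # keep only the basis with the largest gain and zero the rest.
    H_mono = np.zeros_like(H)
    frames = np.arange(T)
    for j in np.unique(instrument_of_basis):
        idx = np.flatnonzero(instrument_of_basis == j)
        winner = idx[np.argmax(H[idx], axis=0)]  # winning basis index per frame
        H_mono[winner, frames] = H[winner, frames]
    return H_mono
```

Because each basis corresponds to one note, the index of the surviving gain gives a per-frame note estimate for each instrument (transcription), and the product of that instrument's bases and gains gives a spectrogram estimate for it (separation). Applying the constraint once after the updates is a simplification; it could equally be enforced inside the iteration.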

Acknowledgements

This work was supported by the Andalusian Business, Science and Innovation Council under project P10-TIC-6762 (FEDER), the Spanish Ministry of Science and Innovation under project TEC2009-14414-C03-02, and the University of Jaen under project R1/12/2010/64.

The authors would like to thank Z. Duan for kindly sharing his annotated real-world music database with them.

Author information

Corresponding author

Correspondence to Francisco José Rodríguez-Serrano.

About this article

Cite this article

Rodríguez-Serrano, F.J., Carabias-Orti, J.J., Vera-Candeas, P. et al. Monophonic constrained non-negative sparse coding using instrument models for audio separation and transcription of monophonic source-based polyphonic mixtures. Multimed Tools Appl 72, 925–949 (2014). https://doi.org/10.1007/s11042-013-1398-8

  • DOI: https://doi.org/10.1007/s11042-013-1398-8
