Monophonic constrained non-negative sparse coding using instrument models for audio separation and transcription of monophonic source-based polyphonic mixtures

Rodríguez-Serrano, Francisco José; Carabias-Orti, Julio José; Vera-Candeas, Pedro; Canadas-Quesada, Francisco Jesús; Ruiz-Reyes, Nicolás

doi:10.1007/s11042-013-1398-8

Published: 08 March 2013

Multimedia Tools and Applications, volume 72, pages 925–949 (2014)

Abstract

In this paper we propose a monophonic constrained signal decomposition model for polyphonic signals composed of several monophonic sources from different musical instruments. The model combines a harmonic constraint with a monophonic constraint. The harmonic constraint is particularly effective for tonal instruments because each note is associated with a unique basis. The monophonic constraint is implemented by enforcing a single non-zero gain per instrument in the factorization process. The proposed method uses instrument models previously trained with a supervised procedure. Both constraints (harmonic and monophonic) are implemented in a deterministic manner. The method has been tested in two audio signal applications, sound source separation and automatic music transcription. In a comparison with other state-of-the-art methods on a dataset of polyphonic mixtures composed of monophonic sources, it produced competitive and promising results.
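
The monophonic constraint described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, which is not available in this preview. The sketch assumes a fixed, pre-trained basis matrix `W` whose columns are grouped by instrument (all notes of instrument 0 first, then instrument 1, and so on), estimates gains with standard Kullback-Leibler multiplicative updates, and only afterwards keeps the largest gain per instrument in each frame. The function name `monophonic_gains`, its parameters, and the post-hoc pruning step are hypothetical choices made for illustration.

```python
import numpy as np


def monophonic_gains(X, W, n_instruments, n_iter=50, eps=1e-12):
    """Estimate non-negative gains H with X ~= W @ H for a fixed basis W,
    then keep a single non-zero gain per instrument in every frame.

    Assumption: the columns of W are ordered instrument by instrument,
    with the same number of note bases for each instrument.
    """
    n_bases = W.shape[1]
    n_notes = n_bases // n_instruments
    n_frames = X.shape[1]

    # Standard KL-divergence multiplicative update of the gains (W stays fixed).
    H = np.ones((n_bases, n_frames))
    col_sums = W.sum(axis=0)[:, None] + eps
    for _ in range(n_iter):
        V = W @ H + eps
        H *= (W.T @ (X / V)) / col_sums

    # Monophonic pruning: per instrument and frame, keep only the largest gain.
    H3 = H.reshape(n_instruments, n_notes, n_frames)
    winners = H3.argmax(axis=1)  # shape (n_instruments, n_frames)
    inst_idx, frame_idx = np.meshgrid(
        np.arange(n_instruments), np.arange(n_frames), indexing="ij"
    )
    mask = np.zeros(H3.shape, dtype=bool)
    mask[inst_idx, winners, frame_idx] = True
    return np.where(mask, H3, 0.0).reshape(n_bases, n_frames)
```

Calling `monophonic_gains(X, W, n_instruments=2)` on a magnitude spectrogram `X` returns a gain matrix in which each instrument has at most one active note per frame. The index of that note could serve as a frame-level transcription, and `W[:, group] @ H[group, :]` as a per-instrument spectrogram estimate for separation.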

Acknowledgements

This work was supported by the Andalusian Business, Science and Innovation Council under Project P10-TIC-6762 (FEDER), the Spanish Ministry of Science and Innovation under Project TEC2009-14414-C03-02, and the University of Jaen under Project R1/12/2010/64.

The authors would like to thank Z. Duan for kindly sharing his annotated real-world music database with them.

Author information

Authors and Affiliations

Telecommunication Engineering Department, University of Jaen, Alfonso X El Sabio, 28, 23700, Linares, Jaen, Spain
Francisco José Rodríguez-Serrano, Julio José Carabias-Orti, Pedro Vera-Candeas, Francisco Jesús Canadas-Quesada & Nicolás Ruiz-Reyes

Corresponding author

Correspondence to Francisco José Rodríguez-Serrano.

About this article

Cite this article

Rodríguez-Serrano, F.J., Carabias-Orti, J.J., Vera-Candeas, P. et al. Monophonic constrained non-negative sparse coding using instrument models for audio separation and transcription of monophonic source-based polyphonic mixtures. Multimed Tools Appl 72, 925–949 (2014). https://doi.org/10.1007/s11042-013-1398-8

Issue Date: September 2014
