DOI: 10.1145/1553374.1553396

Learning dictionaries of stable autoregressive models for audio scene analysis

Published: 14 June 2009

ABSTRACT

In this paper, we explore an application of basis pursuit to audio scene analysis. The goal of our work is to detect when certain sounds are present in a mixed audio signal. We focus on the regime where, out of a large number of possible sources, a small but unknown number combine and overlap to yield the observed signal. To infer which sounds are present, we decompose the observed signal as a linear combination of a small number of active sources. We cast the inference as a regularized form of linear regression whose sparse solutions yield decompositions with few active sources. We characterize the acoustic variability of individual sources by autoregressive models of their time-domain waveforms. When we do not have prior knowledge of the individual sources, the coefficients of these autoregressive models must be learned from audio examples. We analyze the dynamical stability of these models and show how to estimate stable models by substituting a simple convex optimization for a difficult eigenvalue problem. We demonstrate our approach by learning dictionaries of musical notes and using these dictionaries to analyze polyphonic recordings of piano, cello, and violin.
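The idea of regularized linear regression with sparse solutions can be illustrated with a small sketch. This is not the paper's method. It assumes each source is a single fixed waveform (one dictionary column) rather than an autoregressive model, uses a plain L1 penalty, and swaps in iterative soft thresholding (ISTA) as a generic solver. The names `soft_threshold` and `sparse_decompose`, the toy data, and the penalty weight `lam=0.1` are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal map of t * ||.||_1)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_decompose(D, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - D x||^2 + lam * ||x||_1 by iterative soft thresholding.

    D   : (n_samples, n_sources) dictionary, one illustrative waveform per column
    y   : (n_samples,) observed mixture
    lam : weight of the L1 penalty (an assumed, hand-picked value)
    """
    # Step size 1/L, where L (squared spectral norm of D) bounds the gradient's
    # Lipschitz constant.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)  # gradient of the least-squares term
        x = soft_threshold(x - step * grad, step * lam)
    return x


# Toy data (hypothetical): 20 candidate sources, only sources 3 and 7 are active.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 20))
D /= np.linalg.norm(D, axis=0)  # unit-norm columns
y = 2.0 * D[:, 3] + 1.5 * D[:, 7]

x_hat = sparse_decompose(D, y, lam=0.1)
active = np.flatnonzero(np.abs(x_hat) > 1e-6).tolist()
print("active sources:", active)
```

The L1 penalty drives most coefficients exactly to zero, so the nonzero entries of `x_hat` indicate which candidate sources the sketch considers present. The recovered weights are slightly smaller than the true ones because the penalty shrinks them.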

Published in

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009
1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374

Copyright © 2009 by the author(s)/owner(s).

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article

Overall Acceptance Rate: 140 of 548 submissions, 26%
