Learning dictionaries of stable autoregressive models for audio scene analysis

Authors:
Youngmin Cho, University of California, San Diego, La Jolla, CA
Lawrence K. Saul, University of California, San Diego, La Jolla, CA

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, June 2009, Pages 169–176
https://doi.org/10.1145/1553374.1553396

Published: 14 June 2009

ABSTRACT

In this paper, we explore an application of basis pursuit to audio scene analysis. The goal of our work is to detect when certain sounds are present in a mixed audio signal. We focus on the regime where out of a large number of possible sources, a small but unknown number combine and overlap to yield the observed signal. To infer which sounds are present, we decompose the observed signal as a linear combination of a small number of active sources. We cast the inference as a regularized form of linear regression whose sparse solutions yield decompositions with few active sources. We characterize the acoustic variability of individual sources by autoregressive models of their time domain waveforms. When we do not have prior knowledge of the individual sources, the coefficients of these autoregressive models must be learned from audio examples. We analyze the dynamical stability of these models and show how to estimate stable models by substituting a simple convex optimization for a difficult eigenvalue problem. We demonstrate our approach by learning dictionaries of musical notes and using these dictionaries to analyze polyphonic recordings of piano, cello, and violin.
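
The stability question raised in the abstract can be illustrated with a minimal sketch. It is not the authors' estimation procedure: it fits an autoregressive model by ordinary least squares and then checks dynamical stability directly from the eigenvalues of the companion matrix, which is the kind of eigenvalue test the abstract describes as hard to build into learning. The function names `fit_ar` and `spectral_radius`, the model order, the sampling rate, and the synthetic decaying tone are all assumptions made for illustration.

```python
import numpy as np

def fit_ar(x, order):
    """Least-squares AR fit: x[t] ~ sum_k a[k] * x[t-1-k] (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # Each row holds the `order` samples preceding time t, most recent first.
    X = np.column_stack(
        [x[order - 1 - k : len(x) - 1 - k] for k in range(order)]
    )
    y = x[order:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

def spectral_radius(a):
    """Largest eigenvalue magnitude of the AR companion matrix."""
    p = len(a)
    C = np.zeros((p, p))
    C[0, :] = a                 # first row carries the AR coefficients
    C[1:, :-1] = np.eye(p - 1)  # sub-diagonal shifts past samples down
    return np.max(np.abs(np.linalg.eigvals(C)))

# Synthetic decaying tone (assumed data): x[t] = r**t * cos(w * t).
# It obeys x[t] = 2*r*cos(w)*x[t-1] - r**2 * x[t-2] exactly.
r = 0.99
w = 2 * np.pi * 440 / 8000      # 440 Hz at an assumed 8 kHz sampling rate
t = np.arange(400)
x = r**t * np.cos(w * t)

a = fit_ar(x, order=2)
rho = spectral_radius(a)
is_stable = rho < 1.0           # stable when all eigenvalues lie inside the unit circle
print("coefficients:", a)
print("spectral radius:", rho, "stable:", is_stable)
```

For this noise-free tone the fitted model is stable by construction. With real recordings, an unconstrained least-squares fit carries no such guarantee, which is the motivation the abstract gives for estimating stable models through a convex optimization instead.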

Published in
ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009, 1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374
General Chair: Andrea Danyluk (Williams College)
Program Chairs: Léon Bottou (NEC Laboratories America), Michael Littman (Rutgers University)
Copyright © 2009 by the author(s)/owner(s).
Publisher: Association for Computing Machinery, New York, NY, United States
Overall acceptance rate: 140 of 548 submissions (26%)