DOI: 10.1145/2393347.2396323
poster

Bilingual analysis of song lyrics and audio words

Published: 29 October 2012

Abstract

Thanks to advances in music audio analysis, state-of-the-art techniques can now detect musical attributes such as timbre, rhythm, and pitch with a certain level of reliability and effectiveness. An emerging body of research has begun to model the high-level perceptual properties of music listening, including the mood and the preferred listening context of a music piece. Toward this goal, we propose a novel text-like feature representation that encodes the rich, time-varying information of music using a composite of features extracted from song lyrics and audio signals. In particular, we investigate dictionary learning algorithms to optimize the generation of local feature descriptors, and probabilistic topic models to group semantically relevant text and audio words. This text-like representation leads to significant improvement in automatic mood classification over conventional audio features.
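As a rough illustration of what a "text-like" representation could look like, the sketch below turns audio frames into discrete "audio word" tokens and pools them with lyric tokens into a single bag-of-words document. It is a minimal sketch, not the method of the paper: it assigns each frame to its nearest codeword (plain vector quantization) in place of the dictionary learning and sparse coding the abstract mentions, and it stops before any topic model is fitted. The function names, the `aw` token prefix, the codebook, the frame vectors, and the lyrics are all hypothetical toy values.

```python
from collections import Counter


def nearest_codeword(frame, codebook):
    """Return the index of the codeword closest to `frame` (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))


def audio_words(frames, codebook):
    """Map each frame-level feature vector to a discrete token such as 'aw1'."""
    return [f"aw{nearest_codeword(frame, codebook)}" for frame in frames]


def text_like_document(lyrics, frames, codebook):
    """Pool lyric tokens and audio-word tokens into one bag-of-words count table."""
    lyric_tokens = lyrics.lower().split()
    return Counter(lyric_tokens) + Counter(audio_words(frames, codebook))


# Toy, made-up inputs: a 3-entry codebook and four 2-dimensional frame features.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
frames = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [0.2, 1.9]]

doc = text_like_document("rain rain go away", frames, codebook)
print(doc)
```

A count table of this kind is the sort of input a probabilistic topic model such as LDA consumes, which is how lyric words and audio words could end up grouped under shared topics.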

Cited By

  • (2014) Improving music auto-tagging by intra-song instance bagging. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2139-2143. DOI: 10.1109/ICASSP.2014.6853977. Online publication date: May 2014.
  • (2013) Dual-layer bag-of-frames model for music genre classification. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 246-250. DOI: 10.1109/ICASSP.2013.6637646. Online publication date: May 2013.
  • (2013) Towards a more efficient sparse coding based audio-word feature extraction system. 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, pp. 1-7. DOI: 10.1109/APSIPA.2013.6694252. Online publication date: October 2013.

Published In

MM '12: Proceedings of the 20th ACM international conference on Multimedia
October 2012
1584 pages
ISBN:9781450310895
DOI:10.1145/2393347
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. LDA
  2. MLLDA
  3. audioword
  4. sparse coding
  5. topic model

Qualifiers

  • Poster

Conference

MM '12
MM '12: ACM Multimedia Conference
October 29 - November 2, 2012
Nara, Japan

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
