poster

Bilingual analysis of song lyrics and audio words

Authors:

Yuan-Ching Teng

MM '12: Proceedings of the 20th ACM international conference on Multimedia

Pages 829 - 832

https://doi.org/10.1145/2393347.2396323

Published: 29 October 2012

Abstract

Thanks to advances in music audio analysis, state-of-the-art techniques can now detect musical attributes such as timbre, rhythm, and pitch with a certain level of reliability and effectiveness. An emerging body of research has begun to model the high-level perceptual properties of music listening, including the mood and the preferred listening context of a music piece. Towards this goal, we propose a novel text-like feature representation that encodes the rich and time-varying information of music using a composite of features extracted from the song lyrics and audio signals. In particular, we investigate dictionary learning algorithms to optimize the generation of local feature descriptors, and probabilistic topic models to group semantically relevant text and audio words. This text-like representation leads to significant improvement in automatic mood classification over conventional audio features.
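To make the idea of a "text-like" representation concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a tiny hand-written codebook and two-dimensional frame features (both invented for illustration), and it swaps the learned dictionary described in the abstract for plain nearest-codeword assignment. Each audio frame becomes a token such as `aw1`, and these audio words are pooled with the lyric words into one bag-of-words document. The function names `nearest_codeword`, `audio_words`, and `text_like_document` are hypothetical.

```python
import math
from collections import Counter


def nearest_codeword(frame, codebook):
    """Return the index of the codebook entry closest to `frame` (Euclidean)."""
    return min(range(len(codebook)), key=lambda k: math.dist(frame, codebook[k]))


def audio_words(frames, codebook):
    """Map each frame-level feature vector to a discrete token, e.g. 'aw2'."""
    return [f"aw{nearest_codeword(frame, codebook)}" for frame in frames]


def text_like_document(lyrics, frames, codebook):
    """Pool lowercased lyric tokens and audio words into one bag of words."""
    lyric_tokens = [word.lower() for word in lyrics.split()]
    return Counter(lyric_tokens + audio_words(frames, codebook))


# Assumed toy data: a 3-entry codebook and four 2-D frame descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
frames = [(0.1, 0.0), (0.9, 1.1), (0.2, 0.9), (1.0, 0.8)]

doc = text_like_document("Blue sky blue sea", frames, codebook)
```

A count vector like `doc` is the kind of input a probabilistic topic model expects, so lyric words and audio words that co-occur across songs could be grouped into shared topics. The topic-modelling step itself is not shown here.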


Cited By

C. Yeh, J. Wang, Y. Yang, and H. Wang. Improving music auto-tagging by intra-song instance bagging. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2139--2143, 2014. Online publication date: May 2014.
https://doi.org/10.1109/ICASSP.2014.6853977

C. Yeh, L. Su, and Y. Yang. Dual-layer bag-of-frames model for music genre classification. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 246--250, 2013. Online publication date: May 2013.
https://doi.org/10.1109/ICASSP.2013.6637646

C. Yeh and Y. Yang. Towards a more efficient sparse coding based audio-word feature extraction system. In 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, pages 1--7, 2013. Online publication date: Oct 2013.
https://doi.org/10.1109/APSIPA.2013.6694252

Published In

MM '12: Proceedings of the 20th ACM international conference on Multimedia

October 2012

1584 pages

ISBN:9781450310895

DOI:10.1145/2393347

General Chairs: Noboru Babaguchi (Osaka University, Japan), Kiyoharu Aizawa (The University of Tokyo, Japan), John Smith (IBM, USA)

Program Chairs: Shin'ichi Satoh (National Institute of Informatics, Japan), Thomas Plagemann (University of Oslo, Norway), Xian-Sheng Hua (Microsoft, USA), Rong Yan (Facebook, USA)

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Poster

Conference

MM '12

Sponsor:

SIGMM

MM '12: ACM Multimedia Conference

October 29 - November 2, 2012

Nara, Japan

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
