DOI: 10.1145/2371456.2371464
research-article

Improved music similarity computation based on tone objects

Published: 26 September 2012

Abstract

In this paper, we propose a novel approach for music similarity estimation. It combines temporal segmentation of music signals with source separation into so-called tone objects. To describe the extracted tone objects, we use only the timbre-related audio features Mel-Frequency Cepstral Coefficients (MFCC) and Octave-based Spectral Contrast (OSC). First, we compare our approach to a baseline system that employs frame-wise feature extraction and bag-of-frames classification. Second, we set up a system that extracts features from perfectly isolated single-track recordings and achieves near-perfect classification. Finally, we compare our novel approach against both of these reference experiments. We find that it clearly outperforms the baseline system in a five-class genre classification task. Our results indicate that tone-object-based feature extraction clearly improves music similarity estimation.
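
The bag-of-frames idea behind the baseline can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the classifier, the distance measure, or the feature dimensions, so everything below is an assumption. The helper names (`bag_of_frames`, `distance`, `synthetic_track`) are hypothetical, and Gaussian noise stands in for real MFCC/OSC frames. Each track is reduced to the per-dimension mean and standard deviation of its frame vectors, and tracks are compared by Euclidean distance.

```python
import math
import random


def bag_of_frames(frames):
    """Summarise a track by the per-dimension mean and standard deviation
    of its frame-wise feature vectors (temporal order is discarded)."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dims)
    ]
    return means + stds


def distance(a, b):
    """Euclidean distance between two track summaries (an assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def synthetic_track(centre, n_frames, rng):
    """Hypothetical stand-in for real MFCC/OSC frames: noise around a centre."""
    return [[rng.gauss(c, 0.5) for c in centre] for _ in range(n_frames)]


rng = random.Random(0)
# Two tracks drawn from the same "timbre" and one from a different one.
track_a = bag_of_frames(synthetic_track([0.0, 1.0, 2.0], 200, rng))
track_b = bag_of_frames(synthetic_track([0.0, 1.0, 2.0], 200, rng))
track_c = bag_of_frames(synthetic_track([5.0, -1.0, 0.0], 200, rng))

# Tracks with similar frame statistics end up closer together.
print(distance(track_a, track_b) < distance(track_a, track_c))
```

In a tone-object variant of this sketch, the frames passed to `bag_of_frames` would come from separated and segmented tone objects rather than from fixed-length frames of the full mix; how the paper actually aggregates those features is not stated in the abstract.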

Cited By

  • (2020) Fuzzy Similarity Relations in Decision Making. Handbook of Research on Emerging Applications of Fuzzy Algebraic Structures, pages 369-385. DOI: 10.4018/978-1-7998-0190-0.ch020. Online publication date: 2020
  • (2014) A Survey of Evaluation in Music Genre Recognition. Adaptive Multimedia Retrieval: Semantics, Context, and Adaptation, pages 29-66. DOI: 10.1007/978-3-319-12093-5_2. Online publication date: 29-Oct-2014

Published In

AM '12: Proceedings of the 7th Audio Mostly Conference: A Conference on Interaction with Sound
September 2012
174 pages
ISBN:9781450315692
DOI:10.1145/2371456
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. audio similarity
  2. genre classification
  3. harmonic-percussive decomposition
  4. multitrack recordings
  5. tone objects

Qualifiers

  • Research-article

Conference

AM '12
AM '12: A conference on interaction with sound
September 26 - 28, 2012
Corfu, Greece

Acceptance Rates

Overall Acceptance Rate 177 of 275 submissions, 64%
