research-article

Improved music similarity computation based on tone objects

Authors: Johannes Krasser, Holger Großmann, Christian Dittmar, Estefanía Cano

AM '12: Proceedings of the 7th Audio Mostly Conference: A Conference on Interaction with Sound

Pages 47-54

https://doi.org/10.1145/2371456.2371464

Published: 26 September 2012

Abstract

In this paper, we propose a novel approach to music similarity estimation. It combines temporal segmentation of music signals with source separation into so-called tone objects. We use only the timbre-related audio features Mel-Frequency Cepstral Coefficients (MFCC) and Octave-based Spectral Contrast (OSC) to describe the extracted tone objects. First, we set up a baseline system that employs frame-wise feature extraction and bag-of-frames classification. Second, we set up a system that extracts features from perfectly isolated single-track recordings, achieving near-perfect classification. Finally, we compare our novel approach against these two reference experiments. We find that it clearly outperforms the baseline system in a five-class genre classification task. Our results indicate that tone-object-based feature extraction improves music similarity estimation.
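The abstract names Octave-based Spectral Contrast (OSC) as one of the two timbre descriptors and contrasts frame-wise extraction with extraction per tone object. As an illustration only, the Python sketch below computes spectral contrast for a single magnitude-spectrum frame in the style of the original spectral contrast feature by Jiang et al. It assumes six sub-bands (0-200, 200-400, 400-800, 800-1600, 1600-3200 and 3200-8000 Hz) and a peak/valley fraction `alpha = 0.02`; the parameters actually used in the paper are not stated here. The helper `pool_segments` averages frame-wise vectors between onset frames. It is a crude stand-in for tone-object segmentation that leaves out the source separation step entirely. Both function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# ASSUMPTION: six octave-like sub-bands in Hz, as commonly used for the original
# spectral contrast feature; the paper's own band layout is not stated here.
DEFAULT_BANDS: List[Tuple[float, float]] = [
    (0.0, 200.0), (200.0, 400.0), (400.0, 800.0),
    (800.0, 1600.0), (1600.0, 3200.0), (3200.0, 8000.0),
]


def octave_spectral_contrast(
    magnitudes: Sequence[float],
    sample_rate: float,
    n_fft: int,
    bands: Sequence[Tuple[float, float]] = DEFAULT_BANDS,
    alpha: float = 0.02,  # ASSUMPTION: fraction of bins averaged for peak/valley
    eps: float = 1e-10,   # avoids log(0) on silent bands
) -> List[float]:
    """Hypothetical helper: spectral contrast for ONE magnitude-spectrum frame.

    `magnitudes` holds the n_fft // 2 + 1 non-negative bins of a single frame.
    Returns the K contrast values followed by the K log-valley values.
    """
    bin_hz = sample_rate / n_fft
    contrasts: List[float] = []
    valleys: List[float] = []
    for low, high in bands:
        # Bins whose centre frequency lies in [low, high), strongest first.
        band = sorted(
            (m for i, m in enumerate(magnitudes) if low <= i * bin_hz < high),
            reverse=True,
        )
        if not band:
            # No bins fall in this band (e.g. it lies above Nyquist).
            contrasts.append(0.0)
            valleys.append(math.log(eps))
            continue
        # Average the top and bottom alpha-fraction of bins (at least one bin).
        n = max(1, round(alpha * len(band)))
        peak = math.log(sum(band[:n]) / n + eps)
        valley = math.log(sum(band[-n:]) / n + eps)
        contrasts.append(peak - valley)
        valleys.append(valley)
    return contrasts + valleys


def pool_segments(
    frame_features: Sequence[Sequence[float]], onsets: Sequence[int]
) -> List[List[float]]:
    """Hypothetical helper: mean feature vector per inter-onset segment.

    Frames before the first onset are ignored; the last segment runs to the end.
    """
    bounds = sorted({o for o in onsets if 0 <= o < len(frame_features)})
    bounds.append(len(frame_features))
    pooled: List[List[float]] = []
    for start, end in zip(bounds, bounds[1:]):
        segment = frame_features[start:end]
        dim = len(segment[0])
        pooled.append([sum(f[d] for f in segment) / len(segment) for d in range(dim)])
    return pooled
```

With the default bands, each frame yields 12 values: six contrasts followed by six valleys. A bag-of-frames baseline would pass these per-frame vectors directly to the classifier. A segment-based variant would first reduce them with `pool_segments`, so that each vector describes one event rather than one analysis frame.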

Cited By

M. El Alaoui and K. El Yassini. Fuzzy Similarity Relations in Decision Making. In Handbook of Research on Emerging Applications of Fuzzy Algebraic Structures, pages 369-385, 2020. https://doi.org/10.4018/978-1-7998-0190-0.ch020

B. Sturm. A Survey of Evaluation in Music Genre Recognition. In Adaptive Multimedia Retrieval: Semantics, Context, and Adaptation, pages 29-66, 2014 (online publication date: 29 Oct 2014). https://doi.org/10.1007/978-3-319-12093-5_2

Recommendations

Music Genre Classification and Similarity Calculation Using Bass-Line Features
ISM '08: Proceedings of the 2008 Tenth IEEE International Symposium on Multimedia

Various studies on music information retrieval have been conducted. Most of them used low-level features such as spectral and cepstral features, which are effective to some extent but have a limit because they do not directly or clearly correspond to ...
Music Genre Classification and Feature Comparison using ML
ICMLT '22: Proceedings of the 2022 7th International Conference on Machine Learning Technologies

An essential feature of the music is the genre, which can be considered a high-level description of an individual piece of music. In this sense, genre as a music feature is similar to typical descriptive features from the ML perspective. Although a ...
Visualizing music and audio using self-similarity
MULTIMEDIA '99: Proceedings of the seventh ACM international conference on Multimedia (Part 1)

This paper presents a novel approach to visualizing the time structure of music and audio. The acoustic similarity between any two instants of an audio recording is displayed in a 2D representation, allowing identification of structural and rhythmic ...

Published In

AM '12: Proceedings of the 7th Audio Mostly Conference: A Conference on Interaction with Sound

September 2012

174 pages

ISBN:9781450315692

DOI:10.1145/2371456

Conference Chairs: Andreas Floros (Ionian University, Greece), Andreas Mniestris (Ionian University, Greece)

Program Chairs: Iani Zannos (Ionian University, Greece), Theodoros Lotis (Ionian University, Greece)

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

AM '12: A Conference on Interaction with Sound

September 26-28, 2012

Corfu, Greece

Acceptance Rates

Overall Acceptance Rate 177 of 275 submissions, 64%

Article Metrics

Total citations: 2

Total downloads: 182

Downloads (last 12 months): 0

Downloads (last 6 weeks): 0

Reflects downloads up to 17 Jan 2025.
