Elsevier

Signal Processing

Volume 90, Issue 4, April 2010, Pages 1064-1076
Automatic identification of audio recordings based on statistical modeling

https://doi.org/10.1016/j.sigpro.2009.10.025

Abstract

This paper describes a methodology for the automatic identification of audio recordings of ethnic music. The identification is based on an application of hidden Markov models (HMMs), which are automatically built from a representation of the music pieces to be identified. States of the HMMs are labeled with music events, and the transition and observation probabilities are directly computed from the information on the music piece. The recordings are modeled by a set of acoustic features that are computed according to the characteristics of the music events. Three alternative approaches, based on typical applications of HMMs, are proposed to perform the identification. Tests carried out on collections of recordings showed that the methodology can achieve good results, and the identification rate is high enough to suggest applications for automatic retrieval of metadata and for the identification of alternative recordings of the same piece.

Introduction

Digital collections, usually stored and organized in digital libraries, are becoming an increasingly important tool for the preservation and dissemination of music cultural heritage. Music can benefit from the application of digital technologies even more than other art forms, because it can be shared by users with different backgrounds and education, easily crossing language barriers. This is particularly true for ethnic music, which can be enjoyed by users of different cultures and traditions, living in different geographical areas. For a digital library of ethnic music, the goal “anytime–anywhere” is meant in its widest sense. Yet the interest in digital music collections should be matched by the availability of tools for effective access to digital content. Textual descriptors such as title, author, and performers are all valuable metadata that can allow a user to retrieve potentially relevant music items. Moreover, additional information about time and key signatures, instrumentation, and geographical location may provide alternative ways to access music content.

In the case of ethnic music, access to music documents becomes more challenging, because typical metadata may not be good descriptors of music content. The most effective fields in a music search, author and title according to [1], may not be good descriptors for this repertoire. Tools for content-based search should therefore be provided to the user for exploring the digital collection, for instance to retrieve alternative recordings of a given piece even when textual metadata are different or unavailable. Another important issue in the development of a digital music collection is how music content is cataloged during the acquisition of the material. Classification and labeling have to be carried out manually by trained personnel, with a considerable increase in time and costs. In this case too, automatic tools that identify a recording as an instance of a given song can be used to add pertinent metadata automatically.

This paper reports a methodology for the automatic identification of unknown recordings. The approach is based on hidden Markov models (HMMs), which model the unobservable process underlying the production of a music performance given a statistical representation of a music piece. The presented approach builds on earlier work on audio-to-score alignment [2], [3], which has been extended to the particular task of exploring digital collections of ethnic music. One goal of this paper is to explore whether HMM-based identification can be carried out even when only partial information, e.g., melodic content, is available to describe a song. To this end, some characteristics common to many repertoires of ethnic music make the proposed approach particularly suitable.

First of all, melody plays a fundamental role in most ethnic repertoires, as opposed to the functional harmony used in tonal Western music. Yet the same song can be performed in countless variants, which depend on the oral tradition by which it is transmitted, on the available instrumentation, and on the geographical areas and thus on the cultures that have been influenced by it. Given the prominent role of oral tradition, for many songs the most common metadata such as author and title may not be significant (the author being unknown and the title existing in dozens of variants), making music retrieval a particularly difficult task. The fact that a given piece can be sung a cappella, played by a single instrument, or played or sung with accompaniment makes its identification challenging, because only partial information about the acoustic features is available. All these considerations suggest the use of a statistical framework to model ethnic music recordings. The framework can be applied to identify unknown recordings and to mine a music collection to highlight the presence of different recordings of the same song, for instance due to geographical variants.

The paper is organized as follows. Section 2 introduces a number of approaches that are related to the presented methodology. The methodology for music modeling is described in Section 3, while three approaches to identification are presented in Section 4. The results of the experimental evaluation are given in Section 5, followed by some concluding considerations in Section 6.

Section snippets

Related work

This section describes some approaches reported in the literature that are related to the present work. The discussion is limited to research work focusing on music identification, and thus does not include the vast literature in music information retrieval that focuses on computing music similarity for content-based retrieval. A complete overview of these approaches can be found in [4].

Statistical modeling of music performances

The idea underlying the proposed approach is that a performance can be considered as the realization of a process that converts the representation a performer has of a music piece into a sequence of acoustic features. The representation can be read from a music score, recalled from memory, or improvised. The process is non-deterministic because different performances correspond to the same music piece, depending on a number of parameters which are only partially known. For instance, features

Music identification

Identification, or recognition, is probably the application of HMMs that is most often described in the literature. The classical identification problem may be stated as follows:

  • Case A:

    Given an unknown audio recording, described by a sequence of audio features $\bar{E} = \{\bar{e}(1), \ldots, \bar{e}(T)\}$, where $\bar{e}(t) = \{e_0(t), e_1(t), \ldots\}$ are the acoustic events described in Section 3.2, and a set of competing models $\lambda_i$, which have been generated from the representation of music pieces as described in Section 3.1, find the model that
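The statement of Case A is truncated in this excerpt. Assuming the classical formulation, in which the selected model is the one that maximizes the likelihood of the observed feature sequence, identification can be sketched as below. This is a minimal illustration, not the paper's implementation: the left-to-right topology with one state per music event, the discrete symbol alphabet standing in for acoustic features, the parameter values `self_loop` and `hit`, and the piece names are all assumptions made for the example.

```python
import math


def build_left_to_right_hmm(events, n_symbols, self_loop=0.5, hit=0.8):
    """Build a toy left-to-right HMM with one state per music event.

    events: expected symbol indices in order (hypothetical quantized features).
    Returns (initial, transition, emission) as lists of probabilities.
    """
    n = len(events)
    initial = [1.0 if i == 0 else 0.0 for i in range(n)]
    transition = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i + 1 < n:
            transition[i][i] = self_loop            # stay on the same event
            transition[i][i + 1] = 1.0 - self_loop  # move to the next event
        else:
            transition[i][i] = 1.0                  # last state is absorbing
    # Each state emits its own symbol with probability `hit`;
    # the remaining mass is spread evenly over the other symbols.
    miss = (1.0 - hit) / (n_symbols - 1)
    emission = [[hit if s == ev else miss for s in range(n_symbols)]
                for ev in events]
    return initial, transition, emission


def log_likelihood(model, observations):
    """Scaled forward algorithm: returns log P(observations | model)."""
    if not observations:
        raise ValueError("observations must not be empty")
    initial, transition, emission = model
    n = len(initial)
    alpha = [initial[i] * emission[i][observations[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(observations)):
        if t > 0:
            alpha = [
                sum(alpha[i] * transition[i][j] for i in range(n))
                * emission[j][observations[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_prob += math.log(scale)
    return log_prob


def identify(models, observations):
    """Return the name of the highest-scoring model and all scores."""
    scores = {name: log_likelihood(m, observations)
              for name, m in models.items()}
    return max(scores, key=scores.get), scores


# Two hypothetical pieces described by short event sequences.
models = {
    "piece_a": build_left_to_right_hmm([0, 2, 4, 5], n_symbols=12),
    "piece_b": build_left_to_right_hmm([7, 5, 4, 2], n_symbols=12),
}
recording = [0, 0, 2, 2, 4, 5, 5]  # hypothetical quantized observations
best, scores = identify(models, recording)
print(best, scores)
```

Working with scaled forward variables and summing the logarithms of the scale factors keeps the computation stable for long recordings, where raw probabilities would underflow.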

Experimental evaluation

A number of experiments have been carried out to evaluate the methodology with real acoustic data from original recordings, taken from the personal collection of the author. The test collection was partitioned into two subsets, related to Cases A and B described in Section 4:

  • Set A.

    A collection of 139 transcriptions of Balkan music (mainly Romanian and Bulgarian songs), where only melodic information was represented, as is often the case for this kind of transcription. The first bars of the songs,

Conclusions

An HMM-based methodology for statistical modeling of audio recordings is presented and applied to the automatic identification of ethnic music. Three approaches to compute the conditional probability of observing an audio recording given a statistical model of the corresponding music piece have been proposed and tested on two collections: a collection of transcriptions of Balkan music and a collection of recordings of Italian songs. In both cases identification has been carried out by modeling

References (40)

  • G. Leazer, The effectiveness of keyword searching in the retrieval of musical works on sound recordings, Cataloging and Classification Quarterly (1992)
  • N. Orio et al., Score following using spectral analysis and hidden Markov models
  • N. Orio et al., Alignment of monophonic and polyphonic music to a score
  • J. Downie, Music information retrieval, Annual Review of Information Science and Technology (2003)
  • J. Haitsma et al., A highly robust audio fingerprinting system with an efficient search strategy, Journal of New Music Research (2003)
  • H. Özer et al., Perceptual audio hashing functions, EURASIP Journal on Applied Signal Processing (2005)
  • P. Cano et al., A review of audio fingerprinting, Journal of VLSI Signal Processing (2005)
  • Gracenote, Music search 〈http://www.gracenote.com/〉 (September...
  • Tunatic, Free music identification software 〈http://www.wildbits.com/tunatic/〉 (September...
  • L. Boney, A. Tewfik, K. Hamdy, Digital watermarks for audio signals, IEEE Proceedings Multimedia (1996)...
  • J.-S. Pan et al., Intelligent Watermarking Techniques (2004)
  • T. Fujishima, Realtime chord recognition of musical sound: a system using Common Lisp Music
  • M. Goto, A chorus section detection method for musical audio signals and its application to a music listening station, IEEE Transactions on Audio, Speech, and Language Processing (2006)
  • C. Harte et al., Detecting harmonic changes in musical audio
  • P. Herrera et al., Chroma binary similarity and local alignment applied to cover song identification, IEEE Transactions on Audio, Speech, and Language Processing (2008)
  • N. Hu et al., Polyphonic audio matching and alignment for music retrieval
  • D. Ellis et al., Identifying cover songs with chroma features and dynamic programming beat tracking
  • R. Miotto et al., Automatic identification of music works through audio matching
  • M. Müller et al., Efficient index-based audio matching, IEEE Transactions on Audio, Speech, and Language Processing (2008)
  • R. Miotto et al., A music identification system based on chroma indexing and statistical modeling
