Automatic identification of audio recordings based on statistical modeling
Introduction
Digital collections, usually stored and organized in digital libraries, are becoming an increasingly important tool for the preservation and dissemination of musical cultural heritage. Music can benefit from the application of digital technologies even more than other art forms, because it can be shared by users with different backgrounds and education, easily crossing language barriers. This is particularly true for ethnic music, which can be enjoyed by users from different cultures and traditions who live in different geographical areas. For a digital library of ethnic music, the goal of “anytime–anywhere” access is meant in its widest sense. Yet interest in digital music collections should be matched by the availability of tools for effective access to digital content. Textual descriptors such as title, author, and performers are all valuable metadata that allow a user to retrieve potentially relevant music items. Moreover, additional information about time and key signatures, instrumentation, and geographical location may provide alternative ways to access music content.
In the case of ethnic music, access to music documents becomes more challenging, because typical metadata may not be good descriptors of music content. The most effective fields in a music search, author and title according to [1], may not be good descriptors for this repertoire. For this reason, tools for content-based search should be provided to the user, aimed at the exploration of the digital collection, for instance to retrieve alternative recordings of a given piece even when textual metadata differ or are not available. Another important issue in the development of a digital music collection is how music content is cataloged during the acquisition of the material. Classification and labeling would have to be carried out manually by trained personnel, with a considerable increase in time and costs. In this case too, automatic tools that identify a recording as an instance of a given song can be used to add pertinent metadata automatically.
This paper reports a methodology for the automatic identification of unknown recordings. The approach is based on hidden Markov models (HMMs), which model the unobservable process underlying the production of a music performance given a statistical representation of a music piece. The presented approach builds on earlier work on audio-to-score alignment [2], [3], which has been extended to the particular task of exploring digital collections of ethnic music. One of the goals of this paper is to explore whether HMM-based identification can be carried out even when only partial information (e.g., melodic content) is available to describe a song. In this respect, some characteristics common to many repertoires of ethnic music make the proposed approach particularly suitable.
First of all, melody plays a fundamental role in most ethnic repertoires, as opposed to the functional harmony used in tonal Western music. Yet the same song can be performed in countless variants, which depend on the oral tradition by which it is transmitted, on the available instrumentation, and on the geographical areas and thus on the cultures that have been influenced by it. Given the prominent role of oral tradition, for many songs the most common metadata such as author and title may not be significant (the author being unknown and the title existing in dozens of variants), which makes music retrieval a particularly difficult task. The fact that a given piece can be sung a cappella, played by a single instrument, or played or sung with accompaniment makes its identification challenging, because only partial information about the acoustic features is available. All these considerations suggest the use of a statistical framework to model ethnic music recordings. The framework can be applied to identify unknown recordings and to mine a music collection to highlight the presence of different recordings of the same song, for instance due to geographical variants.
The paper is organized as follows. Section 2 introduces a number of approaches that are related to the presented methodology. The methodology for music modeling is described in Section 3, while three approaches to identification are presented in Section 4. The results of the experimental evaluation are given in Section 5, followed by some concluding considerations in Section 6.
Section snippets
Related work
This section describes some approaches reported in the literature that are related to the present work. The discussion is limited to research focusing on music identification, and thus does not cover the vast literature in music information retrieval that focuses on the computation of music similarity for content-based retrieval. A complete overview of these approaches can be found in [4].
Statistical modeling of music performances
The idea underlying the proposed approach is that a performance can be considered as the realization of a process that converts the representation a performer has of a music piece into a sequence of acoustic features. The representation can be read from a music score, recalled from memory, or improvised. The process is non-deterministic because different performances correspond to the same music piece, depending on a number of parameters that are only partially known. For instance, features
Music identification
Identification, or recognition, is probably the application of HMMs most often described in the literature. The classical identification problem may be stated as follows:
- Case A:
Given an unknown audio recording, described by a sequence of audio features whose elements are the acoustic events described in Section 3.2, and a set of competing models generated from the representation of music pieces as described in Section 3.1, find the model that
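The statement of Case A is cut short in this excerpt, so the exact selection criterion is not shown. A common formulation of HMM-based identification picks the model under which the observed sequence is most probable. The sketch below illustrates that idea only, under several assumptions: observations are discrete symbols rather than real audio features, the likelihood is computed with the standard forward algorithm in the log domain, and the two toy models (`piece_x`, `piece_y`) and all their probabilities are hypothetical. It is not meant to reproduce any of the three approaches evaluated in the paper.

```python
import math


def safe_log(p):
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else -math.inf


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(obs, start, trans, emit):
    """Forward algorithm: log P(obs | model) for a discrete-output HMM.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability of emitting symbol k in state i
    """
    n = len(start)
    alpha = [safe_log(start[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            log_sum_exp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][symbol])
            for j in range(n)
        ]
    return log_sum_exp(alpha)


def identify(obs, models):
    """Return the name of the highest-scoring model and all log-likelihoods."""
    scores = {name: log_likelihood(obs, *params) for name, params in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical left-to-right models over two symbols (0 = "low", 1 = "high").
left_to_right = [[0.7, 0.3], [0.0, 1.0]]
models = {
    "piece_x": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),  # low, then high
    "piece_y": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),  # high, then low
}

recording = [0, 0, 1, 1]  # toy "unknown recording"
best, scores = identify(recording, models)
print(best, scores)
```

Working in the log domain avoids numerical underflow on long sequences, which matters for real recordings where the observation sequence may contain thousands of frames.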
Experimental evaluation
A number of experiments have been carried out to evaluate the methodology with real acoustic data from original recordings, taken from the personal collection of the author. The test collection was partitioned into two subsets, corresponding to Cases A and B described in Section 4:
- Set A.
A collection of 139 transcriptions of Balkan music (mainly Romanian and Bulgarian songs), where only melodic information was represented, as is often the case for this kind of transcription. The first bars of the songs,
Conclusions
An HMM-based methodology for statistical modeling of audio recordings is presented and applied to the automatic identification of ethnic music. Three approaches to compute the conditional probability of observing an audio recording given a statistical model of the corresponding music piece have been proposed and tested on two collections: a collection of transcriptions of Balkan music and a collection of recordings of Italian songs. In both cases identification has been carried out by modeling
References (40)
- The effectiveness of keyword searching in the retrieval of musical works on sound recordings, Cataloging and Classification Quarterly (1992)
- et al., Score following using spectral analysis and hidden Markov models
- et al., Alignment of monophonic and polyphonic music to a score
- Music information retrieval, Annual Review of Information Science and Technology (2003)
- et al., A highly robust audio fingerprinting system with an efficient search strategy, Journal of New Music Research (2003)
- et al., Perceptual audio hashing functions, EURASIP Journal on Applied Signal Processing (2005)
- et al., A review of audio fingerprinting, Journal of VLSI Signal Processing (2005)
- Gracenote, Music search 〈http://www.gracenote.com/〉 (September...
- Tunatic, Free music identification software 〈http://www.wildbits.com/tunatic/〉 (September...
- L. Boney, A. Tewfik, K. Hamdy, Digital watermarks for audio signals, IEEE Proceedings Multimedia (1996)...
- Intelligent Watermarking Techniques
- Realtime chord recognition of musical sound: a system using common lisp music
- A chorus section detection method for musical audio signals and its application to a music listening station, IEEE Transactions on Audio, Speech, and Language Processing
- Detecting harmonic changes in musical audio
- Chroma binary similarity and local alignment applied to cover song identification, IEEE Transactions on Audio, Speech, and Language Processing
- Polyphonic audio matching and alignment for music retrieval
- Identifying cover songs with chroma features and dynamic programming beat tracking
- Automatic identification of music works through audio matching
- Efficient index-based audio matching, IEEE Transactions on Audio, Speech, and Language Processing
- A music identification system based on chroma indexing and statistical modeling