Novelty measures as cues for temporal salience in audio similarity

Published: 02 November 2012

Abstract

Most algorithms for estimating audio similarity either disregard time entirely or treat every moment in time equally. However, many studies have identified factors that affect how much attention we give to particular sounds or parts of sounds (e.g., loudness, the attack, novelty). These findings suggest that some time segments of audio may be more salient than others when listeners make similarity judgments. We believe that estimating this salience could improve audio similarity measures. This paper presents the results of a human subject study designed to test the hypothesis that sound segments with high timbral change are more salient than segments with low timbral change. We then investigate whether this information can be used to improve two audio similarity measures: a "bag-of-frames" approach and a dynamic time warping approach.
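
To make the idea of salience-weighted similarity concrete, here is a minimal sketch of a dynamic time warping distance in which each frame's local cost is scaled by a weight derived from timbral change. This is an illustration only, not the method evaluated in the paper: the abstract does not specify the novelty measure or the weighting scheme, so the function names (`timbral_change`, `salience_weights`, `weighted_dtw`), the frame-to-frame Euclidean difference used as a stand-in for timbral change, and the `floor` parameter are all assumptions. Inputs are assumed to be non-empty arrays of shape (T, D) holding one feature vector per frame.

```python
import numpy as np

def timbral_change(frames):
    """Assumed proxy for timbral change: distance between consecutive frames."""
    frames = np.asarray(frames, dtype=float)
    if len(frames) < 2:
        return np.ones(len(frames))
    diffs = np.linalg.norm(np.diff(frames, axis=0), axis=1)
    # The first frame has no predecessor, so it reuses the first difference.
    return np.concatenate(([diffs[0]], diffs))

def salience_weights(frames, floor=0.1):
    """Map timbral change to weights in [floor, 1] (hypothetical scheme)."""
    change = timbral_change(frames)
    peak = change.max()
    scaled = change / peak if peak > 0 else np.zeros_like(change)
    return floor + (1.0 - floor) * scaled

def weighted_dtw(a, b, floor=0.1):
    """DTW distance whose local costs are scaled by the mean frame salience."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    wa, wb = salience_weights(a, floor), salience_weights(b, floor)
    n, m = len(a), len(b)
    # cost[i, j] is the best accumulated cost aligning a[:i] with b[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            weight = 0.5 * (wa[i - 1] + wb[j - 1])
            local = weight * np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(cost[i - 1, j],      # advance in a only
                                     cost[i, j - 1],      # advance in b only
                                     cost[i - 1, j - 1])  # advance in both
    return float(cost[n, m])
```

With `floor=1.0` every weight equals 1 and the function reduces to plain DTW over Euclidean frame distances, which gives a baseline for comparing the weighted and unweighted variants on the same data.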

Cited By

  • (2015) An Approach for Mining Similarity Profiled Temporal Association Patterns Using Gaussian Based Dissimilarity Measure. Proceedings of the The International Conference on Engineering & MIS 2015, pages 1-6. DOI: 10.1145/2832987.2833069. Online publication date: 24-Sep-2015
  • (2015) A Survey on Temporal Databases and Data mining. Proceedings of the The International Conference on Engineering & MIS 2015, pages 1-6. DOI: 10.1145/2832987.2833064. Online publication date: 24-Sep-2015
  • (2012) 2nd international ACM workshop on music information retrieval with user-centered and multimodal strategies (MIRUM). Proceedings of the 20th ACM international conference on Multimedia, pages 1509-1510. DOI: 10.1145/2393347.2396541. Online publication date: 29-Oct-2012

Published In

MIRUM '12: Proceedings of the second international ACM workshop on Music information retrieval with user-centered and multimodal strategies
November 2012
82 pages
ISBN:9781450315913
DOI:10.1145/2390848

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. audio novelty measures
  2. audio similarity
  3. audio temporal salience
  4. query-by-example

Qualifiers

  • Research-article

Conference

MM '12
MM '12: ACM Multimedia Conference
November 2, 2012
Nara, Japan
