Abstract
An important question for both signal processing and auditory science is to understand which features of a sound carry the most important information for the listener. Here we approach the issue by introducing the idea of “auditory sketches”: sparse representations of sounds, severely impoverished compared to the original, which nevertheless afford good performance on a given perceptual task. Starting from biologically-grounded representations (auditory models), a sketch is obtained by reconstructing a highly under-sampled selection of elementary atoms. The sketch is then evaluated with a psychophysical experiment involving human listeners, and the process can be repeated iteratively. As a proof of concept, we present data for an emotion recognition task with short non-verbal sounds. We investigate (1) the type of auditory representation that can be used for sketches, (2) the selection procedure to sparsify such representations, (3) the smallest number of atoms that can be kept, and (4) the robustness to noise. Results indicate that it is possible to produce recognizable sketches with a very small number of atoms per second. Furthermore, at least in our experimental setup, a simple and fast under-sampling method based on selecting local maxima of the representation seems to perform as well as or better than a more traditional algorithm aimed at minimizing the reconstruction error. Thus, auditory sketches may be a useful tool for choosing sparse dictionaries, and also for identifying the minimal set of features required in a specific perceptual task.
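The local-maxima selection mentioned above can be illustrated with a minimal sketch. The function below is a hypothetical, simplified implementation (not the authors' code): given a 2-D time-frequency representation, it keeps only the strongest `n_atoms` bins that exceed all eight of their neighbours and zeroes everything else, yielding the sparse "sketch" representation from which a sound would then be reconstructed.

```python
import numpy as np

def sparsify_local_maxima(rep, n_atoms):
    """Keep only the n_atoms largest local maxima of a 2-D
    time-frequency representation; zero all other bins."""
    # Pad with -inf so border bins are compared only to real neighbours.
    padded = np.pad(rep, 1, mode="constant", constant_values=-np.inf)
    # Stack the 8 shifted copies of the representation (one per neighbour).
    neighbours = np.stack([
        padded[1 + di:rep.shape[0] + 1 + di, 1 + dj:rep.shape[1] + 1 + dj]
        for di in (-1, 0, 1) for dj in (-1, 0, 1)
        if (di, dj) != (0, 0)
    ])
    # A bin is a local maximum if it strictly exceeds all its neighbours.
    is_max = rep > neighbours.max(axis=0)
    # Rank local maxima by magnitude; keep the strongest n_atoms of them.
    vals = np.where(is_max, rep, -np.inf).ravel()
    keep = np.argsort(vals)[::-1][:n_atoms]
    keep = keep[vals[keep] > -np.inf]  # guard: fewer maxima than n_atoms
    sketch = np.zeros_like(rep)
    sketch.ravel()[keep] = rep.ravel()[keep]
    return sketch
```

This captures why the method is fast: selection is a single neighbourhood comparison plus a sort, with no iterative optimization of the reconstruction error as in matching-pursuit-style alternatives.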
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Suied, C., Drémeau, A., Pressnitzer, D., Daudet, L. (2013). Auditory Sketches: Sparse Representations of Sounds Based on Perceptual Models. In: Aramaki, M., Barthet, M., Kronland-Martinet, R., Ystad, S. (eds) From Sounds to Music and Emotions. CMMR 2012. Lecture Notes in Computer Science, vol 7900. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41248-6_9
Print ISBN: 978-3-642-41247-9
Online ISBN: 978-3-642-41248-6