Abstract
An important question for both signal processing and auditory science is to understand which features of a sound carry the most important information for the listener. Here we approach the issue by introducing the idea of “auditory sketches”: sparse representations of sounds, severely impoverished compared to the original, which nevertheless afford good performance on a given perceptual task. Starting from biologically-grounded representations (auditory models), a sketch is obtained by reconstructing a highly under-sampled selection of elementary atoms. The sketch is then evaluated with a psychophysical experiment involving human listeners, and the process can be repeated iteratively. As a proof of concept, we present data for an emotion recognition task with short non-verbal sounds. We investigate (1) the type of auditory representation that can be used for sketches, (2) the selection procedure to sparsify such representations, (3) the smallest number of atoms that can be kept, and (4) the robustness to noise. Results indicate that it is possible to produce recognizable sketches with a very small number of atoms per second. Furthermore, at least in our experimental setup, a simple and fast under-sampling method based on selecting local maxima of the representation seems to perform as well as or better than a more traditional algorithm aimed at minimizing the reconstruction error. Thus, auditory sketches may be a useful tool for choosing sparse dictionaries, and also for identifying the minimal set of features required in a specific perceptual task.
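The local-maxima selection mentioned above can be illustrated with a minimal sketch. The function below is a hypothetical, simplified implementation (not the authors' code): given a 2-D time-frequency representation, it keeps only the strongest `n_atoms` bins that exceed all eight of their neighbours and zeroes everything else, yielding the sparse "sketch" representation from which a sound would then be reconstructed.

```python
import numpy as np

def sparsify_local_maxima(rep, n_atoms):
    """Keep only the n_atoms largest local maxima of a 2-D
    time-frequency representation; zero all other bins."""
    # Pad with -inf so border bins are compared only to real neighbours.
    padded = np.pad(rep, 1, mode="constant", constant_values=-np.inf)
    # Stack the 8 shifted copies of the representation (one per neighbour).
    neighbours = np.stack([
        padded[1 + di:rep.shape[0] + 1 + di, 1 + dj:rep.shape[1] + 1 + dj]
        for di in (-1, 0, 1) for dj in (-1, 0, 1)
        if (di, dj) != (0, 0)
    ])
    # A bin is a local maximum if it strictly exceeds all its neighbours.
    is_max = rep > neighbours.max(axis=0)
    # Rank local maxima by magnitude; keep the strongest n_atoms of them.
    vals = np.where(is_max, rep, -np.inf).ravel()
    keep = np.argsort(vals)[::-1][:n_atoms]
    keep = keep[vals[keep] > -np.inf]  # guard: fewer maxima than n_atoms
    sketch = np.zeros_like(rep)
    sketch.ravel()[keep] = rep.ravel()[keep]
    return sketch
```

This captures why the method is fast: selection is a single neighbourhood comparison plus a sort, with no iterative optimization of the reconstruction error as in matching-pursuit-style alternatives.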
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Suied, C., Drémeau, A., Pressnitzer, D., Daudet, L. (2013). Auditory Sketches: Sparse Representations of Sounds Based on Perceptual Models. In: Aramaki, M., Barthet, M., Kronland-Martinet, R., Ystad, S. (eds) From Sounds to Music and Emotions. CMMR 2012. Lecture Notes in Computer Science, vol 7900. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41248-6_9
Print ISBN: 978-3-642-41247-9
Online ISBN: 978-3-642-41248-6