Abstract
The idea that the gist of a visual scene is perceived before attention is focused on the details of a particular object is becoming increasingly popular. In the auditory system, on the other hand, it is typically assumed that the sensory signal is first broken down into streams and that attention is then applied to select one of these streams. We consider evidence for an alternative: that, in close analogy with the visual system, the gist of an auditory scene is perceived first, and only afterwards is attention directed to its relevant constituents. We find that much experimental evidence is consistent with such a proposal, and we suggest some possibilities for gist representations.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Harding, S., Cooke, M., König, P. (2007). Auditory Gist Perception: An Alternative to Attentional Selection of Auditory Streams? In: Paletta, L., Rome, E. (eds) Attention in Cognitive Systems. Theories and Systems from an Interdisciplinary Viewpoint. WAPCV 2007. Lecture Notes in Computer Science, vol 4840. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77343-6_26
DOI: https://doi.org/10.1007/978-3-540-77343-6_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77342-9
Online ISBN: 978-3-540-77343-6
eBook Packages: Computer Science, Computer Science (R0)