Summary
A remarkable achievement of the perceptual system is its capacity for scene analysis, which involves two basic perceptual processes: segmenting a scene into a set of coherent patterns (objects) and recognizing memorized ones. Although the perceptual system performs scene analysis with apparent ease, computational scene analysis remains a tremendous challenge, as Frank Rosenblatt foresaw. This chapter discusses scene analysis in the field of computational intelligence, particularly visual and auditory scene analysis. The chapter first addresses the goal of computational scene analysis. A main reason why scene analysis is difficult in computational intelligence is the binding problem: how a collection of features that make up an object in a scene is represented in a neural network. In this context, temporal correlation theory is introduced as a biologically plausible representation for addressing the binding problem. The LEGION (locally excitatory globally inhibitory oscillator network) model lays a computational foundation for oscillatory correlation, a special form of temporal correlation. Recent results on visual and auditory scene analysis are described within the oscillatory correlation framework, with emphasis on real-world scenes. Also discussed are attention, feature-based versus model-based analysis, and representation versus learning. Finally, the chapter points out that the time dimension and David Marr's framework for understanding perception are essential for computational scene analysis.
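The summary describes oscillatory correlation only in words: oscillators encoding features of the same object synchronize, while those encoding different objects desynchronize, so that objects take turns in time. As a rough illustration, the Python sketch below abstracts away the LEGION oscillator dynamics entirely and keeps only that outcome for a binary image. It assumes that local excitation between 4-connected stimulated cells yields synchrony within a segment and that a global inhibitor allows only one segment to be active per time slot. The function names `assign_phases` and `active_at` and the row-major firing order are hypothetical choices made for this sketch, not the chapter's model; in an actual LEGION network the order in which segments fire depends on the oscillators' initial conditions.

```python
from collections import deque

# Hypothetical helpers: a discrete caricature of LEGION's behaviour,
# not the oscillator equations of the actual model.
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connected local excitation


def assign_phases(grid):
    """Map each stimulated cell (value 1) to the time slot in which it fires.

    Cells in the same 4-connected segment share a slot (synchrony);
    distinct segments receive distinct slots (desynchrony).
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    phase = {}
    next_slot = 0
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or (r, c) in phase:
                continue
            # An unassigned stimulated cell jumps to the active phase, and
            # local excitation recruits its connected stimulated neighbours.
            phase[(r, c)] = next_slot
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in NEIGHBOURS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] and (ny, nx) not in phase):
                        phase[(ny, nx)] = next_slot
                        queue.append((ny, nx))
            # Assumed global inhibition: the next segment fires in a later slot.
            next_slot += 1
    return phase


def active_at(phase, t):
    """Return the set of cells in the active phase at discrete time t."""
    n_segments = max(phase.values()) + 1 if phase else 0
    if n_segments == 0:
        return set()
    slot = t % n_segments  # segments take turns, cycle after cycle
    return {cell for cell, p in phase.items() if p == slot}


if __name__ == "__main__":
    image = [
        [1, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 1],
    ]
    phases = assign_phases(image)
    for t in range(4):
        print(t, sorted(active_at(phases, t)))
```

Under these simplifying assumptions the grouping step reduces to connected-component labeling; what the oscillatory view adds is that the segmentation is expressed in time, with each segment occupying its own slot in every cycle.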
References
Allen JB (2005) Articulation and intelligibility. Morgan & Claypool
Arbib MA ed (2003) Handbook of brain theory and neural networks. 2nd ed, MIT Press, Cambridge MA
Barlow HB (1972) Single units and cognition: A neurone doctrine for perceptual psychology. Perception 1:371-394
Biederman I (1987) Recognition-by-components: A theory of human image understanding. Psychol Rev 94:115-147
Black MJ, Anandan P (1996) The robust estimation of multiple motions: parametric and piecewise-smooth flow fields. CVGIP: Image Understanding 63:75-104
Bregman AS (1990) Auditory scene analysis. MIT Press, Cambridge MA
Campbell SR, Wang DL, Jayaprakash C (1999) Synchrony and desynchrony in integrate-and-fire oscillators. Neural Comp 11:1595-1619
Cesmeli E, Wang DL (2000) Motion segmentation based on motion/brightness integration and oscillatory correlation. IEEE Trans Neural Net 11:935-947
Chang P (2004) Exploration of behavioral, physiological, and computational approaches to auditory scene analysis. MS Thesis, The Ohio State University Department of Computer Science and Engineering (available at http://www.cse.ohio-state.edu/pnl/theses.html)
Chen K, Wang DL, Liu X (2000) Weight adaptation and oscillatory correlation for image segmentation. IEEE Trans Neural Net 11:1106-1123
Cherry EC (1953) Some experiments on the recognition of speech, with one and with two ears. J Acoust Soc Am 25:975-979
Cowan N (2001) The magical number 4 in short-term memory: a reconsideration of mental storage capacity. Behav Brain Sci 24:87-185
Darwin CJ (1997) Auditory grouping. Trends Cogn Sci 1:327-333
Domijan D (2004) Recurrent network with large representational capacity. Neural Comp 16:1917-1942
Driver J, Baylis GC (1998) Attention and visual object recognition. In: Parasuraman R (ed) The attentive brain. MIT Press, Cambridge MA, pp. 299-326
Duncan J, Humphreys GW (1989) Visual search and stimulus similarity. Psychol Rev 96:433-458
Fabre-Thorpe M, Delorme A, Marlot C, Thorpe S (2001) A limit to the speed of processing in ultra-rapid visual categorization of novel natural scenes. J Cogn Neurosci 13:1-10
Field DJ, Hayes A, Hess RF (1993) Contour integration by the human visual system: Evidence for a local “association field”. Vis Res 33:173-193
FitzHugh R (1961) Impulses and physiological states in models of nerve membrane. Biophys J 1:445-466
Fukushima K, Imagawa T (1993) Recognition and segmentation of connected characters with selective attention. Neural Net 6:33-41
Gibson JJ (1966) The senses considered as perceptual systems. Greenwood Press, Westport CT
Gold B, Morgan N (2000) Speech and audio signal processing. Wiley & Sons, New York
Gray CM (1999) The temporal correlation hypothesis of visual feature integration: still alive and well. Neuron 24:31-47
Kahneman D, Treisman A, Gibbs B (1992) The reviewing of object files: object-specific integration of information. Cognit Psychol 24:175-219
Kareev Y (1995) Through a narrow window: Working memory capacity and the detection of covariation. Cognition 56:263-269
Knill DC, Richards W eds (1996) Perception as Bayesian inference. Cambridge University Press, New York
Koffka K (1935) Principles of Gestalt psychology. Harcourt, New York
Konen W, von der Malsburg C (1993) Learning to generalize from single examples in the dynamic link architecture. Neural Comp 5:719-735
MacGregor JN (1987) Short-term memory capacity: Limitation or optimization? Psychol Rev 94:107-108
Marr D (1982) Vision. Freeman, New York
Mattingley JB, Davis G, Driver J (1997) Preattentive filling-in of visual surfaces in parietal extinction. Science 275:671-674
Milner PM (1974) A model for visual shape recognition. Psychol Rev 81(6):521-535
Minsky ML, Papert SA (1969) Perceptrons. MIT Press, Cambridge MA
Minsky ML, Papert SA (1988) Perceptrons (Expanded ed). MIT Press, Cambridge MA
Morris C, Lecar H (1981) Voltage oscillations in the barnacle giant muscle fiber. Biophys J 35:193-213
Nagumo J, Arimoto S, Yoshizawa S (1962) An active pulse transmission line simulating nerve axon. Proc IRE 50:2061-2070
Nakayama K, He ZJ, Shimojo S (1995) Visual surface representation: A critical link between lower-level and higher-level vision. In: Kosslyn SM, Osherson DN (eds) An invitation to cognitive science. MIT Press, Cambridge MA, pp. 1-70
Norris M (2003) Assessment and extension of Wang's oscillatory model of auditory stream segregation. PhD Dissertation, University of Queensland School of Information Technology and Electrical Engineering
Olshausen BA, Anderson CH, Van Essen DC (1993) A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information. J Neurosci 13:4700-4719
Palmer SE (1999) Vision science. MIT Press, Cambridge MA
Parasuraman R ed (1998) The attentive brain. MIT Press, Cambridge MA
Pashler HE (1998) The psychology of attention. MIT Press, Cambridge MA
Reynolds JH, Desimone R (1999) The role of neural mechanisms of attention in solving the binding problem. Neuron 24:19-29
Riesenhuber M, Poggio T (1999) Are cortical models really bound by the “binding problem”? Neuron 24:87-93
Roman N, Wang DL, Brown GJ (2003) Speech segregation based on sound localization. J Acoust Soc Am 114:2236-2252
Rosenblatt F (1958) The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev 65:386-408
Rosenblatt F (1962) Principles of neural dynamics. Spartan, New York
Rumelhart DE, McClelland JL eds (1986) Parallel distributed processing 1: Foundations. MIT Press, Cambridge MA
Russell S, Norvig P (2003) Artificial intelligence: A modern approach. 2nd ed, Prentice Hall, Upper Saddle River, NJ
Shadlen MN, Movshon JA (1999) Synchrony unbound: a critical evaluation of the temporal binding hypothesis. Neuron 24:67-77
Somers D, Kopell N (1993) Rapid synchrony through fast threshold modulation. Biol Cybern 68:393-407
Terman D, Wang DL (1995) Global competition and local cooperation in a network of neural oscillators. Physica D 81:148-176
Thorpe S, Fabre-Thorpe M (2003) Fast visual processing. In: Arbib MA (ed) Handbook of brain theory and neural networks. MIT Press, Cambridge MA, pp. 441-444
Thorpe S, Fize D, Marlot C (1996) Speed of processing in the human visual system. Nature 381:520-522
Treisman A (1986) Features and objects in visual processing. Sci Am, November. Reprinted in: Rock I (ed) The perceptual world. Freeman and Company, New York, pp. 97-110
Treisman A (1999) Solutions to the binding problem: progress through controversy and convergence. Neuron 24:105-110
Treisman A, Gelade G (1980) A feature-integration theory of attention. Cognit Psychol 12:97-136
van der Pol B (1926) On “relaxation oscillations”. Phil Mag 2(11):978-992
von der Malsburg C (1981) The correlation theory of brain function. Internal Report 81-2, Max-Planck-Institute for Biophysical Chemistry. Reprinted in: Domany E, van Hemmen JL, Schulten K (eds) (1994) Models of neural networks II. Springer, Berlin
von der Malsburg C (1999) The what and why of binding: the modeler's perspective. Neuron 24:95-104
von der Malsburg C, Schneider W (1986) A neural cocktail-party processor. Biol Cybern 54:29-40
Wang DL (1995) Emergent synchrony in locally coupled neural oscillators. IEEE Trans Neural Net 6(4):941-948
Wang DL (1996) Primitive auditory segregation based on oscillatory correlation. Cognit Sci 20:409-456
Wang DL (2000) On connectedness: a solution based on oscillatory correlation. Neural Comp 12:131-139
Wang DL (2005) The time dimension for scene analysis. IEEE Trans Neural Net 16:1401-1426
Wang DL, Brown GJ (1999) Separation of speech from interfering sounds based on oscillatory correlation. IEEE Trans Neural Net 10:684-697
Wang DL, Kristjansson A, Nakayama K (2005) Efficient visual search without top-down or bottom-up guidance. Percept Psychophys 67:239-253
Wang DL, Terman D (1995) Locally excitatory globally inhibitory oscillator networks. IEEE Trans Neural Net 6(1):283-286
Wang DL, Terman D (1997) Image segmentation based on oscillatory correlation. Neural Comp 9:805-836 (for errata see Neural Comp 9:1623-1626)
Wersing H, Steil JJ, Ritter H (2001) A competitive-layer model for feature binding and sensory segmentation. Neural Comp 13:357-388
Wertheimer M (1923) Untersuchungen zur Lehre von der Gestalt, II. Psychol Forsch 4:301-350
Wrigley SN, Brown GJ (2004) A computational model of auditory selective attention. IEEE Trans Neural Net 15:1151-1163
Yantis S (1998) Control of visual attention. In: Pashler H (ed) Attention. Psychology Press, London, pp. 223-256
Yen SC, Finkel LH (1998) Extraction of perceptually salient contours by striate cortical networks. Vis Res 38:719-741
Zhang X, Minai AA (2004) Temporally sequenced intelligent block-matching and motion-segmentation using locally coupled networks. IEEE Trans Neural Net 15:1202-1214
Zhao L, Macau EEN (2001) A network of dynamically coupled chaotic maps for scene segmentation. IEEE Trans Neural Net 12:1375-1385
© 2007 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Wang, D. (2007). Computational Scene Analysis. In: Duch, W., Mańdziuk, J. (eds) Challenges for Computational Intelligence. Studies in Computational Intelligence, vol 63. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71984-7_8
DOI: https://doi.org/10.1007/978-3-540-71984-7_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71983-0
Online ISBN: 978-3-540-71984-7
eBook Packages: Engineering, Engineering (R0)