Computational model for neural representation of multiple disparities

doi:10.1016/S0893-6080(02)00186-7

Neural Networks

Volume 16, Issue 1, January 2003, Pages 25-37

https://doi.org/10.1016/S0893-6080(02)00186-7

Abstract

It is known that the visual system can detect more than one disparity and/or motion direction in the same region of an image. These multiple (or transparent) surfaces can be perceived when, for example, we look at a scene through glass. However, many conventional models cannot deal with such multiple surfaces. The present paper investigates the neural encoding and decoding of multiple disparities with the binocular energy model, which is regarded as biologically plausible. Based on an analysis of the response of the energy model to multiple disparities, the paper proposes a stereo model that can detect the disparities of two overlapping surfaces.

Introduction

In computational research on early vision, one important unsolved problem is the perception of multiple (or transparent) surfaces. Multiple surfaces can be perceived by superposing visual patterns, each of which has a different disparity or motion direction. Fig. 1 shows an example of a random-dot stereogram that consists of two overlapping surfaces. In this stereogram, half the dots have a disparity of +3 pixels and half a disparity of −3 pixels. On fusing the stereogram, we perceive two different disparities simultaneously. In motion perception, two different directions can be perceived in the same local region of an image produced by superposing two moving patterns. This situation is called transparent motion (Snowden et al., 1991; Qian and Andersen, 1994; Qian et al., 1994a, 1994b; Verstraten et al., 1994; van Wezel et al., 1996; Hida et al., 1998; Dakin and Mareschal, 2000; Treue et al., 2000).
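A stimulus of the kind described above can be sketched in a few lines. The following is a minimal illustration, not the procedure used to produce Fig. 1: the image size, dot count, function name `transparent_rds`, and the convention that a dot at column x in the left image appears at column x + d in the right image are all assumptions made for the example.

```python
import numpy as np

def transparent_rds(height=64, width=64, n_dots=400, disparity=3, seed=0):
    """Return (left, right, d): two binary images and the per-dot disparities.

    Assumed convention: a dot at column x in the left image is drawn at
    column x + d in the right image.
    """
    rng = np.random.default_rng(seed)
    left = np.zeros((height, width), dtype=np.uint8)
    right = np.zeros((height, width), dtype=np.uint8)
    ys = rng.integers(0, height, size=n_dots)
    # Keep dots away from the left/right borders so shifted dots stay inside.
    xs = rng.integers(disparity, width - disparity, size=n_dots)
    # First half of the dots belongs to one surface, second half to the other.
    d = np.where(np.arange(n_dots) < n_dots // 2, disparity, -disparity)
    left[ys, xs] = 1
    right[ys, xs + d] = 1
    return left, right, d

left, right, d = transparent_rds()
print(left.shape, int((d == 3).sum()), int((d == -3).sum()))
```

Because dot positions are drawn independently, a few dots may coincide, so the number of lit pixels can be slightly below `n_dots`.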

In these transparent situations, the visual system has to represent more than one disparity and/or motion direction in the same region of the visual field; this is the problem of multiple surface perception. Many conventional models for stereopsis employ inhibitory interactions among binocular cells with different disparity preferences. For example, the uniqueness constraint (Marr & Poggio, 1976) and the ordering constraint (Hirai & Fukushima, 1978) require these interactions. Such interactions would silence neurons under transparent conditions. In addition, as Grimson (1993) pointed out, one of the important roles of stereopsis is to segment a complex visual scene into individual surfaces. Transparent surfaces present a particularly difficult problem for segmentation because two or more disparities exist at each local region of the image, and conventional stereo models fail in these circumstances.

Marr and Poggio (1979a, 1979b) and Prazdny (1985) proposed feature-based matching algorithms that can handle semi-transparency and Panum's limiting case. However, these algorithms would have trouble with scenes containing pure transparency, such as a scene viewed through glass, because they implicitly assume that features are opaque (Shizawa, 1993). Weinshall (1993) and Marshall et al. (1996) pointed out cases in which Prazdny's algorithm fails to detect multiple disparities, and proposed models for transparent stereopsis. Both models can explain some psychological phenomena concerning transparent stereopsis, but they still require opacity of features. In addition, because they implicitly assume that features are sparse, the models described above ignore interference among features attached to different surfaces. To detect disparities in purely transparent scenes, Shizawa (1993) proposed an area-based stereo algorithm that can reconstruct two overlapping surfaces. His model is discussed in detail in a later section.

The main purpose of the present research is to establish a biological model that can deal with multiple surfaces. The problem to be considered is how to represent multiple disparities with a population of binocular cells. Some previous studies have proposed population-coding models for the neural representation and detection of multiple surfaces. Lehky and Sejnowski (1990) proposed a stereo model with population coding. Their model employs a template-matching procedure: an arbitrary population response is assigned a disparity by finding which canonical population response matches it best. They used the root-mean-square (RMS) error as the matching measure. If the RMS error has two equal minima, the response is interpreted as representing two disparities simultaneously. Zemel and Pillow (2000) proposed a simple model for transparent motion. Their model is based on the assumption that the population response to multiple surfaces is equal to the scaled sum of the population responses to the individual surfaces. This assumption is consistent with physiological studies on motion-sensitive neurons (van Wezel et al., 1996; Hida et al., 1998; Treue et al., 2000). They showed that each surface can be decoded by a statistical decoding method (Zemel, Dayan, & Pouget, 1998). These models implicitly assume that a population response to multiple surfaces can be uniquely divided into the population responses to each single surface. However, it is not known whether the population of binocular neurons has this property.
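The template-matching idea can be illustrated with a toy population. The sketch below is not Lehky and Sejnowski's implementation: the Gaussian tuning curves, the tuning width, and the grids of preferred and candidate disparities are hypothetical choices made only to show how an RMS error selects the best-matching canonical response.

```python
import numpy as np

# Hypothetical population: units with evenly spaced preferred disparities (pixels).
preferred = np.linspace(-10.0, 10.0, 21)
# Candidate disparities for which canonical (template) responses are stored.
candidates = np.linspace(-8.0, 8.0, 161)

def population_response(d, sigma=2.0):
    """Canonical response of the population to a single disparity d.

    Gaussian tuning is an assumption made for this illustration.
    """
    return np.exp(-(d - preferred) ** 2 / (2 * sigma ** 2))

# One template per candidate disparity, shape (n_candidates, n_units).
templates = np.stack([population_response(d) for d in candidates])

def decode_rms(observed):
    """Return the candidate with the smallest RMS error, plus the error curve."""
    rms = np.sqrt(np.mean((templates - observed) ** 2, axis=1))
    return candidates[np.argmin(rms)], rms

d_hat, rms = decode_rms(population_response(3.0))
print(round(float(d_hat), 2))
```

Under the scaled-sum assumption, the observed response to two surfaces would be formed as a weighted sum of two such canonical responses; inspecting the returned `rms` curve then shows whether it has one minimum or two.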

In our model, we employ the binocular energy model because of its biological plausibility (Ohzawa, DeAngelis, & Freeman, 1990). First, we analyze the response of the energy model to two overlapping disparities and show how they are encoded. The analysis indicates that the response of a population of binocular energy models to multiple surfaces cannot be uniquely divided into the population responses to each single surface. This ambiguity occurs because (1) the amplitude spectrum of each surface is unknown and (2) there is interference among surfaces. Next, based on the analysis, we propose a computational model that decodes two overlapping disparities. To decode overlapping disparities uniquely, the proposed model employs (1) an assumption concerning the amplitude spectra and (2) a population pooling method. The important outcomes of this study are that it reveals the effect of interference between overlapping surfaces and proposes a stereo model that takes this effect into account. This effect is ignored by previous models for transparent stereopsis. Computer simulations show that the present model can detect two overlapping disparities. Because feature opacity is not assumed, the model can handle purely transparent scenes, and two disparities are assigned to the same position simultaneously.
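The interference mentioned above can be made concrete with a small numerical check. This is an illustration under stated assumptions, not the paper's analysis: the two surfaces are superposed additively, the receptive fields are complex Gabor filters with hypothetical parameters, and the energy is written as the squared modulus of the summed left-eye and right-eye filter outputs. Because the filtering is linear and the energy is quadratic, the energy for the superposed stimulus equals the sum of the single-surface energies plus a cross term.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d1, d2 = 256, 3, -3                      # hypothetical signal length and disparities (pixels)
u = np.arange(-32, 33)                      # receptive-field support
omega0, sigma = 2 * np.pi / 16, 8.0         # hypothetical filter parameters
kernel = np.exp(-u**2 / (2 * sigma**2)) * np.exp(1j * omega0 * u)

def monocular(image, centre=n // 2):
    """Complex (quadrature-pair) filter output at one receptive-field centre."""
    return np.sum(kernel * image[centre + u])

def binocular_amplitude(left, right):
    """Sum of the two monocular outputs; its squared modulus is the energy."""
    return monocular(left) + monocular(right)

s1, s2 = rng.standard_normal(n), rng.standard_normal(n)   # two surface patterns
b1 = binocular_amplitude(s1, np.roll(s1, d1))             # surface 1 alone
b2 = binocular_amplitude(s2, np.roll(s2, d2))             # surface 2 alone
b12 = binocular_amplitude(s1 + s2, np.roll(s1, d1) + np.roll(s2, d2))  # superposed

e1, e2, e12 = abs(b1) ** 2, abs(b2) ** 2, abs(b12) ** 2
interference = 2 * np.real(b1 * np.conj(b2))              # cross term between surfaces
print(np.isclose(e12, e1 + e2 + interference))
```

The cross term depends on the relative phase of the two patterns, which is why the response to the superposed stimulus is not in general a scaled sum of the single-surface responses.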

Section snippets

Definition of the binocular energy model

The energy model consists of two types of cells corresponding to simple and complex cells (Adelson & Bergen, 1985). This model is used as a theoretical model of a binocular cell and/or a motion sensitive cell because of its biological plausibility (Ohzawa et al., 1990).
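As a minimal sketch of this two-stage structure, the code below builds one binocular energy unit from a quadrature pair of simple cells. The Gabor form of the receptive fields, the parameter values, and the use of an interocular phase difference `dpsi` to set the disparity preference are assumptions for illustration; the function names are hypothetical.

```python
import numpy as np

def gabor_pair(x, omega0, sigma, phase):
    """Even and odd (quadrature) receptive fields with a common phase offset."""
    envelope = np.exp(-x**2 / (2 * sigma**2))
    return envelope * np.cos(omega0 * x + phase), envelope * np.sin(omega0 * x + phase)

def binocular_energy(left, right, x, omega0=2 * np.pi / 16, sigma=8.0, dpsi=0.0):
    """Response of one complex cell to image patches `left` and `right` sampled at x."""
    l_even, l_odd = gabor_pair(x, omega0, sigma, 0.0)
    # The right-eye fields are phase shifted by dpsi relative to the left-eye fields.
    r_even, r_odd = gabor_pair(x, omega0, sigma, dpsi)
    # Simple cells: linear binocular summation followed by squaring.
    simple_even = (l_even @ left + r_even @ right) ** 2
    simple_odd = (l_odd @ left + r_odd @ right) ** 2
    # Complex cell: sum of the quadrature pair of simple cells.
    return simple_even + simple_odd

rng = np.random.default_rng(0)
x = np.arange(-32, 33)
patch = rng.standard_normal(x.size)
# Identical patches in both eyes (zero disparity): the in-phase unit responds,
# while the anti-phase unit is cancelled.
print(binocular_energy(patch, patch, x, dpsi=0.0) > binocular_energy(patch, patch, x, dpsi=np.pi))
```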

For simplicity, a one-dimensional receptive field is considered in this paper (Qian, 1994; Fleet et al., 1996; Qian and Zhu, 1997; Qian and Mikaelian, 2000). In the energy model, the receptive field of the simple cell is defined by

Response to multiple disparities

In motion perception, there is some psychophysical and physiological evidence that the responses of motion-selective neurons to transparent motion can be approximated by a scaled sum of the responses to the individual motions (Verstraten et al., 1994; van Wezel et al., 1996; Hida et al., 1998; Treue et al., 2000). In binocular stereopsis, there is less evidence concerning the response to multiple surfaces than in motion perception. Mallot, Roll, and Arndt (1996) studied disparity-evoked

Population pooling

The correspondence problem is one of the important problems in recovering structure from stereopsis. The binocular energy model is considered to reduce false matches because of its selectivity to, for example, spatial frequency and orientation (Ohzawa, 1999). However, the binocular energy model has another problem, known as the false peak problem (Fleet et al., 1996): the response of the energy model does not always equal its canonical response given by Eq. (12). In order to solve

Simulation

In this section, the results of computer simulations with one-dimensional signals are shown. Fig. 4 shows an overview of the model. Each column consists of one-dimensional binocular energy models whose preferred disparities Δψ differ from one another. A mean disparity $\bar{D}$ and a disparity difference ΔD are calculated after pooling.
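The column-and-pooling arrangement can be sketched for the simpler case of a single disparity. The code below is an illustration under assumptions, not the simulation reported here: it uses complex Gabor filters with hypothetical parameters, treats Δψ as an interocular phase difference, pools by summing energies over receptive-field positions, and reads out only the peak of the pooled population. It does not attempt the two-surface decoding of $\bar{D}$ and ΔD.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(1)
n, true_d = 512, 2                           # hypothetical signal length and disparity (pixels)
omega0, sigma = 2 * np.pi / 16, 8.0          # hypothetical filter parameters
u = np.arange(-32, 33)
kernel = np.exp(-u**2 / (2 * sigma**2)) * np.exp(1j * omega0 * u)

# One-dimensional noise pattern; the right image is the left image shifted by
# true_d, i.e. right[x] == left[x + true_d] (an assumed sign convention).
base = rng.standard_normal(n + true_d)
left, right = base[:n], base[true_d:]

# Complex monocular responses at every valid receptive-field centre.
c_left = sliding_window_view(left, kernel.size) @ kernel
c_right = sliding_window_view(right, kernel.size) @ kernel

# One column per position; units within a column differ in phase difference dpsi.
dpsi = np.linspace(-np.pi, np.pi, 64, endpoint=False)
energy = np.abs(c_left[None, :] + np.exp(1j * dpsi)[:, None] * c_right[None, :]) ** 2

# Pool across positions, then convert the peak phase difference to pixels.
pooled = energy.sum(axis=1)
d_hat = dpsi[np.argmax(pooled)] / omega0
print(round(float(d_hat), 2))
```

Pooling is used here because the peak of a single column can be displaced by the local phase structure of the input, whereas the summed population is dominated by the shift common to all positions.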

In the simulations, the preferred spatial frequency was set at $\omega_{0}/(2\pi) = 7.38 \times 10^{-3}$ cycles/pixel. The variance of the Gaussian in Eq. (2) was defined as σ²=8.8/ω₀

Discussion

The present paper has proposed a biological model that encodes and decodes two overlapping disparities with the binocular energy model. The population response of the energy model to multiple disparities forms a sinusoidal function. Therefore, it cannot be uniquely divided into the responses to each single disparity. This ambiguity occurs because (1) the amplitude spectrum of each surface is unknown and (2) there is interference among surfaces. In order to decode two overlapping

Acknowledgements

The authors would like to thank Dr Masayuki Kikuchi for helpful comments and discussion.

References (43)

S.C. Dakin et al., The role of relative motion computation in directional repulsion, Vision Research (2000)
D.J. Fleet et al., Neural encoding of binocular disparity: Energy models, position shifts and phase shifts, Vision Research (1996)
S. Gepshtein et al., Stereoscopic transparency: A test for binocular vision's disambiguating power, Vision Research (1998)
M.J.M. Lankheet et al., Stereoscopic segregation of transparent surfaces and the effect of motion contrast, Vision Research (1998)
H.A. Mallot et al., Disparity-evoked vergence is driven by interocular correlation, Vision Research (1996)
S. Mikaelian et al., A physiologically-based explanation of disparity attraction and repulsion, Vision Research (2000)
N. Qian et al., Physiological computation of binocular disparity, Vision Research (1997)
A.M. Rohaly et al., Disparity averaging across spatial scales, Vision Research (1994)
S.B. Stevenson et al., Depth attraction and repulsion in random dot stereograms, Vision Research (1991)
F.A.J. Verstraten et al., Movement aftereffect of bi-vectorial transparent motion, Vision Research (1994)
R.J.A. van Wezel et al., Responses of complex cells in area 17 of the cat to bi-vectorial transparent motion, Vision Research (1996)
R.S. Zemel et al., Encoding multiple orientations in a recurrent network, Neurocomputing (2000)
E.H. Adelson et al., Spatiotemporal energy models for the perception of motion, Journal of the Optical Society of America A (1985)
R.A. Akerstrom et al., The perception of stereoscopic transparency, Perception and Psychophysics (1988)
M. Carandini et al., Summation and division by neurons in primate visual cortex, Science (1994)
D.J. Fleet et al., Stability of phase information, IEEE Transactions on Pattern Analysis and Machine Intelligence (1993)
W.E.L. Grimson, Why stereo vision is not always about 3D reconstruction (1993)
Hida, E., Saito, H., Ohno, H., Odajima, K., Tanamori, D (1998). Neuronal correlate for the perception of...
Y. Hirai et al., An inference upon the neural network finding binocular correspondence, Biological Cybernetics (1978)
M. Idesawa, Perception of 3-D transparent illusory surface with binocular fusion, Japanese Journal of Applied Physics (1991)
M. Kikuchi et al., Perception of multiple-depth at a single retinal position, Perception (2001)

