Elsevier

Neural Networks

Volume 16, Issue 1, January 2003, Pages 25-37
Computational model for neural representation of multiple disparities

https://doi.org/10.1016/S0893-6080(02)00186-7

Abstract

It is known that the visual system can detect more than one disparity and/or motion direction in the same region of an image. Such multiple (or transparent) surfaces are perceived when, for example, we look at a scene through glass. However, many conventional models cannot deal with multiple surfaces. The present paper investigates the neural encoding and decoding of multiple disparities using the binocular energy model, which is regarded as biologically plausible. Based on an analysis of the response of the energy model to multiple disparities, the paper proposes a stereo model that can detect the disparities of two overlapping surfaces.

Introduction

In computational research on early vision, one important unsolved problem is the perception of multiple (or transparent) surfaces. Multiple surfaces can be perceived by superposing visual patterns, each of which has a different disparity or motion direction. Fig. 1 shows an example of a random-dot stereogram that consists of two overlapping surfaces. In this stereogram, half the dots have a disparity of +3 pixels and half a disparity of −3 pixels. When the stereogram is fused, two different disparities are perceived simultaneously. In motion perception, two different directions can be perceived in the same local region of an image produced by superposing two moving patterns. This situation is called transparent motion (Snowden et al., 1991; Qian and Andersen, 1994; Qian et al., 1994a, 1994b; Verstraten et al., 1994; van Wezel et al., 1996; Hida et al., 1998; Dakin and Mareschal, 2000; Treue et al., 2000).
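The stimulus described above can be sketched in a few lines of Python. This is only an illustrative construction, not the procedure used to produce Fig. 1: the function name `transparent_rds`, the image size, the dot count, and the sign convention (a dot at column c in the left image appears at column c + disparity in the right image) are all assumptions made for the example.

```python
import numpy as np

def transparent_rds(height=64, width=64, n_dots=400, disparity=3, seed=0):
    """Build a two-surface random-dot stereogram (hypothetical helper).

    Half of the dots are shifted by +disparity between the eyes and the
    other half by -disparity, so both surfaces cover the same image region.
    """
    rng = np.random.default_rng(seed)
    left = np.zeros((height, width), dtype=np.uint8)
    right = np.zeros((height, width), dtype=np.uint8)
    rows = rng.integers(0, height, size=n_dots)
    # Keep a horizontal margin so shifted dots never leave the image.
    cols = rng.integers(disparity, width - disparity, size=n_dots)
    # First half of the dots: +disparity surface; second half: -disparity surface.
    shift = np.where(np.arange(n_dots) < n_dots // 2, disparity, -disparity)
    left[rows, cols] = 1
    right[rows, cols + shift] = 1
    return left, right, rows, cols, shift
```

Calling `transparent_rds()` returns the two half-images together with the dot coordinates and per-dot shifts, which makes it easy to check which surface each dot belongs to.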

In these transparent situations, the visual system has to represent more than one disparity and/or motion in the same region of the visual field; this is the problem of multiple surface perception. Many conventional models for stereopsis employ inhibitory interactions among binocular cells with different disparity preferences. For example, the uniqueness constraint (Marr & Poggio, 1976) and the ordering constraint (Hirai & Fukushima, 1978) require these interactions. Such interactions would silence neurons under transparent conditions. In addition, as Grimson (1993) pointed out, one of the important roles of stereopsis is to segment a complex visual scene into individual surfaces. Transparent surfaces present a particularly difficult problem for segmentation because two or more disparities exist in each local region of the image, and conventional stereo models fail in these circumstances.

Marr and Poggio (1979a, 1979b) and Prazdny (1985) proposed feature-based matching algorithms that can handle semi-transparency and Panum's limiting case. However, these algorithms would have trouble with scenes containing pure transparency, such as a scene viewed through glass, because they implicitly assume that features are opaque (Shizawa, 1993). Weinshall (1993) and Marshall et al. (1996) pointed out cases in which Prazdny's algorithm fails to detect multiple disparities, and proposed models for transparent stereopsis. Both models can explain some psychological phenomena concerning transparent stereopsis, but they still require opacity of features. In addition, because they implicitly assume that features are sparse, the models described above ignore interference among features attached to different surfaces. To detect disparities in purely transparent scenes, Shizawa (1993) proposed an area-based stereo algorithm that can reconstruct two overlapping surfaces. His model will be discussed in detail in a later section.

The main purpose of the present research is to establish a biological model that can deal with multiple surfaces. The problem to be considered is how to represent multiple disparities with a population of binocular cells. Several previous studies have proposed population-coding models for the neural representation and detection of multiple surfaces. Lehky and Sejnowski (1990) proposed a stereo model with population coding. Their model employs a template matching procedure: an arbitrary population response is assigned a disparity by finding which canonical population response matches it best. They used the root-mean-square (RMS) error as the matching measure. If the RMS error has two equal minima, the response is interpreted as representing two disparities simultaneously. Zemel and Pillow (2000) proposed a simple model for transparent motion. Their model is based on the assumption that the population response to multiple surfaces is equal to the scaled sum of the population responses to the individual surfaces. This assumption is consistent with physiological studies on motion-sensitive neurons (van Wezel et al., 1996; Hida et al., 1998; Treue et al., 2000). They showed that each surface can be decoded by a statistical decoding method (Zemel, Dayan, & Pouget, 1998). These models implicitly assume that a population response to multiple surfaces can be uniquely divided into the population responses to each single surface. However, it is not known whether the population of binocular neurons has such a characteristic.
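The two ideas reviewed above, RMS template matching and the scaled-sum assumption, can be illustrated together with a toy population. The sketch below is not the implementation of either cited model: the Gaussian tuning curves, the unit spacing, and the helper names `canonical`, `rms_errors`, and `local_minima` are assumptions chosen only to make the example small.

```python
import numpy as np

preferred = np.arange(-12.0, 12.5, 0.5)   # preferred disparities of the units (pixels)
candidates = np.arange(-6.0, 6.5, 0.5)    # disparities whose templates are tried

def canonical(d, sigma=1.0):
    """Assumed canonical population response to a single disparity d."""
    return np.exp(-(preferred - d) ** 2 / (2.0 * sigma ** 2))

def rms_errors(observed):
    """RMS error between the observed population and each canonical template."""
    return np.array(
        [np.sqrt(np.mean((observed - canonical(d)) ** 2)) for d in candidates]
    )

def local_minima(err):
    """Candidate disparities at strict interior local minima of the error curve."""
    inner = (err[1:-1] < err[:-2]) & (err[1:-1] < err[2:])
    return candidates[1:-1][inner]

# Scaled-sum assumption: the response to two surfaces is modelled as a
# scaled sum of the responses to each surface alone.
observed = 0.5 * (canonical(3.0) + canonical(-3.0))
err = rms_errors(observed)
minima = local_minima(err)
```

Under these assumptions the error curve has two equally deep minima, one per surface, which is the situation the template-matching scheme reads as two simultaneous disparities. The example works only because the observed response is, by construction, separable into single-surface responses.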

In our model, we employ the binocular energy model because of its biological plausibility (Ohzawa, DeAngelis, & Freeman, 1990). First, we analyze the response of the energy model to two overlapping disparities and show how they are encoded. The analysis indicates that the response of a population of binocular energy models to multiple surfaces cannot be divided into the population responses to each single surface. This ambiguity occurs because (1) the amplitude spectrum of each surface is unknown and (2) there is interference among surfaces. Next, based on the analysis, we propose a computational model that decodes two overlapping disparities. To decode overlapping disparities uniquely, the proposed model employs (1) an assumption concerning the amplitude spectra and (2) a population pooling method. The main outcomes of this study are that it reveals the effect of interference between overlapping surfaces and proposes a stereo model that takes this effect into account. This effect is ignored by previous models for transparent stereopsis. Computer simulations show that the present model can detect two overlapping disparities. Because feature opacity is not assumed, the model can handle purely transparent scenes. Consequently, two disparities can be assigned to the same position simultaneously.

Section snippets

Definition of the binocular energy model

The energy model consists of two types of cells corresponding to simple and complex cells (Adelson & Bergen, 1985). This model is used as a theoretical model of a binocular cell and/or a motion sensitive cell because of its biological plausibility (Ohzawa et al., 1990).
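As a rough illustration of the simple-cell and complex-cell stages, the following one-dimensional sketch uses a common phase-shift formulation of the binocular energy model with Gabor receptive fields. It is a sketch under stated assumptions rather than the equations of this paper: the function name `energy_response`, the Gabor form, the parameter values, and the convention that the right image equals the left image evaluated at x + d are all choices made for the example.

```python
import numpy as np

def energy_response(left, right, x, omega0, sigma, dpsi):
    """Complex-cell (energy) response of a 1D phase-shift binocular unit.

    left, right : image intensities sampled at positions x (RF centred at x = 0)
    dpsi        : interocular phase difference(s) of the receptive fields (radians)
    """
    envelope = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    carrier = np.exp(1j * omega0 * x)
    # Real and imaginary parts are the monocular responses of a quadrature pair.
    l_resp = np.sum(envelope * carrier * left)
    r_resp = np.sum(envelope * carrier * right)
    # Summing over the eyes and squaring the quadrature pair gives |L + e^{i dpsi} R|^2.
    return np.abs(l_resp + np.exp(1j * np.asarray(dpsi)) * r_resp) ** 2

def grating(pos, omega0):
    """Test stimulus: a sinusoidal grating at the preferred frequency."""
    return np.cos(omega0 * pos + 0.7)

x = np.arange(-64, 65, dtype=float)
omega0 = 2 * np.pi / 16       # assumed preferred frequency (rad/pixel)
sigma = 8.0                   # assumed envelope width (pixels)
d = 3.0                       # stimulus disparity (pixels)
dpsi = np.linspace(-np.pi, np.pi, 64, endpoint=False)
resp = energy_response(grating(x, omega0), grating(x + d, omega0), x, omega0, sigma, dpsi)
best_dpsi = dpsi[np.argmax(resp)]   # expected near omega0 * d under this convention
```

With this convention the unit whose phase difference is closest to omega0 * d responds most strongly, so the preferred disparity of a unit is roughly dpsi / omega0.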

For simplicity, a one-dimensional receptive field is considered in this paper (Qian, 1994; Fleet et al., 1996; Qian and Zhu, 1997; Qian and Mikaelian, 2000). In the energy model, the receptive field of the simple cell is defined by

Response to multiple disparities

In motion perception, there is some psychophysical and physiological evidence suggesting that the responses of motion-selective neurons to transparent motion can be approximated by a scaled sum of the responses to the individual motions (Verstraten et al., 1994; van Wezel et al., 1996; Hida et al., 1998; Treue et al., 2000). In binocular stereopsis, there is less evidence about the response to multiple surfaces than in motion perception. Mallot, Roll, and Arndt (1996) studied disparity-evoked

Population pooling

The correspondence problem is one of the important problems in recovering structure from stereopsis. It is thought that the binocular energy model can reduce false matches because of its selectivity to, for example, spatial frequency and orientation (Ohzawa, 1999). However, the binocular energy model has another problem, known as the false peak problem (Fleet et al., 1996): the response of the energy model does not always equal its canonical response given by Eq. (12). In order to solve

Simulation

In this section, the results of computer simulations with one-dimensional signals are shown. Fig. 4 shows an overview of the model. Each column consists of one-dimensional binocular energy models whose preferred disparities Δψ differ from one another. A mean disparity D̄ and a disparity difference ΔD are calculated after pooling.
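To make the notion of a column concrete, the sketch below computes the responses of one column of phase-shift energy units to two superposed random signals with disparities of +3 and −3 pixels, and then checks how many Fourier components over Δψ the column response contains. It does not reproduce the model of Fig. 4 or its pooling stage. The Gabor formulation, the envelope width `sigma`, the circular shifts, and the number of units are assumptions; only the preferred frequency is taken from the simulation settings.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(-256, 257, dtype=float)
omega0 = 2 * np.pi * 7.38e-3      # preferred frequency quoted for the simulations
sigma = 40.0                      # assumed envelope width (pixels), not from the paper
dpsi = np.linspace(-np.pi, np.pi, 64, endpoint=False)   # one unit per phase difference

# Two overlapping 1D random surfaces with disparities +3 and -3 pixels
# (circular shifts are used for simplicity).
s1, s2 = rng.standard_normal((2, x.size))
left = s1 + s2
right = np.roll(s1, -3) + np.roll(s2, 3)

# Monocular quadrature responses, then the energy of every unit in the column.
gabor = np.exp(-x ** 2 / (2 * sigma ** 2)) * np.exp(1j * omega0 * x)
l_resp, r_resp = np.sum(gabor * left), np.sum(gabor * right)
column = np.abs(l_resp + np.exp(1j * dpsi) * r_resp) ** 2

# Decompose the column response into Fourier components over dpsi.
spectrum = np.fft.rfft(column) / dpsi.size
offset = spectrum[0].real              # mean level of the column response
amplitude = 2 * np.abs(spectrum[1])    # amplitude of the single sinusoid
higher = np.max(np.abs(spectrum[2:]))  # everything beyond the first harmonic
```

In this formulation `higher` is zero up to rounding error, so the whole column is described by an offset, an amplitude, and a phase regardless of how many surfaces are present. This agrees with the sinusoidal population response noted in the Discussion and shows why extra constraints are needed to recover two disparities.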

In the simulations, the preferred spatial frequency was set at ω0/(2π) = 7.38×10⁻³ cycle/pixel. The variance of the Gaussian in Eq. (2) was defined as σ² = 8.8/ω0

Discussion

The present paper has proposed a biological model that encodes and decodes two overlapping disparities with the binocular energy model. The population response of the energy models to multiple disparities forms a sinusoidal function. Therefore, it cannot be uniquely divided into the responses to each single disparity. This ambiguity occurs because (1) the amplitude spectrum of each surface is unknown and (2) there is interference among surfaces. In order to decode two overlapping

Acknowledgements

The authors would like to thank Dr Masayuki Kikuchi for helpful comments and discussion.

References (43)

  • R.J.A. van Wezel et al. (1996). Responses of complex cells in area 17 of the cat to bi-vectorial transparent motion. Vision Research.
  • R.S. Zemel et al. (2000). Encoding multiple orientations in a recurrent network. Neurocomputing.
  • E.H. Adelson et al. (1985). Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America A.
  • R.A. Akerstrom et al. (1988). The perception of stereoscopic transparency. Perception and Psychophysics.
  • M. Carandini et al. (1994). Summation and division by neurons in primate visual cortex. Science.
  • D.J. Fleet et al. (1993). Stability of phase information. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • W.E.L. Grimson (1993). Why stereo vision is not always about 3D reconstruction.
  • Hida, E., Saito, H., Ohno, H., Odajima, K., Tanamori, D (1998). Neuronal correlate for the perception of...
  • Y. Hirai et al. (1978). An inference upon the neural network finding binocular correspondence. Biological Cybernetics.
  • M. Idesawa (1991). Perception of 3-D transparent illusory surface with binocular fusion. Japanese Journal of Applied Physics.
  • M. Kikuchi et al. (2001). Perception of multiple-depth at a single retinal position. Perception.