Elsevier

Pattern Recognition

Volume 31, Issue 5, 1 March 1998, Pages 561-574

Relaxation by Hopfield network in stereo image matching

https://doi.org/10.1016/S0031-3203(97)00069-1

Abstract

This paper outlines a relaxation approach using the Hopfield neural network for solving the global stereovision matching problem. The primitives used are edge segments. The similarity, smoothness and uniqueness constraints are transformed into an energy function whose minimum corresponds to the best solution of the problem. We combine two methods, (a) optimization/relaxation [1] and (b) relaxation merit [2], with the above three constraints mapped into an energy function. The main contribution is made (1) by applying a learning strategy in the similarity constraint and (2) by introducing specific conditions to overcome violations of the smoothness constraint and to avoid the serious problem arising from the required fixation of a disparity limit. These contributions improve the stereovision matching process. The better performance of the proposed method is illustrated with a comparative analysis against a classical relaxation method.

Introduction

A major portion of the research efforts of the computer vision community has been directed towards the study of the three-dimensional (3-D) structure of objects using machine analysis of images [3, 4, 5]. Analysis of video images in stereo has emerged as an important passive method for extracting the 3-D structure of a scene.

Following the Barnard and Fischler [6] terminology, we can view the problem of stereo analysis as consisting of the following steps: image acquisition, camera modeling, feature acquisition, image matching, depth determination and interpolation. The key step is that of image matching, that is, the process of identifying the corresponding points in two images that are cast by the same physical point in 3-D space. This paper is devoted solely to this problem.

The basic principle involved in the recovery of depth using passive imaging is triangulation. In stereopsis the triangulation needs to be achieved with the help of only the existing environmental illumination. Hence, a correspondence needs to be established between features from two images that correspond to some physical feature in space. Then, provided the positions of the centers of projection, the effective focal length, the orientation of the optical axis, and the sampling interval of each camera are known, the depth can be established using triangulation [7].
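As a hedged illustration of the triangulation step, consider the simplest configuration: two cameras with parallel optical axes and equal focal lengths. The symbols below are notation assumed for this sketch and are not taken from the paper: $f$ is the focal length, $B$ the baseline between the two centers of projection, $(X, Z)$ the lateral position and depth of the physical point in the left camera frame, $x_L$ and $x_R$ the horizontal image coordinates of its two projections, and $d$ the disparity.

```latex
x_L = \frac{f\,X}{Z}, \qquad x_R = \frac{f\,(X - B)}{Z}
\quad\Longrightarrow\quad
d = x_L - x_R = \frac{f\,B}{Z}
\quad\Longrightarrow\quad
Z = \frac{f\,B}{d}
```

Under these assumptions depth is inversely proportional to disparity, so a wrong correspondence translates directly into a wrong depth estimate.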

A review of the state of the art of stereovision matching allows us to distinguish two sorts of techniques broadly used in this discipline: area-based and feature-based [2, 4, 8]. Area-based stereo techniques use correlation between brightness (intensity) patterns in the local neighborhood of a pixel in one image with brightness patterns in the local neighborhood of the other image [9, 10, 11, 12, 13], where the number of possible matches becomes high, while feature-based methods use sets of pixels with similar attributes, normally either pixels belonging to edges [14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25] or the corresponding edges themselves [2, 5, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36]. These latter methods lead to a sparse depth map only, leaving the rest of the surface to be reconstructed by interpolation; but they are faster than area-based methods, because there are many fewer points (features) to be considered [4]. We select a feature-based method, as justified below.

Due to the nature of the stereovision system, both extrinsic and intrinsic factors affect it: (a) extrinsic: in a practical stereovision system, the left and right images are obtained from different positions/angles; (b) intrinsic: the stereovision system is equipped with two different physical cameras (i.e. with different components), which are always placed at the same relative position (left and right), so a systematic noise appears for each one.

As a result of the above-mentioned factors, the corresponding features in both images may display different values. This may lead to incorrect matches. Thus, it is very important to find features in both images which are unique or independent of possible variation in the images [37]. This paper uses a feature-based method where edge segments are to be matched, because our experiments have been carried out in an artificial environment where edge segments are abundant. Such features have been studied in terms of reliability [8, 38] and robustness [37] and, as mentioned before, have also been used in previous stereovision matching works. This fact justifies our choice of features, although they may be too local. Four average attribute values (gradient magnitude, gradient direction, variance and Laplacian) are computed for each edge segment, as we will see later.

Our stereo correspondence problem can be defined in terms of finding pairs of true matches, namely, pairs of edge segments in two images that are generated by the same physical edge segment in space. These true matches satisfy some competing constraints, generally three, proposed by Marr and Poggio [21, 22]: (1) uniqueness: each edge segment in an image should be matched to a unique edge segment in the other image; (2) similarity: matched edge segments have similar local properties or attributes; (3) smoothness: disparity values in a given neighborhood change smoothly, except at a few depth discontinuities.

We now introduce the very important concept of disparity. Assume a stereo pair of edge segments, where one segment belongs to the left image and the other to the right one. If we superimpose the right image on the left one, for example, the two edge segments of the stereo pair appear horizontally displaced. Following horizontal (epipolar) lines from top to bottom in the superimposed images, we determine the common length of the two edge segments from the parallelogram formed by the set of epipolar lines that contain both a point of the left edge segment and a point of the right one. We compute the disparity value for the pair of edge segments as the average displacement between points belonging to both edge segments along the common length.
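The disparity definition above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each edge segment is taken to be a straight line given by its two endpoints, the images are assumed rectified so that epipolar lines are image rows, and the sign convention (left minus right) as well as the names `Segment`, `x_at_row` and `segment_disparity` are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical representation: a straight edge segment as (x1, y1, x2, y2).
Segment = Tuple[float, float, float, float]


def x_at_row(seg: Segment, y: float) -> float:
    """Horizontal position of the segment on the epipolar line (row) y."""
    x1, y1, x2, y2 = seg
    if y2 == y1:
        # Horizontal segment: no unique x on its row, use the midpoint.
        return 0.5 * (x1 + x2)
    t = (y - y1) / (y2 - y1)
    return x1 + t * (x2 - x1)


def segment_disparity(left: Segment, right: Segment,
                      samples: int = 50) -> Optional[float]:
    """Average horizontal displacement along the rows shared by both segments.

    Returns None when the two segments share no epipolar line.
    """
    y_lo = max(min(left[1], left[3]), min(right[1], right[3]))
    y_hi = min(max(left[1], left[3]), max(right[1], right[3]))
    if y_lo > y_hi:
        return None  # no common length
    samples = max(samples, 2)
    rows = [y_lo + (y_hi - y_lo) * k / (samples - 1) for k in range(samples)]
    shifts = [x_at_row(left, y) - x_at_row(right, y) for y in rows]
    return sum(shifts) / len(shifts)


# Usage: the segments overlap on rows 20..50; the displacement grows
# linearly from 12.5 to 14.0 there, so the average is about 13.25.
d = segment_disparity((100, 10, 110, 50), (90, 20, 98, 60))
```

Sampling rows uniformly over the common length keeps the sketch independent of how the segments were extracted; a pixel-level implementation would iterate over the actual edge pixels instead.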

Following the above-introduced constraints, we can say that the similarity and smoothness constraints are associated with local and global matching processes, respectively. The major difficulty of stereo processing arises from the need to make global correspondences. A local edge segment in one image may match equally well with a number of edge segments in the other image (this problem is compounded by the fact that the local matches are not perfect, due to the above-mentioned extrinsic and intrinsic factors). These ambiguities in local matches can only be resolved by considering sets of local matches globally. Hence, to make global correspondences for a given pair of edge segments, we consider a set of neighboring edge segments, where a bound on the disparity range defines the neighborhood. Relaxation is a technique commonly used to find the best matches globally. It refers to any computational mechanism that employs a set of locally interacting parallel processes, one associated with each image unit, that iteratively update each unit's current labeling in order to achieve a globally consistent interpretation of the image data [39, 40].

Next, we review relaxation approaches in stereovision matching. Relaxation labeling is a technique, proposed by Rosenfeld et al. [4, 5, 11], developed to deal with uncertainty in sensory data interpretation systems and to find the best matches. It uses contextual information as an aid in classifying a set of interdependent objects by allowing interactions among the possible classifications of related objects. In the stereo paradigm the problem involves assigning unique labels (or matches) to a set of features in an image from a given list of possible matches. So, the goal is to assign each feature (edge segment) a value corresponding to disparity in a manner consistent with certain predefined constraints.

Two main relaxation processes can be distinguished in stereo correspondence: optimization-based and probabilistic/merit-based. In the optimization-based processes, stereo correspondence is carried out by minimizing an energy function which is formulated from the applicable constraints. It represents a mechanism for the propagation of constraints among neighboring match features for the removal of ambiguity of multiple stereo matches in an iterative manner. The optimal solution is the ground state, that is, the state (or states) of lowest energy. In the probabilistic/merit-based processes, the initial probabilities/merits, established from a local stereo correspondence process and computed from similarity in the feature values, are updated iteratively depending on the matching probabilities/merits of neighboring features and also on the applicable constraints. The following papers use a relaxation technique: (a) probabilistic/merit [5, 13, 16, 17, 20, 21, 24, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56]; (b) optimization through a Hopfield neural network [11, 13, 23, 28, 57]. We use (b).
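For orientation, the energy minimized by a Hopfield network with symmetric connections is usually written in the standard form below. The symbols are generic notation assumed here ($v_i$ for the state of neuron $i$, $T_{ij}$ for the connection weight between neurons $i$ and $j$, $I_i$ for the external input to neuron $i$); the specific energy function of this paper, built from the three constraints, is not reproduced in this excerpt.

```latex
E = -\frac{1}{2} \sum_{i} \sum_{j} T_{ij}\, v_i\, v_j \;-\; \sum_{i} I_i\, v_i
```

In an optimization-based matching scheme of the kind described above, one would expect each neuron to stand for a candidate match, with the constraints encoded in the weights and inputs so that low-energy states correspond to consistent sets of matches.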

These systems impose global constraints, i.e. similarity, uniqueness and smoothness. The smoothness constraint requires, as stated, fixing a bound on the disparity range for any feature. As a result, they give good performance on the images shown. In our experiments, however, we have found complex images with highly repetitive structures (features), or with objects very close to the cameras and occlusions, that violate the smoothness constraint. Hence, setting a disparity limit is a difficult task, because the global correspondence for a given pair of features is based on how well their neighboring pairs of features match, and the neighborhood is strongly related to the disparity limit. Therefore, global correspondence for complex images is the difficult issue with which this paper is mainly concerned. We propose an approach for solving the edge-segment stereovision matching problem that makes use of existing global matching strategies: the similarity, uniqueness and continuity (smoothness) constraints are applied through the combination of two methods, following a very important conclusion of Pavlidis [58]: (1) the Yu and Tsai (YT) [1] optimization relaxation process, implemented by a Hopfield neural network; and (2) the Minimum Differential Disparity (MDD) algorithm of Medioni and Nevatia [2]. The main contribution of this paper is made by introducing specific conditions during the relaxation process (1) to overcome violations of the smoothness constraint and (2) to avoid the problem arising from the fixation of the disparity limit. We have also applied a learning strategy in the similarity constraint. As a result, our method works properly even if the images are complex.

We have chosen the Hopfield neural network because the convergence of the relaxation process is guaranteed and because it has been used with success in previous stereovision matching works [11, 13, 23, 25, 28, 57]. Additionally, this computational model has a massively parallel execution capability. Because we are concerned only with the effectiveness of the method (measured as the percentage of successful matches), we have not exploited this capability in our work.

The paper is organized as follows: in Section 2 the stereovision optimization relaxation matching scheme and its relationship to Hopfield neural networks are described. In Section 3 we summarize the proposed matching procedure. The performance of the method is illustrated in Section 4, where a comparative study against an existing global relaxation matching method is carried out. The need for a relaxation matching strategy, as opposed to a local matching technique, is also made clear. Finally, Section 5 discusses some related topics.

Section snippets

Stereovision relaxation matching by the Hopfield neural network

As mentioned before, this paper proposes an optimization relaxation technique which is a synthesis of two existing methods, YT and MDD, where the presence of repetitive structures is taken into account and discontinuities in the disparity values are allowed (violation of the smoothness constraint). This last idea is based upon the work of Dhond and Aggarwal [59], which deals with narrow near objects that produce occlusion problems.

From YT we take the optimization relaxation scheme and the Hopfield

Summary of the matching procedure

After mapping the energy function onto the Hopfield neural network, the correspondence process is achieved by letting the network evolve until it reaches a stable state, i.e. a state in which no neuron changes during the updating procedure.
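The evolution towards a stable state can be sketched with a generic discrete Hopfield network. This is a minimal illustration under stated assumptions, not the paper's network: the neurons are binary, the weights `T` are symmetric with a zero diagonal, updates are asynchronous in a fixed order, and the two-neuron example (two candidate matches competing for the same left segment, with hypothetical weights and inputs) is invented for the demonstration.

```python
from typing import List, Sequence


def energy(T: Sequence[Sequence[float]], I: Sequence[float],
           v: Sequence[int]) -> float:
    """Standard Hopfield energy: -1/2 * sum T_ij v_i v_j - sum I_i v_i."""
    n = len(v)
    pair = sum(T[i][j] * v[i] * v[j] for i in range(n) for j in range(n))
    bias = sum(I[i] * v[i] for i in range(n))
    return -0.5 * pair - bias


def relax(T: Sequence[Sequence[float]], I: Sequence[float],
          v0: Sequence[int], max_sweeps: int = 100) -> List[int]:
    """Update neurons one at a time until a full sweep changes nothing."""
    v = list(v0)
    n = len(v)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            net = sum(T[i][j] * v[j] for j in range(n)) + I[i]
            # Switch on for positive input, off for negative, keep on a tie.
            new_state = 1 if net > 0 else 0 if net < 0 else v[i]
            if new_state != v[i]:
                v[i] = new_state
                changed = True
        if not changed:
            break  # stable state reached
    return v


# Hypothetical example: mutual inhibition (-2.0) stands in for uniqueness,
# the inputs stand in for how similar each candidate match is.
T = [[0.0, -2.0], [-2.0, 0.0]]
I = [1.5, 1.0]
final_state = relax(T, I, [0, 0])
print(final_state)  # [1, 0]
```

With symmetric weights and a zero diagonal, each asynchronous update can only lower the energy or leave it unchanged, which is why the loop terminates in a stable state; here the energy drops from 0 to -1.5.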

The whole matching procedure can be summarized as follows (according to the steps provided in Ruichek and Postaire [25]):

(A) Neural network topology

(1) The network is organized with a set of neurons representing pairs of edge segments (from

Design of a test strategy

In order to assess the validity and performance of the proposed method (i.e. how well our approach works in images with and without additional complexity), we have selected 30 stereo pairs of realistic stereoimages from an indoor environment (each pair consists of two left and right original images and two left and right images of labeled edge segments). All tested images are 512×512 pixels in size, with 256 gray levels. The two cameras have equal focal lengths and are situated in a parallel

Concluding remarks

An approach to stereo correspondence using the Hopfield neural network is presented. The stereo correspondence problem is formulated as an optimization task in which an energy function, representing the mapping of three constraints (similarity, smoothness and uniqueness) on the solution, is minimized by a Hopfield neural network. The similarity constraint is applied twice: (a) during the computation of the local matching probability (see Section 2.2); and (b) during the mapping of the

About the Author: G. PAJARES received B.S. and Ph.D. degrees in Physics from UNED (Distance University of Spain) in 1987 and 1995, respectively, with a thesis on the application of pattern recognition techniques to stereovision. From 1990 he worked at ENOSA in critical software development. He joined the Complutense University in 1995 as an Associate Professor in Robotics. His current research interests include robotics vision systems and applications of automatic control to robotics.

References (71)

  • C.Y Wang et al., Some experiments in relaxation image matching using corner features, Pattern Recognition (1983)
  • T Pavlidis, Why progress in machine vision is so slow, Pattern Recognition Lett. (1992)
  • J.G Leu et al., Detecting the dislocations in metal crystals from microscopic images, Pattern Recognition (1991)
  • R Nevatia et al., Linear feature extraction and description, Computer Vision Graphics Image Processing (1980)
  • A.R Dhond et al., Structure from stereo - a review, IEEE Trans. Systems Man Cybernet. (1989)
  • T Ozanian, Approaches for stereo matching - a review, Modeling Identification Control (1995)
  • G Pajares, Estrategia de solucion al problema de la correspondencia en vision estereoscopica por la jerarquía metodologica y la integracion de criterios, Ph.D. thesis (1995)
  • S Barnard et al., Computational stereo, ACM Comput. Surveys (1982)
  • K.S Fu et al., Robótica: Control, detección, visión e inteligencia (1988)
  • H. H. Baker, Building and using scene representations in image understanding, AGARD-LS-185 Machine Perception 3.1–3.11...
  • P Fua, A parallel algorithm that produces dense depth maps and preserves image features, Machine Vision Appl. (1993)
  • Y Shirai, Three-Dimensional Computer Vision (1983)
  • Y Zhou et al., Artificial Neural Networks for Computer Vision (1992)
  • W.E.L Grimson, Computational experiments with a feature-based stereo algorithm, IEEE Trans. Pattern Anal. Mach. Intell. (1985)
  • A Khotanzad et al., Stereopsis by constraint learning feed-forward neural networks, IEEE Trans. Neural Networks (1993)
  • Y.C Kim et al., Positioning three-dimensional objects using stereo images, IEEE J. Robotics Automat. (1987)
  • V.R Lutsiv et al., On the use of a neurocomputer for stereoimage processing, Pattern Recognition Image Anal. (1992)
  • D. Maravall and E. Fernandez, Contribution to the matching problem in stereovision, Proc. 11th IAPR: Internat. Conf....
  • D Marr, La Vision (1985)
  • D Marr, Vision (1982)
  • D Marr et al., A computational theory of human stereovision, Proc. Roy. Society of London (1979)
  • D Marr et al., Cooperative computation of stereo disparity, Science (1976)
  • M.S Mousavi et al., ANN implementation of stereo vision using a multi-layer feedback architecture, IEEE Trans. Systems Man Cybernet. (1994)
  • P Rubio, RP: un algoritmo eficiente para la búsqueda de correspondencias en visión estereoscópica, Informática y Automática (1993)
  • Y Ruichek et al., A neural network algorithm for 3-D reconstruction from stereo pairs of linear images, Pattern Recognition Lett. (1996)


About the Author: J.M. CRUZ received the M.Sc. degree in Physics and the Ph.D. degree from the Complutense University in 1979 and 1984, respectively. From 1985 to 1990 he was with the Department of Automatic Control, UNED (Distance University of Spain), and from October 1990 to 1992 with the Department of Electronics, University of Santander. In October 1992, he joined the Department of Computer Science and Automatic Control of the Complutense University, where he is a Professor. His current research interests include robotics vision systems, sensor fusion and applications of automatic control to robotics and flight control.

About the Author: J. ARANDA received B.S. and M.S. degrees in Physics from the Complutense University of Madrid (1983) and the Ph.D. degree from the UNED (Distance University of Spain) in 1989. From 1985 to 1987 he was a Teaching Assistant in the Department of Automatic Control and Computer Science at the Complutense University. He joined the UNED in 1987 as an Assistant Professor and has been an Associate Professor there since 1991. His current research interests include robotics vision systems, sensor fusion and applications of automatic control to robotics and flight control.
