Pattern Recognition

Volume 38, Issue 7, July 2005, Pages 985-996

Integration of multiresolution image segmentation and neural networks for object depth recovery

https://doi.org/10.1016/j.patcog.2005.01.005

Abstract

A novel technique for three-dimensional depth recovery based on two coaxial defocused images of an object with added pattern illumination is presented. The approach integrates object segmentation with depth estimation. First, segmentation is performed by a multiresolution-based approach to isolate object regions from the background, given the presence of blur and pattern illumination. The segmentation has three sub-procedures: image pyramid formation, linkage adaptation, and unsupervised clustering. These maximise the object recognition capability while ensuring accurate position information. For depth estimation, lower-resolution information with a strong correlation to depth is fed into a three-layered neural network as input feature vectors and processed using a back-propagation algorithm. The resulting object depth model is then used with higher-resolution data to obtain high-accuracy depth measurements. Experimental results are presented that show low error rates and the robustness of the model with respect to pattern variation and inaccuracy in the optical settings.

Introduction

Depth measurement is one of the most important tasks in many computer vision applications, including three-dimensional object recognition, scene interpretation, part inspection and part manipulation. Restoration of three-dimensional (3-D) positional information can be achieved using various techniques, among which depth from defocus (DFD) methods have the advantage that they require only two co-axial images obtained with different optical settings. DFD avoids the missing-part and correspondence problems that occur with stereo. Despite these merits, DFD shares one inherent weakness with stereo and motion techniques in that it requires the scene to contain natural or projected textures. In the work presented here we used projected texture (active illumination).

Several researchers have developed accurate, dense depth estimation from defocused images over the past decade. The DFD method, originally developed by Pentland [1], uses the relative defocus in two images taken with different camera settings to determine scene structure. Many other techniques have followed, and these fall into the two main categories of Fourier-domain and spatial-domain modelling. Subbarao has presented depth models for both domains [2]. Nayar et al. [3] give a precise blur analysis in the frequency domain using focus operators as models. They considered both actively and passively illuminated scenes [4]. Furthermore, they proposed telecentric optics [5] to achieve magnification invariance under changes in the focus setting. Their technique employed a small bank of broadband rational filters [6] able to handle arbitrary textures; the method is computationally efficient and produces accurate results even for weak textures. Ghita and Whelan [7] reported a practical DFD implementation based on simple filters and a striped illumination pattern. They later used this algorithm in a bin-picking application [8].

Recently, artificial neural network (ANN) techniques, a form of empirical modelling in the spatial domain, have been applied to the DFD problem. ANNs are robust and adaptive enough to approximate any non-linear function, so the stringent requirements on optical settings are relaxed compared with the earlier techniques. Tsai [9] proposed an algorithm to estimate the amount of blur from a single camera, in which the blur is calculated using a moment-preserving technique; the ANN is used only to compensate for certain depth errors. Pham and Aslantas [10] presented a technique employing a multi-layer perceptron (MLP) network to compute distances from derivative images of blurred edges. The theory of the MLP is described by Pinkus [11]. In addition, Jong and Huang [12] explored the Radial Basis Function (RBF) neural network for detecting the blur scale of the point-spread function (PSF). At present, there are few ANN-based approaches to DFD-oriented object recovery in the literature. The main challenges for ANN-based depth models are to build robust, accurate depth estimates with reasonably small networks, and to manage the trade-off between the amount of pre-processed input data required and the efficiency achieved by the training procedure.

In this paper, a novel ANN-based approach for depth measurement is reported that simplifies the model architecture and improves model performance. We have integrated image segmentation with neural network learning to solve depth recovery as a two-stage procedure, in which two-dimensional (2-D) object segmentation is followed by 3-D depth model formation. The first stage can be viewed as data pre-processing before the depth modelling stage. A multiresolution scheme, used for edge detection in [13], [14], was applied at the first stage with the objectives of reducing the data needed to form the depth model in the later stage and of providing a reliable segmentation for the pattern-based image. First, the data from one defocused image are processed to form a multiresolution pyramid, in which successive levels have progressively lower image resolution but preserve the essential depth information in the similarity measures between parent–child nodes at neighbouring levels. Only one image is required in this stage, as a telecentric lens was employed to eliminate any magnification of objects between images. Then, after linkage adaptation between levels, an unsupervised fuzzy clustering is applied at a working level, defined in Section 3.4, to produce isolated object regions. In the depth estimation stage, a depth model in a three-layered neural network, whose architecture is determined by the depth feature extraction, is generated using a back-propagation algorithm with training data derived from the previous stage at a low resolution level, together with camera calibration data.

The basic framework of our approach is shown in Fig. 1. First, the two defocused grey-level images with the projected illumination pattern are segmented to decompose the scene into distinct meaningful regions. The resulting data are derived from the object regions at a low resolution in order to ease the burden of network learning and to reduce the uncertainty in object detection. To estimate the 3-D information, reliable feature vectors from the first stage, which are input to the nodes of the neural network, are selected to provide data that are strongly related to depth. Finally, the ANN model is generated to perform the object depth recovery.
After the ANN model has been built, it can be used to calculate the depth of objects in unseen or partly seen images, that is, images that were not part of the training set. This work is the first reported in the literature to use neural networks to calculate depth from a pair of defocused images. It is also novel in using an object detection stage followed by the depth recovery stage. Experimental results with different illumination patterns demonstrate the effectiveness of the approach.
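
To make the two-stage procedure concrete, the following Python sketch illustrates the overall flow under our own simplifying assumptions: a 2×2 block-averaging pyramid stands in for the paper's pyramid formation, a plain intensity threshold stands in for linkage adaptation and fuzzy clustering, and scikit-learn's MLPRegressor stands in for the back-propagation network. The helper names and placeholder data are hypothetical, not the authors' implementation.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def build_pyramid(img, levels=3):
        """Form a multiresolution pyramid by repeated 2x2 block averaging."""
        pyramid = [img.astype(float)]
        for _ in range(levels - 1):
            p = pyramid[-1]
            h, w = (p.shape[0] // 2) * 2, (p.shape[1] // 2) * 2
            p = p[:h, :w]
            pyramid.append(0.25 * (p[0::2, 0::2] + p[1::2, 0::2]
                                   + p[0::2, 1::2] + p[1::2, 1::2]))
        return pyramid

    # Stage 1: segment at the coarse level (a threshold stands in for the
    # paper's linkage adaptation and unsupervised fuzzy clustering).
    near_img = np.random.rand(512, 512)   # defocused image 1 (placeholder)
    far_img = np.random.rand(512, 512)    # defocused image 2 (placeholder)
    coarse_near = build_pyramid(near_img)[-1]
    coarse_far = build_pyramid(far_img)[-1]
    object_mask = coarse_near > coarse_near.mean()

    # Stage 2: train an MLP mapping coarse defocus features to depth.
    X = np.column_stack([coarse_near[object_mask], coarse_far[object_mask]])
    y = np.random.rand(X.shape[0])        # calibrated depths (placeholder)
    mlp = MLPRegressor(hidden_layer_sizes=(10,), max_iter=500)
    mlp.fit(X, y)
    depth_estimates = mlp.predict(X)

In the paper itself, the coarse-level feature vectors and their correlation with depth are chosen carefully (Section 4); the stand-ins above only indicate where each stage sits in the pipeline.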

This paper is organized as follows: Section 2 overviews the theory of DFD and the derivation of the depth formulae. Section 3 gives a detailed illustration of the multiresolution techniques for 2-D object detection using the image segmentation and boundary forming algorithms on blurred images. Section 4 presents an MLP solution to object recovery using the pre-processed images and depth-related features. Section 5 presents the experimental results and discusses the depth accuracy and the effect of varying the illumination pattern. The paper is concluded in Section 6.

Section snippets

Depth from defocus

To illustrate the concept of recovering the depth from defocus, the basic image formation geometry is shown in Fig. 2. When an image is in focus, all the photons that are radiated by a point object O pass through the aperture A and are refracted by the lens to converge at the point Q in the focused image plane $I_f$. The focused plane position v depends on the depth u of the object and the focal length f of the lens. According to the lens law, $\frac{1}{u} + \frac{1}{v} = \frac{1}{f}$. However, when the point object O is not in
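
As a numerical illustration of this relation, the short Python sketch below solves the lens law for the focused plane position v, and computes the diameter of the resulting blur circle when the sensor plane is displaced from it. This is the standard DFD geometry rather than the paper's exact calibration; apart from the 25 mm focal length and 7.9 mm aperture used in the experiments, the values are arbitrary examples.

    def focused_plane_distance(u, f):
        """Solve 1/u + 1/v = 1/f for v (all lengths in mm)."""
        return 1.0 / (1.0 / f - 1.0 / u)

    def blur_circle_diameter(u, f, s, aperture_d):
        """Blur circle diameter when the sensor sits at distance s behind
        the lens instead of at the focused plane distance v."""
        v = focused_plane_distance(u, f)
        return aperture_d * abs(s - v) / v

    f = 25.0     # focal length (mm), as in the experiments
    D = 7.9      # aperture diameter (mm), as in the experiments
    u = 500.0    # object depth (mm), arbitrary example
    v = focused_plane_distance(u, f)            # about 26.32 mm
    d = blur_circle_diameter(u, f, v + 0.5, D)  # sensor displaced by 0.5 mm
    print(v, d)                                 # blur grows with |s - v|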

The image segmentation algorithm

The first stage of depth estimation is to isolate each object within a scene. Here we need to accurately segment a scene onto which an illumination pattern has been projected to ensure high spatial resolution in the eventual computed depth map. A crucial problem for segmentation is to manage two sources of uncertainty. These are the uncertainty in estimating the feature property in each small object region, and the spatial uncertainty of where the region's boundary lies. Moreover, these two
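
The final sub-procedure of this stage is an unsupervised fuzzy clustering. As a rough sketch of the idea, the following Python implements standard fuzzy c-means on one-dimensional coarse-level intensities; the paper's specific clustering variant and its handling of the two uncertainty sources may differ, and the synthetic data below are illustrative only.

    import numpy as np

    def fuzzy_c_means(x, c=2, m=2.0, iters=50, eps=1e-9):
        """Standard FCM on a 1-D feature vector x.
        Returns the (c, n) membership matrix and the c cluster centres."""
        rng = np.random.default_rng(0)
        u = rng.random((c, x.size))
        u /= u.sum(axis=0)                       # memberships sum to 1
        for _ in range(iters):
            um = u ** m
            centres = um @ x / um.sum(axis=1)    # membership-weighted means
            dist = np.abs(x[None, :] - centres[:, None]) + eps
            inv = dist ** (-2.0 / (m - 1.0))
            u = inv / inv.sum(axis=0)            # membership update
        return u, centres

    # Example: separate bright (patterned object) pixels from a darker
    # background at the working level.
    pixels = np.concatenate([np.random.normal(50, 5, 200),
                             np.random.normal(180, 10, 200)])
    u, centres = fuzzy_c_means(pixels)
    object_mask = u.argmax(axis=0) == centres.argmax()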

Depth estimation using neural networks

The properties and training algorithms of MLP networks are well documented, but less information is available on how to configure the network. In general, a feed-forward neural network with one hidden layer, composed of neurons with sigmoidal activation functions, and a linear output layer can approximate any continuous function to the desired accuracy. In our case, the use of the three-layer MLP was predetermined, as its simplicity and good performance were required. The number of hidden neurons was
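
For readers unfamiliar with this configuration, the following from-scratch Python sketch shows a network of the kind described: one sigmoid hidden layer and a linear output layer, trained by back-propagation on squared error. The layer sizes, learning rate, and toy data are our own illustrative choices, not the paper's.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hidden, n_out, lr = 4, 10, 1, 0.01

    W1 = rng.standard_normal((n_in, n_hidden)) * 0.1
    b1 = np.zeros(n_hidden)
    W2 = rng.standard_normal((n_hidden, n_out)) * 0.1
    b2 = np.zeros(n_out)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Toy training set: feature vectors -> calibrated depths (placeholders).
    X = rng.random((100, n_in))
    y = X.sum(axis=1, keepdims=True)    # stands in for measured depth

    for epoch in range(2000):
        h = sigmoid(X @ W1 + b1)        # hidden activations
        out = h @ W2 + b2               # linear output layer
        err = out - y                   # dE/d(out) for squared error
        # Back-propagate the error through both layers.
        dW2 = h.T @ err / len(X)
        db2 = err.mean(axis=0)
        dh = err @ W2.T * h * (1 - h)   # through the sigmoid derivative
        dW1 = X.T @ dh / len(X)
        db1 = dh.mean(axis=0)
        W2 -= lr * dW2; b2 -= lr * db2
        W1 -= lr * dW1; b1 -= lr * db1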

Experiments

We have implemented the proposed techniques for depth recovery. The scene was imaged using a TAMRON 25 mm lens converted to be telecentric by an additional aperture. The aperture diameter was normally set to 7.9 mm unless otherwise stated, which gave an f-number of 3.17. A Pulnix monochrome camera (model TM-745E) and a frame grabber were set up to capture images of size 512×512 with 256 grey levels. Different illumination patterns, based on checkerboards and stripes, were simulated by attaching printed
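
As a simple consistency check on the stated optics, using the nominal thin-lens definition of the f-number (which ignores the telecentric conversion): $N = f/D = 25\,\mathrm{mm} / 7.9\,\mathrm{mm} \approx 3.16$, agreeing with the quoted value to within rounding.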

Conclusions

The proposed technique for object recovery consisted of two main components, image segmentation and 3-D depth estimation. Pairs of defocused images captured with different optical settings were processed to produce dense depth maps of various objects. The implementation was based on active DFD, although it was achieved by attaching an illumination pattern to the objects. The image segmentation was able to decompose the image into disjoint meaningful regions that had a strong correlation with

Summary

In this paper, a novel neural-network based approach to depth measurement, based on two defocused images with added pattern illumination, is reported. It simplifies the model architectures and improves model performance. We have integrated image segmentation with neural network learning. The solution to depth recovery is therefore a two-stage procedure: firstly objects are detected in 2-D and then 3-D depth is estimated. The object detection is performed by a multiresolution image segmentation

Acknowledgements

The authors acknowledge the China Scholarship Council for providing financial support, and the University of Warwick, UK, for the research facilities provided. Thanks are also given to Mr. C. Claxton for his help with the experiments.



Cited by (9)

  • Hierarchical Object Relationship Constrained Monocular Depth Estimation

    2021, Pattern Recognition
    Citation Excerpt:

    However, the multi-task assignments, such as Xu et al. [17] and Liu et al. [18], not only require hundreds of thousands of ground-truth images, but also face huge challenges in defining loss functions to jointly train the entire network. For the methods respectively proposed by Ji et al. [9] and Ma et al. [19], they all resorted to combining the CNN network with CRF to extract pixel-level features. However, the frameworks with CRF are cumbersome and difficult to make tradeoff between efficiency and accuracy.

  • Rational filter design for depth from defocus

    2012, Pattern Recognition
    Citation Excerpt:

    Video-rate processing is a requirement for 3D TV, and fast processing extends the use of DfD for robotics and production line applications. Efficient DfD computation methods have been proposed [4,15,16]; however, in this paper, since we are concerned with video-rate depth estimation for every pixel in the image, and passive illumination, we have chosen an approach based on rational filters [4] as detailed more fully below. The optical arrangement is as shown in Fig. 1, where a point on an object Q would be in-focus at point q in an image plane $i_f$.

  • A modified fuzzy C-means image segmentation algorithm for use with uneven illumination patterns

    2007, Pattern Recognition
    Citation Excerpt:

    These bias fields are slowly changing and multiplicatively imposed onto the captured images and the naturally occurring textures within them. A multiresolution analysis has been proposed to expedite the task of detecting the object in a defocused illumination image by building a pyramid, as described in Ref. [12]. This multiresolution approach is functional but with a relatively higher computational cost than the proposed method.


About the Author—LI MA received the B.S. and Ph.D. degrees in Electrical Engineering from Central South University, PR China, in 1976 and 1998 respectively. She is currently a professor at the department of computer science, Zhengzhou Institute of Light Industry, PR China. She has been an academic visitor to the College of Cardiff University of Wales, UK during 1993–1994, and she is currently a senior visiting fellow at the University of Warwick. Her research interests include signal and image processing, pattern recognition, neural networks.

About the Author—RICHARD C. STAUNTON received the B.Sc. (honours) degree in electronic engineering from the City University, UK, in 1973, and the Ph.D. degree in engineering from the University of Warwick, UK, in 1992. From 1973 to 1977 he worked for the aerospace industry, and from 1977 to 1986 for the UK National Health Service, where he engaged in research and development of medical image processing systems. Since 1986 he has been a lecturer at the University of Warwick. His current research interests include industrial image processing, hexagonal sampling systems, colour image processing, manufactured surface analysis, and depth from defocus.
