Pattern Recognition

Volume 38, Issue 7, July 2005, Pages 985-996

Integration of multiresolution image segmentation and neural networks for object depth recovery

https://doi.org/10.1016/j.patcog.2005.01.005

Abstract

A novel technique for three-dimensional depth recovery based on two coaxial defocused images of an object with added pattern illumination is presented. The approach integrates object segmentation with depth estimation. First, segmentation is performed by a multiresolution-based approach to isolate object regions from the background, given the presence of blur and pattern illumination. The segmentation has three sub-procedures: image pyramid formation, linkage adaptation, and unsupervised clustering. These maximise the object recognition capability while ensuring accurate position information. For depth estimation, lower-resolution information with a strong correlation to depth is fed into a three-layered neural network as input feature vectors and processed using a back-propagation algorithm. The resulting object depth model is then used with higher-resolution data to obtain high-accuracy depth measurements. Experimental results are presented that show low error rates and the robustness of the model with respect to pattern variation and inaccuracy in the optical settings.

Introduction

Depth measurement is one of the most important tasks in many computer vision applications, including three-dimensional object recognition, scene interpretation, part inspection and part manipulation. Restoration of three-dimensional (3-D) positional information can be achieved using various techniques, among which depth from defocus (DFD) methods have the advantage that they require only two co-axial images obtained with different optical settings. DFD avoids the missing-part and correspondence problems that occur with stereo. Despite these merits, DFD shares one inherent weakness with stereo and motion techniques in that it requires the scene to contain natural or projected textures. In the work presented here we used projected texture (active illumination).

Several researchers have developed accurate, dense depth estimation from defocused images over the past decade. The DFD method, originally developed by Pentland [1], uses the relative defocus in two images taken with different camera settings to determine scene structure. Many other techniques have followed, and these fall into the two main categories of Fourier-domain and spatial-domain modelling. Subbarao has presented depth models for both domains [2]. Nayar et al. [3] give a precise blur analysis in the frequency domain using focus operators as models. They considered both actively and passively illuminated scenes [4]. Furthermore, they proposed telecentric optics [5] to achieve magnification invariance under changes in the focus setting. Their technique employed a small bank of broadband rational filters [6] able to handle arbitrary textures; the method is computationally efficient and produces accurate results even for weak textures. Ghita and Whelan [7] reported a practical DFD implementation based on simple filters and a striped illumination pattern. They later used this algorithm in a bin-picking application [8].

Recently, artificial neural network (ANN) techniques, a form of empirical modelling in the spatial domain, have been applied to the DFD problem. ANNs are robust and adaptive enough to approximate any non-linear function, so the stringent requirements on optical settings are relaxed compared with the earlier techniques. Tsai [9] proposed an algorithm to estimate the amount of blur from a single camera, in which the blur is calculated using a moment-preserving technique; the ANN is used only to compensate for certain depth errors. Pham and Aslantas [10] presented a technique employing a multi-layer perceptron (MLP) network to compute distances from derivative images of blurred edges. The theory of the MLP is described by Pinkus [11]. In addition, Jong and Huang [12] explored the Radial Basis Function (RBF) neural network for detecting the blur scale of the point-spread function (PSF). At present, there are few ANN-based approaches to DFD-oriented object recovery in the literature. The main challenges for ANN-based depth models are to build robust, accurate depth estimates with reasonably small networks, and to manage the trade-off between the amount of pre-processed input data required and the efficiency achieved by the training procedure.

In this paper, a novel ANN-based approach for depth measurement is reported that simplifies the model architecture and improves model performance. We have integrated image segmentation with neural network learning to solve depth recovery as a two-stage procedure, in which two-dimensional (2-D) object segmentation is followed by 3-D depth model formation. The first stage can be viewed as data pre-processing before the depth modelling stage. A multiresolution scheme, used for edge detection in [13], [14], was applied at the first stage with the objectives of reducing the data needed to form the depth model in the later stage and of providing a reliable segmentation for the pattern-based image. First, the data from one defocused image are processed to form a multiresolution pyramid, in which successive levels have progressively lower image resolution but preserve the essential depth information in the similarity measures between parent–child nodes at neighbouring levels. Only one image is required in this stage, as a telecentric lens was employed to eliminate any magnification of objects between images. Then, after linkage adaptation between levels, an unsupervised fuzzy clustering is applied at a working level, defined in Section 3.4, to produce isolated object regions. In the depth estimation stage, a depth model in a three-layered neural network, whose architecture is determined by the depth feature extraction, is generated using a back-propagation algorithm with training data derived from the previous stage at a low resolution level, together with camera calibration data.

The basic framework of our approach is shown in Fig. 1. First, the two defocused grey-level images with the projected illumination pattern are segmented to decompose the scene into distinct meaningful regions. The resulting data are derived from the object regions at a low resolution in order to ease the burden of network learning and to reduce the uncertainty in object detection. To estimate the 3-D information, reliable feature vectors from the first stage, which are input to the nodes of the neural network, are selected to provide data that are strongly related to depth. Finally, the ANN model is generated to perform the object depth recovery.
After the ANN model has been built, it can be used to calculate the depth of objects in unseen or partly seen images, that is, images that were not part of the training set. This work is the first reported in the literature to use neural networks to calculate depth from a pair of defocused images. It is also novel in using an object detection stage followed by the depth recovery stage. Experimental results with different illumination patterns demonstrate the effectiveness of the approach.
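
To make the two-stage procedure concrete, the following Python sketch illustrates the overall flow under our own simplifying assumptions: a 2×2 block-averaging pyramid stands in for the paper's pyramid formation, a plain intensity threshold stands in for linkage adaptation and fuzzy clustering, and scikit-learn's MLPRegressor stands in for the back-propagation network. The helper names and placeholder data are hypothetical, not the authors' implementation.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def build_pyramid(img, levels=3):
        """Form a multiresolution pyramid by repeated 2x2 block averaging."""
        pyramid = [img.astype(float)]
        for _ in range(levels - 1):
            p = pyramid[-1]
            h, w = (p.shape[0] // 2) * 2, (p.shape[1] // 2) * 2
            p = p[:h, :w]
            pyramid.append(0.25 * (p[0::2, 0::2] + p[1::2, 0::2]
                                   + p[0::2, 1::2] + p[1::2, 1::2]))
        return pyramid

    # Stage 1: segment at the coarse level (a threshold stands in for the
    # paper's linkage adaptation and unsupervised fuzzy clustering).
    near_img = np.random.rand(512, 512)   # defocused image 1 (placeholder)
    far_img = np.random.rand(512, 512)    # defocused image 2 (placeholder)
    coarse_near = build_pyramid(near_img)[-1]
    coarse_far = build_pyramid(far_img)[-1]
    object_mask = coarse_near > coarse_near.mean()

    # Stage 2: train an MLP mapping coarse defocus features to depth.
    X = np.column_stack([coarse_near[object_mask], coarse_far[object_mask]])
    y = np.random.rand(X.shape[0])        # calibrated depths (placeholder)
    mlp = MLPRegressor(hidden_layer_sizes=(10,), max_iter=500)
    mlp.fit(X, y)
    depth_estimates = mlp.predict(X)

In the paper itself, the coarse-level feature vectors and their correlation with depth are chosen carefully (Section 4); the stand-ins above only indicate where each stage sits in the pipeline.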

This paper is organized as follows: Section 2 overviews the theory of DFD and the derivation of the depth formulae. Section 3 gives a detailed illustration of the multiresolution techniques for 2-D object detection using the image segmentation and boundary forming algorithms on blurred images. Section 4 presents an MLP solution to object recovery using the pre-processed images and depth-related features. Section 5 presents the experimental results and discusses the depth accuracy and the effect of varying the illumination pattern. The paper is concluded in Section 6.

Section snippets

Depth from defocus

To illustrate the concept of recovering the depth from defocus, the basic image formation geometry is shown in Fig. 2. When an image is in focus, all the photons that are radiated by a point object O pass through the aperture A and are refracted by the lens to converge at the point Q in the focused image plane $I_f$. The focused plane position v depends on the depth u of the object and the focal length f of the lens. According to the lens law, $\frac{1}{u} + \frac{1}{v} = \frac{1}{f}$. However, when the point object O is not in
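
As a numerical illustration of this relation, the short Python sketch below solves the lens law for the focused plane position v, and computes the diameter of the resulting blur circle when the sensor plane is displaced from it. This is the standard DFD geometry rather than the paper's exact calibration; apart from the 25 mm focal length and 7.9 mm aperture used in the experiments, the values are arbitrary examples.

    def focused_plane_distance(u, f):
        """Solve 1/u + 1/v = 1/f for v (all lengths in mm)."""
        return 1.0 / (1.0 / f - 1.0 / u)

    def blur_circle_diameter(u, f, s, aperture_d):
        """Blur circle diameter when the sensor sits at distance s behind
        the lens instead of at the focused plane distance v."""
        v = focused_plane_distance(u, f)
        return aperture_d * abs(s - v) / v

    f = 25.0     # focal length (mm), as in the experiments
    D = 7.9      # aperture diameter (mm), as in the experiments
    u = 500.0    # object depth (mm), arbitrary example
    v = focused_plane_distance(u, f)            # about 26.32 mm
    d = blur_circle_diameter(u, f, v + 0.5, D)  # sensor displaced by 0.5 mm
    print(v, d)                                 # blur grows with |s - v|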

The image segmentation algorithm

The first stage of depth estimation is to isolate each object within a scene. Here we need to accurately segment a scene onto which an illumination pattern has been projected to ensure high spatial resolution in the eventual computed depth map. A crucial problem for segmentation is to manage two sources of uncertainty. These are the uncertainty in estimating the feature property in each small object region, and the spatial uncertainty of where the region's boundary lies. Moreover, these two
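
The final sub-procedure of this stage is an unsupervised fuzzy clustering. As a rough sketch of the idea, the following Python implements standard fuzzy c-means on one-dimensional coarse-level intensities; the paper's specific clustering variant and its handling of the two uncertainty sources may differ, and the synthetic data below are illustrative only.

    import numpy as np

    def fuzzy_c_means(x, c=2, m=2.0, iters=50, eps=1e-9):
        """Standard FCM on a 1-D feature vector x.
        Returns the (c, n) membership matrix and the c cluster centres."""
        rng = np.random.default_rng(0)
        u = rng.random((c, x.size))
        u /= u.sum(axis=0)                       # memberships sum to 1
        for _ in range(iters):
            um = u ** m
            centres = um @ x / um.sum(axis=1)    # membership-weighted means
            dist = np.abs(x[None, :] - centres[:, None]) + eps
            inv = dist ** (-2.0 / (m - 1.0))
            u = inv / inv.sum(axis=0)            # membership update
        return u, centres

    # Example: separate bright (patterned object) pixels from a darker
    # background at the working level.
    pixels = np.concatenate([np.random.normal(50, 5, 200),
                             np.random.normal(180, 10, 200)])
    u, centres = fuzzy_c_means(pixels)
    object_mask = u.argmax(axis=0) == centres.argmax()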

Depth estimation using neural networks

The properties and training algorithms of MLP networks are well documented, but less information is available on how to configure the network. In general, a feed-forward neural network with one hidden layer, composed of neurons with sigmoidal activation functions, and a linear output layer can approximate any continuous function to the desired accuracy. In our case, the use of the three-layer MLP was predetermined, as its simplicity and good performance were required. The number of hidden neurons was
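
For readers unfamiliar with this configuration, the following from-scratch Python sketch shows a network of the kind described: one sigmoid hidden layer and a linear output layer, trained by back-propagation on squared error. The layer sizes, learning rate, and toy data are our own illustrative choices, not the paper's.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hidden, n_out, lr = 4, 10, 1, 0.01

    W1 = rng.standard_normal((n_in, n_hidden)) * 0.1
    b1 = np.zeros(n_hidden)
    W2 = rng.standard_normal((n_hidden, n_out)) * 0.1
    b2 = np.zeros(n_out)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Toy training set: feature vectors -> calibrated depths (placeholders).
    X = rng.random((100, n_in))
    y = X.sum(axis=1, keepdims=True)    # stands in for measured depth

    for epoch in range(2000):
        h = sigmoid(X @ W1 + b1)        # hidden activations
        out = h @ W2 + b2               # linear output layer
        err = out - y                   # dE/d(out) for squared error
        # Back-propagate the error through both layers.
        dW2 = h.T @ err / len(X)
        db2 = err.mean(axis=0)
        dh = err @ W2.T * h * (1 - h)   # through the sigmoid derivative
        dW1 = X.T @ dh / len(X)
        db1 = dh.mean(axis=0)
        W2 -= lr * dW2; b2 -= lr * db2
        W1 -= lr * dW1; b1 -= lr * db1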

Experiments

We have implemented the proposed techniques for depth recovery. The scene was imaged using a TAMRON 25 mm lens converted to be telecentric by an additional aperture. The aperture diameter was normally set to 7.9 mm unless otherwise stated, which gave an f-number of 3.17. A Pulnix monochrome camera (model TM-745E) and a frame grabber were set up to capture images of size 512×512 with 256 grey levels. Different illumination patterns, based on checkerboards and stripes, were simulated by attaching printed
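
As a simple consistency check on the stated optics, using the nominal thin-lens definition of the f-number (which ignores the telecentric conversion): $N = f/D = 25\,\mathrm{mm} / 7.9\,\mathrm{mm} \approx 3.16$, agreeing with the quoted value to within rounding.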

Conclusions

The proposed technique for object recovery consisted of two main components, image segmentation and 3-D depth estimation. Pairs of defocused images captured with different optical settings were processed to produce dense depth maps of various objects. The implementation was based on active DFD, although it was achieved by attaching an illumination pattern to the objects. The image segmentation was able to decompose the image into disjoint meaningful regions that had a strong correlation with

Summary

In this paper, a novel neural-network based approach to depth measurement, based on two defocused images with added pattern illumination, is reported. It simplifies the model architectures and improves model performance. We have integrated image segmentation with neural network learning. The solution to depth recovery is therefore a two-stage procedure: firstly objects are detected in 2-D and then 3-D depth is estimated. The object detection is performed by a multiresolution image segmentation

Acknowledgements

The authors acknowledge the China Scholarship Council for providing financial support, and the University of Warwick, UK, for the research facilities provided. Thanks are also given to Mr. C. Claxton for his help with the experiments.



Cited by (9)

  • Hierarchical Object Relationship Constrained Monocular Depth Estimation

    2021, Pattern Recognition
    Citation Excerpt:

    However, the multi-task assignments, such as Xu et al. [17] and Liu et al. [18], not only require hundreds of thousands of ground-truth images, but also face huge challenges in defining loss functions to jointly train the entire network. For the methods respectively proposed by Ji et al. [9] and Ma et al. [19], they all resorted to combining the CNN network with CRF to extract pixel-level features. However, the frameworks with CRF are cumbersome and difficult to make tradeoff between efficiency and accuracy.

  • Rational filter design for depth from defocus

    2012, Pattern Recognition
    Citation Excerpt:

    Video-rate processing is a requirement for 3D TV, and fast processing extends the use of DfD for robotics and production line applications. Efficient DfD computation methods have been proposed [4,15,16]; however, in this paper, since we are concerned with video-rate depth estimation for every pixel in the image, and passive illumination, we have chosen an approach based on rational filters [4] as detailed more fully below. The optical arrangement is as shown in Fig. 1, where a point on an object Q would be in-focus at point q in an image plane $i_f$.

  • A modified fuzzy C-means image segmentation algorithm for use with uneven illumination patterns

    2007, Pattern Recognition
    Citation Excerpt:

    These bias fields are slowly changing and multiplicatively imposed onto the captured images and the naturally occurring textures within them. A multiresolution analysis has been proposed to expedite the task of detecting the object in a defocused illumination image by building a pyramid, as described in Ref. [12]. This multiresolution approach is functional but with a relatively higher computational cost than the proposed method.


About the Author—LI MA received the B.S. and Ph.D. degrees in Electrical Engineering from Central South University, PR China, in 1976 and 1998 respectively. She is currently a professor at the department of computer science, Zhengzhou Institute of Light Industry, PR China. She has been an academic visitor to the College of Cardiff University of Wales, UK during 1993–1994, and she is currently a senior visiting fellow at the University of Warwick. Her research interests include signal and image processing, pattern recognition, neural networks.

About the Author—RICHARD C. STAUNTON received the B.Sc. (honours) degree in electronic engineering from the City University, UK, in 1973, and the Ph.D. degree in engineering from the University of Warwick, UK, in 1992. From 1973 to 1977 he worked for the aerospace industry, and from 1977 to 1986 for the UK National Health Service, where he engaged in research and development of medical image processing systems. Since 1986 he has been a lecturer at the University of Warwick. His current research interests include industrial image processing, hexagonal sampling systems, colour image processing, manufactured surface analysis, and depth from defocus.
