Integration of multiresolution image segmentation and neural networks for object depth recovery
Introduction
Depth measurement is one of the most important tasks in many computer vision applications, including three-dimensional object recognition, scene interpretation, part inspection and manipulation. Three-dimensional (3-D) positional information can be recovered using various techniques, among which depth from defocus (DFD) methods have the advantage that they require only two co-axial images obtained with different optical settings. DFD avoids the missing-part and correspondence problems that occur with stereo. Despite these merits, DFD shares one inherent weakness with stereo and motion techniques in that it requires the scene to contain natural or projected textures. In the work presented here we used projected texture (active illumination).
Over the past decade, several researchers have developed accurate, dense depth estimation methods from defocused images. The DFD method, originally developed by Pentland [1], uses the relative defocus in two images taken with different camera settings to determine scene structure. Many other techniques have followed, and these fall into two main categories: Fourier-domain and spatial-domain modelling. Subbarao has successively presented depth models for both domains [2]. Nayar et al. [3] gave a precise blur analysis in the frequency domain using focus operators as models. They considered both actively and passively illuminated scenes [4]. Furthermore, they proposed telecentric optics [5] to achieve magnification invariance under changes in the focus setting. Their technique employed a small bank of broadband rational filters [6] able to handle arbitrary textures. The method was computationally efficient and produced accurate results even for weak textures. Ghita and Whelan [7] reported a practical DFD implementation based on simple filters and a striped illumination pattern. They later used this algorithm in a bin-picking application [8].
Recently, the techniques of artificial neural networks (ANNs), an empirical modelling approach in the spatial domain, have been applied to the DFD problem. ANNs are robust and adaptive enough to approximate any non-linear function, so the stringent requirements on optical settings are relaxed compared with the earlier techniques. Tsai [9] proposed an algorithm to estimate the amount of blur from a single camera, in which the blur is calculated using a moment-preserving technique; an ANN is only used to compensate for certain depth errors. Pham and Aslantas [10] presented a technique employing a multi-layer perceptron (MLP) network to compute distances from derivative images of blurred edges. The theory of the MLP is described by Pinkus [11]. In addition, Jong and Huang [12] explored the Radial Basis Function (RBF) neural network for blur-scale detection of the point-spread function (PSF). At present there are few ANN-based approaches to DFD-oriented object recovery in the literature. The main problems for ANN-based depth models are building robust, accurate depth estimates with reasonably small networks, and managing the trade-off between the amount of pre-processed input data required and the efficiency of the training procedure.

In this paper, a novel ANN-based approach for depth measurement is reported that simplifies the model architecture and improves model performance. We have integrated image segmentation with neural network learning to solve depth recovery by a two-stage procedure, in which two-dimensional (2-D) object segmentation is followed by 3-D depth model formation. The first stage can be viewed as data pre-processing before the depth modelling stage. A multiresolution scheme, previously used for edge detection in [13], [14], was applied at the first stage with the objectives of reducing the data needed to form the depth model in the later stage and of providing a reliable segmentation of the pattern-illuminated image. Firstly, the data from one defocused image are processed to form a multiresolution pyramid, in which successive levels have progressively lower image resolution but preserve the essential depth information in the similarity measures between parent–child nodes at neighbouring levels. Only one image is required in this stage, as a telecentric lens was employed to eliminate any change in object magnification between the two images. Finally, an unsupervised fuzzy clustering was applied at a working level, defined in Section 3.4, to produce isolated object regions. In the depth estimation stage, a depth model, realized as a three-layer neural network whose architecture is determined by the extracted depth features, was generated using a back-propagation algorithm with training data derived from a low-resolution level of the previous stage together with camera calibration data.

The basic framework of our approach is shown in Fig. 1. Firstly, the two defocused grey-level images with the projected illumination pattern are segmented to decompose the scene into distinct, meaningful regions. The resulting data are derived from the object regions at a low resolution in order to ease the burden of network learning and to reduce the uncertainty in object detection. To estimate the 3-D information, reliable feature vectors from the first stage, which are input to the nodes of the neural network, are selected to provide data related to depth. Finally, the ANN model is generated to perform the object depth recovery.
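As an illustration of the first, pre-processing stage, the following is a minimal sketch (not the authors' code) of building a multiresolution pyramid from one defocused image and computing a simple parent–child similarity between neighbouring levels. The Gaussian pyramid operator, the block-mean statistic of the children and the negated absolute difference used as a similarity score are all illustrative assumptions; the paper defines its own similarity measure.

import numpy as np
import cv2

def build_pyramid(image, levels=4):
    # Level 0 is the full-resolution defocused image; each later level halves the resolution.
    pyramid = [image.astype(np.float32)]
    for _ in range(levels - 1):
        pyramid.append(cv2.pyrDown(pyramid[-1]))
    return pyramid

def parent_child_similarity(fine_level, coarse_level):
    # For every parent node at the coarser level, compare its value with the block
    # mean of its 2x2 children at the finer level (INTER_AREA resizing gives that
    # block mean when the coarse level is half the size of the fine level).
    h, w = coarse_level.shape
    child_mean = cv2.resize(fine_level, (w, h), interpolation=cv2.INTER_AREA)
    return -np.abs(coarse_level - child_mean)   # larger value = more similar

# Example usage on a synthetic image standing in for a defocused, pattern-illuminated frame.
img = (np.random.rand(256, 256) * 255).astype(np.uint8)
pyr = build_pyramid(img, levels=4)
sim_1_2 = parent_child_similarity(pyr[1], pyr[2])   # similarity between levels 1 and 2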
Once built, the ANN model can be used to calculate the depth of objects in unseen or partly seen images, that is, images that were not part of the training set. This work is the first reported in the literature to use neural networks to calculate depth from a pair of defocused images. It is also novel in using an object detection stage followed by a depth recovery stage. Experimental results with different illumination patterns are presented to demonstrate the effectiveness of the approach.
This paper is organized as follows: Section 2 overviews the theory of DFD and the derivation of the depth formulae. Section 3 gives a detailed illustration of the multiresolution techniques for 2-D object detection, using the image segmentation and boundary-forming algorithms on blurred images. Section 4 presents an MLP solution to object recovery using the pre-processed images and depth-related features. Section 5 presents the experimental results and includes a discussion of the depth accuracy and the effect of varying the illumination pattern. The paper is concluded in Section 6.
Section snippets
Depth from defocus
To illustrate the concept of recovering depth from defocus, the basic image formation geometry is shown in Fig. 2. When an image is in focus, all the photons that are radiated by a point object O and pass through the aperture A are refracted by the lens to converge at the point Q in the focused image plane. The focused plane position depends on the depth u of the object and the focal length f of the lens, according to the lens law 1/f = 1/u + 1/v, where v is the distance from the lens to the focused image plane. However, when the point object O is not in
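For reference, the thin-lens relation above and the standard blur-circle expression used in DFD geometry can be written as follows. This is standard optics; the symbol used for the sensor distance and the constant of proportionality may differ from the paper's notation.

% Thin-lens law: object depth u, focused image distance v, focal length f.
\[
  \frac{1}{f} \;=\; \frac{1}{u} + \frac{1}{v}
\]
% If the sensor plane lies at a distance s from the lens (s \neq v), the point
% spreads into a blur circle of radius R for an aperture of diameter D:
\[
  R \;=\; \frac{D\,s}{2}\,\left|\,\frac{1}{f} - \frac{1}{u} - \frac{1}{s}\,\right|
\]
% Estimating R (or the relative blur between the two defocused images) therefore
% allows u to be recovered once f, D and s are known from calibration.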
The image segmentation algorithm
The first stage of depth estimation is to isolate each object within a scene. Here we need to accurately segment a scene onto which an illumination pattern has been projected to ensure high spatial resolution in the eventual computed depth map. A crucial problem for segmentation is to manage two sources of uncertainty. These are the uncertainty in estimating the feature property in each small object region, and the spatial uncertainty of where the region's boundary lies. Moreover, these two
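To make the clustering step concrete, below is a compact, numpy-only fuzzy C-means sketch that groups the grey levels of a low-resolution pyramid level into fuzzy regions. The number of clusters, the fuzziness exponent m and the stopping tolerance are illustrative placeholders rather than the values used in the paper, and the paper's clustering at the working level may operate on richer feature vectors than raw intensity.

import numpy as np

def fuzzy_c_means(x, c=2, m=2.0, tol=1e-5, max_iter=200, seed=0):
    # x: (n_samples, n_features) array.  Returns (cluster centres, memberships).
    rng = np.random.default_rng(seed)
    u = rng.random((c, len(x)))
    u /= u.sum(axis=0)                                   # memberships sum to 1 per sample
    for _ in range(max_iter):
        um = u ** m
        centres = um @ x / um.sum(axis=1, keepdims=True)  # weighted cluster centres
        d = np.linalg.norm(x[None, :, :] - centres[:, None, :], axis=2) + 1e-12
        u_new = 1.0 / (d ** (2.0 / (m - 1)))              # standard FCM membership update
        u_new /= u_new.sum(axis=0)
        if np.abs(u_new - u).max() < tol:
            u = u_new
            break
        u = u_new
    return centres, u

# Example: segment a coarse pyramid level into two fuzzy regions by intensity.
level = np.random.rand(60, 80).astype(np.float32)        # stand-in for the working-level image
features = level.reshape(-1, 1)
centres, u = fuzzy_c_means(features, c=2)
labels = u.argmax(axis=0).reshape(level.shape)            # hard region labels from the memberships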
Depth estimation using neural networks
The properties and training algorithms of MLP networks are well documented, but less information is available on how to configure the network. In general, a feed-forward neural network with one hidden layer of neurons with non-linear activation functions and a linear output layer can approximate any continuous function to the desired accuracy. In our case, the use of a three-layer MLP was predetermined, as its simplicity and good performance were required. The number of hidden neurons was
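As a concrete illustration of such a network, a minimal three-layer MLP regression sketch using scikit-learn is shown below: one hidden layer of sigmoidal units plus a linear output layer. The feature dimensionality, hidden-layer width, optimizer and iteration count here are placeholders, not the configuration derived in the paper from the extracted depth features.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train = rng.random((500, 6))     # stand-in feature vectors from the segmentation stage
y_train = rng.random(500)          # stand-in calibrated object depths

# One hidden layer of sigmoidal units plus a linear output layer gives the
# three-layer feed-forward structure discussed above.
mlp = MLPRegressor(hidden_layer_sizes=(12,),
                   activation="logistic",
                   solver="adam",        # gradient-based training, a modern stand-in for plain back-propagation
                   max_iter=2000,
                   random_state=0)
mlp.fit(X_train, y_train)
depth_estimates = mlp.predict(rng.random((10, 6)))   # depth predictions for unseen feature vectors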
Experiments
We have implemented the proposed techniques for depth recovery. The scene was imaged using a TAMRON 25 mm lens converted to be telecentric by an additional aperture. The aperture diameter was normally set to 7.9 mm unless otherwise stated, which gave an f-number of 3.17. A Pulnix monochrome camera, model TM-745E, and a frame grabber were set to capture images with 256 grey levels. Different illumination patterns, based on checkerboards and stripes, were simulated by attaching printed
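As a quick arithmetic check on the quoted optics, assuming the usual definition of the f-number as focal length divided by aperture diameter:

# f-number check, assuming N = focal_length / aperture_diameter.
focal_length_mm = 25.0
aperture_diameter_mm = 7.9
print(round(focal_length_mm / aperture_diameter_mm, 2))   # ~3.16, close to the quoted f-number of 3.17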
Conclusions
The proposed technique for object recovery consisted of two main components, image segmentation and 3-D depth estimation. Pairs of defocused images captured with different optical settings were processed to produce dense depth maps of various objects. The implementation was based on active DFD, although it was achieved by attaching an illumination pattern to the objects. The image segmentation was able to decompose the image into disjoint meaningful regions that had a strong correlation with
Summary
In this paper, a novel neural-network based approach to depth measurement, based on two defocused images with added pattern illumination, is reported. It simplifies the model architectures and improves model performance. We have integrated image segmentation with neural network learning. The solution to depth recovery is therefore a two-stage procedure: firstly objects are detected in 2-D and then 3-D depth is estimated. The object detection is performed by a multiresolution image segmentation
Acknowledgements
The authors give thanks to the China Scholarship Council for providing financial support, and to the University of Warwick, UK, for the research facilities provided. Thanks are also given to Mr. C. Claxton for his help with the experiments.
References (16)
- et al., Microscopic shape from focus using a projected illumination pattern, Math. Comput. Modelling (1996)
- et al., A video-rate range sensor based on depth from defocus, Opt. Laser Technol. (2001)
- et al., A moment-preserving approach for depth from defocus, Pattern Recognition (1998)
- et al., Depth from defocusing using a neural network, Pattern Recognition (1999)
- et al., Multiresolution edge detection techniques, Pattern Recognition (1995)
- et al., Hierarchical image segmentation by multi-dimensional clustering and orientation-adaptive boundary refinement, Pattern Recognition (1995)
- Scale-dependent hierarchical unsupervised segmentation of textured images, Pattern Recognition Lett. (2001)
- et al., Sodar image segmentation by fuzzy C-means, Signal Processing (1996)
Cited by (9)
Hierarchical Object Relationship Constrained Monocular Depth Estimation
2021, Pattern Recognition
Citation Excerpt: However, the multi-task assignments, such as Xu et al. [17] and Liu et al. [18], not only require hundreds of thousands of ground-truth images, but also face huge challenges in defining loss functions to jointly train the entire network. For the methods respectively proposed by Ji et al. [9] and Ma et al. [19], they all resorted to combining the CNN network with CRF to extract pixel-level features. However, the frameworks with CRF are cumbersome and difficult to make tradeoff between efficiency and accuracy.
Rational filter design for depth from defocus
2012, Pattern Recognition
Citation Excerpt: Video-rate processing is a requirement for 3D TV, and fast processing extends the use of DfD for robotics and production line applications. Efficient DfD computation methods have been proposed [4,15,16]; however, in this paper, since we are concerned with video-rate depth estimation for every pixel in the image, and passive illumination, we have chosen an approach based on rational filters [4] as detailed more fully below. The optical arrangement is as shown in Fig. 1, where a point on an object Q would be in-focus at point q in an image plane if.
A modified fuzzy C-means image segmentation algorithm for use with uneven illumination patterns
2007, Pattern Recognition
Citation Excerpt: These bias fields are slowly changing and multiplicatively imposed onto the captured images and the naturally occurring textures within them. A multiresolution analysis has been proposed to expedite the task of detecting the object in a defocused illumination image by building a pyramid, as described in Ref. [12]. This multiresolution approach is functional but with a relatively higher computational cost than the proposed method.
A comprehensive review on nature inspired neural network based adaptive filter for eliminating noise in medical images
2020, Current Medical Imaging Reviews
Cat swarm optimization based functional link multilayer perceptron for suppression of gaussian and impulse noise from computed tomography images
2020, Current Medical Imaging Reviews
Convolutional neural network based clustering and manifold learning method for diabetic plantar pressure imaging dataset
2017, Journal of Medical Imaging and Health Informatics
About the Author—LI MA received the B.S. and Ph.D. degrees in Electrical Engineering from Central South University, PR China, in 1976 and 1998, respectively. She is currently a professor in the Department of Computer Science, Zhengzhou Institute of Light Industry, PR China. She was an academic visitor at the University of Wales College of Cardiff, UK, during 1993–1994, and she is currently a senior visiting fellow at the University of Warwick. Her research interests include signal and image processing, pattern recognition and neural networks.
About the Author—RICHARD C. STAUNTON received the B.Sc. (honours) degree in electronic engineering from the City University, UK, in 1973, and the Ph.D. degree in engineering from the University of Warwick, UK, in 1992. From 1973 to 1977 he worked for the aerospace industry, and from 1977 to 1986 for the UK National Health Service, where he engaged in research and development of medical image processing systems. Since 1986 he has been a lecturer at the University of Warwick. His current research interests include industrial image processing, hexagonal sampling systems, colour image processing, manufactured surface analysis, and depth from defocus.