
Pattern Recognition

Volume 36, Issue 2, February 2003, Pages 439-449

Coarse view synthesis using shape-from-shading

https://doi.org/10.1016/S0031-3203(02)00077-8

Abstract

This paper investigates the use of shape-from-shading for coarse view synthesis. The aim of our study is to determine whether needle-maps delivered by a new shape-from-shading (SFS) algorithm can be used as a compact object-representation for the purposes of efficiently generating appearance manifolds. Specifically, we aim to show that the needle-maps can be used to generate novel object views under changing light source and viewer directions. To this end we conduct two sets of experiments. Firstly, we use the recovered needle-maps to re-illuminate objects under varying lighting directions. Here we show that a single input image can be used to construct relatively faithful re-illuminations under radical illumination changes. Secondly, we investigate how the needle-map can be used to generate new object poses. Here we show that needle-maps can be used for both view interpolation and view extrapolation.

Introduction

Appearance-based object recognition has recently attracted considerable interest in the computer vision literature [1], [2]. Although there are various realisations of the idea, the unifying principle is to compute a compact representation of the 2D appearance of 3D objects under multiple viewing and illumination conditions. This means that for each object a relatively large number of images must be collected, spanning different object poses or viewing directions and different illumination conditions. For instance, Ullman has shown how different views may be combined for the purposes of recognising intermediate object poses [2]. The parametric eigenspace of Nayar and Murase [1] achieves a degree of data compression by storing the model views implicitly as a manifold in eigenspace. Here the idea is to represent the variability of object appearance under different viewing and lighting directions by using the manifold to interpolate between the projections of the raw images onto the leading eigenvectors.
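
As an illustration of the parametric eigenspace idea, the following sketch (Python with NumPy) projects a set of images onto the leading eigenvectors and interpolates between the resulting manifold points. The data shapes, the choice of ten components and the linear interpolation are illustrative assumptions, not details of Nayar and Murase's implementation:

```python
import numpy as np

# Illustrative data only: 36 poses of one object, each flattened to a
# 64 x 64 image (one image per row).
X = np.random.rand(36, 64 * 64)
X_centred = X - X.mean(axis=0)

# Leading eigenvectors (eigenimages) of the image set via SVD.
U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
k = 10                          # number of leading components retained
basis = Vt[:k]                  # k x n_pixels eigenimages

# Projecting each image onto the basis gives a k-dimensional point;
# the ordered sequence of points traces the appearance manifold.
manifold = X_centred @ basis.T  # n_images x k

def interpolate_pose(i, j, t):
    """Approximate an intermediate pose between manifold points i and j
    by linear interpolation (t in [0, 1])."""
    return (1.0 - t) * manifold[i] + t * manifold[j]
```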

One of the criticisms of appearance-based object recognition is the demands it places on data collection. Sufficient image data must be accumulated so that accurate object representations can be constructed. Images must be collected and stored so as to span a sufficient number and range of viewpoints to allow recognition from any novel viewpoint that is likely to be encountered. Here, the concepts of characteristic views [3], [4], [5] or aspect graphs [6] may prove useful for organising or condensing the amount of information required. In particular, if we treat the idea of a characteristic view (CV) as a natural grouping or clustering of similar views, then it should be possible to represent an object by storing a single representative from within each CV.

In any event, for each viewpoint we must also store, or find some way to model, all appearance changes due to light source variations which are likely to be encountered. It is easy to see that the storage and matching requirements for such a recognition scheme have the potential to grow rapidly beyond the bounds of practicality. Moreover, the view-collection process itself may also prove expensive, and requires that the lighting conditions be carefully controlled. This means that object appearance may only be learned under highly controlled conditions. As a result, autonomously learning object appearance in an uncontrolled environment is difficult.

The observation underpinning this paper is that a more efficient strategy is to collect a small sample of object images. From this sample a set of images is generated so as to span the appearance space for a particular object. In other words, we replace data collection under controlled lighting conditions with view synthesis from a set of representative images. Since they are to be used for the purposes of constructing an appearance-based representation, the synthetic views need only be fairly coarse.

View synthesis has recently attracted considerable interest in the computer vision literature [7], [8]. The topic has been the subject of intense activity in the graphics community for some time [9] and has led to the development of a variety of techniques for photo-realistic object rendering. However, the computer vision approach to the problem is somewhat different and revolves around automatically acquiring information concerning object geometry and appearance [9]. For instance, Poggio and Vetter, together with their co-workers, have shown how to learn the appearance of faces for the purposes of synthesis [10], [11]. This approach uses a linear model. There has recently been interest in the use of least-squares estimation techniques to improve the statistical robustness of the method [12]. Several authors, including Sengupta and Ohya [13], and Avidan and Shashua [14], have used affine structure for the purposes of 3D object synthesis from 2D views. There has also been considerable effort aimed at showing how the geometry of different image tokens such as lines [9], [15], [16] and regions [17] can be exploited for view synthesis. Finally, several authors have used view synthesis as a means of object morphing [18], [19].

However, here we require only relatively coarse synthetic views. We take a photometric approach to the problem, using the output of shape-from-shading as the starting point. Some steps in this direction have recently been taken by Nayar and Murase [20], and by Georghiades and Kriegman [22], who show how an illumination cone can be used for view synthesis after object reconstruction has been performed. The reconstruction process is based on photometric stereo and requires a minimum of three views with known lighting. Re-illumination is the first capability we require; the second is the synthesis of new or intermediate object poses from a representative set of model views. This second point is to some extent addressed by Ullman's view interpolation work [23]. However, that method operates primarily with pictorial descriptions of objects rather than intensity images.

The observation underpinning this paper is that shape-from-shading, and by extension other shape-from-X modules such as shape-from-texture, provides an obvious yet hitherto unexplored route to coarse view synthesis. Shape-from-shading has long been a subject of active research within the vision community [24]. Its role has been to deliver a dense map of local surface orientation information from shading patterns. Shape-from-shading aims to recover the required orientation information by solving the image irradiance equation. However, there are few reported attempts to use shape-from-shading for any practical object recognition or shape analysis tasks [25]. Whilst it appears clear that one of the motivations of early shape-from-shading research was to enable a 3D representation to be derived from a single image, the difficulties encountered in achieving accurate and robust needle-map recovery have proved a serious obstacle to progress in this direction.
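
For concreteness, the image irradiance equation takes the following standard form, written here in the conventional gradient-space notation for the Lambertian case used throughout this paper (a textbook formulation, not a transcription from the paper itself):

$$E(x,y) = R(p,q) = \mathbf{n}\cdot\mathbf{s} = \frac{1 + p\,p_s + q\,q_s}{\sqrt{1+p^2+q^2}\;\sqrt{1+p_s^2+q_s^2}},$$

where $E$ is the normalised image brightness, $R$ the reflectance map, $\mathbf{n} = (-p,-q,1)/\sqrt{1+p^2+q^2}$ the unit surface normal with surface gradients $p$ and $q$, and $\mathbf{s}$ the unit light-source direction with gradients $p_s$ and $q_s$.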

Many of the difficulties encountered by existing shape-from-shading schemes can be attributed to the fact that they over-smooth the recovered needle-map. We have recently developed a new framework for shape-from-shading [26] which addresses this problem. It offers two advantages over existing schemes. Firstly, we can impose compliance with the image irradiance equation as a hard constraint. Secondly, we impose more sophisticated constraints on the local consistency of the recovered needle-map. The needle-maps delivered by the new shape-from-shading framework contain fine surface detail and can be used to identify topographic structure not recoverable with alternative algorithms [27], [28]. Moreover, we have recently shown how surface topography information extracted from the needle-maps can be used for the purposes of 3D object recognition from 2D views [29].

Once a needle-map has been obtained, object re-illumination is a straightforward task. All that needs to be done is to modify the light source direction in the image irradiance equation and to compute the resulting image brightness using Lambert's law for matte surface reflectance. It is important to stress that Ullman [2], [30] and others have proposed the use of view-interpolation or view-combination techniques for appearance-based object recognition. However, much of the work in this area has focused upon interpolating pictorial descriptions of objects. As Ullman notes [23], interpolation between smooth objects presents greater difficulties, and shape-from-shading represents one possible method of addressing them. However, this has remained only a suggestion with no concrete substantiation in the literature. Here we present concrete experimental results which point to the practical feasibility of this approach. Specifically, we investigate both extrapolation using the needle-map obtained from a single image, and interpolation between two or more views with different object poses. The novel contribution in the current paper is therefore to investigate how our improved needle-maps can be used for object re-illumination and view synthesis.

The outline of the remainder of this paper is as follows. In Section 2 we review the new shape-from-shading algorithm. In Section 3 we describe the re-illumination process, and in Section 4 our approach to novel view synthesis. Section 5 presents the experimental results of these approaches, and in Section 6 we draw conclusions and consider the outlook for developing these ideas.

Data-driven shape-from-shading

Our new shape-from-shading algorithm has been demonstrated to deliver needle-maps which preserve fine surface detail [26], [31]. The observation underpinning the method is that for Lambertian reflectance from a matte surface, the image irradiance equation defines a cone of possible surface normal directions. The axis of this cone points in the light-source direction and the opening angle is determined by the measured brightness. If the recovered needle-map is to satisfy the image irradiance equation, then each surface normal must lie on its corresponding cone.
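
A minimal sketch of how such a hard constraint can be enforced is given below (Python with NumPy). Rotating an off-cone normal onto the nearest direction lying on the cone is one natural strategy; the function name and the details of the projection are our assumptions, not a transcription of the published update rule:

```python
import numpy as np

def project_onto_cone(n, s, E):
    """Rotate the unit normal n onto the closest direction lying on the
    irradiance cone whose axis is the unit light-source direction s and
    whose half-angle is arccos(E), for normalised brightness E."""
    alpha = np.arccos(np.clip(E, 0.0, 1.0))
    # Split n into components along and perpendicular to the cone axis.
    perp = n - np.dot(n, s) * s
    norm = np.linalg.norm(perp)
    if norm < 1e-12:
        # Degenerate case: n is (anti)parallel to s, so every azimuth on
        # the cone is equally close; pick an arbitrary perpendicular.
        tmp = np.array([0.0, 1.0, 0.0]) if abs(s[0]) > 0.9 \
            else np.array([1.0, 0.0, 0.0])
        perp = np.cross(s, tmp)
        norm = np.linalg.norm(perp)
    perp = perp / norm
    # Keep n's azimuth about the axis, but force the polar angle to alpha,
    # so the result reproduces the measured brightness exactly.
    return np.cos(alpha) * s + np.sin(alpha) * perp
```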

Object re-illumination using shape-from-shading

The needle map returned by shape-from-shading may be re-illuminated by a new light source from any chosen direction. This provides an estimate of the appearance of the object under the new lighting conditions. The input information required is a single image and a single application of the shape-from-shading algorithm. Suppose that n_{i,j}^{(final)} is the recovered surface normal at the pixel indexed (i,j) when the shape-from-shading scheme has reached convergence. Further suppose that the new light source direction is the unit vector s′. The predicted brightness at the pixel (i,j) is then the dot product of n_{i,j}^{(final)} and s′, following Lambert's law.
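
As a minimal sketch of this step (Python with NumPy; the array layout and the clamping of negative responses to zero for self-shadowed pixels are our illustrative assumptions):

```python
import numpy as np

def reilluminate(normals, s_new):
    """Predict image brightness under a new light source.

    normals : (H, W, 3) array of unit surface normals recovered by
              shape-from-shading.
    s_new   : 3-vector giving the new light-source direction.
    """
    s_new = np.asarray(s_new, dtype=float)
    s_new /= np.linalg.norm(s_new)
    # Lambert's law: brightness is the cosine of the angle between the
    # surface normal and the light-source direction.
    E = normals @ s_new            # (H, W) array of dot products
    return np.clip(E, 0.0, 1.0)    # clamp self-shadowed pixels to zero
```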

Novel view generation using shape-from-shading

The generation of novel views using shape-from-shading information is significantly more complex than re-illumination, and here involves a hybrid approach between appearance-based and model-based recognition. We consider both extrapolation from a single view, and interpolation using two views. However, both operate in much the same manner. To perform these two tasks we require height data. We extract height data using the method of Wu and Li [32], a relatively simple method in which the surface height is recovered by integrating the surface normals delivered by shape-from-shading.
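
To convey the flavour of the extrapolation step, here is a minimal sketch (Python with NumPy) that rotates the recovered surface and re-renders it with Lambertian shading. The orthographic nearest-point splatting with a depth buffer, and the convention that z increases towards the viewer, are our own illustrative choices rather than the rendering scheme of the paper:

```python
import numpy as np

def synthesise_view(height, normals, s, angle_deg):
    """Render a coarse novel view by rotating the recovered surface.

    height    : (H, W) height map integrated from the needle-map.
    normals   : (H, W, 3) unit surface normals.
    s         : unit light-source direction (kept fixed in the scene).
    angle_deg : rotation about the vertical (y) axis, in degrees.
    """
    H, W = height.shape
    a = np.radians(angle_deg)
    R = np.array([[np.cos(a),  0.0, np.sin(a)],
                  [0.0,        1.0, 0.0      ],
                  [-np.sin(a), 0.0, np.cos(a)]])

    # Surface points and normals in the rotated pose.
    xs, ys = np.meshgrid(np.arange(W), np.arange(H))
    pts = np.stack([xs - W / 2.0, ys - H / 2.0, height], axis=-1)
    pts_rot = pts @ R.T
    n_rot = normals @ R.T

    # Lambertian shading in the new pose.
    shade = np.clip(n_rot @ s, 0.0, 1.0)

    # Orthographic splat with a depth buffer (nearest point wins).
    img = np.zeros((H, W))
    zbuf = np.full((H, W), -np.inf)
    u = np.clip(np.round(pts_rot[..., 0] + W / 2.0).astype(int), 0, W - 1)
    v = np.clip(np.round(pts_rot[..., 1] + H / 2.0).astype(int), 0, H - 1)
    for i in range(H):
        for j in range(W):
            if pts_rot[i, j, 2] > zbuf[v[i, j], u[i, j]]:
                zbuf[v[i, j], u[i, j]] = pts_rot[i, j, 2]
                img[v[i, j], u[i, j]] = shade[i, j]
    return img
```

In practice gaps left by such forward splatting would need in-filling; the sketch is intended only to make the geometry of pose extrapolation concrete.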

Experiments

We have investigated using shape-from-shading information for both re-illumination and novel-view generation, using the USC range image database of busts of famous composers. Fig. 1 shows some of the images used in this study. Each bust has several range images, collected from different viewing directions. For each range image we have generated illuminated intensity images using the Lambertian lighting model outlined in Section 3.

Our experiments have focused upon using the images of the busts to evaluate both re-illumination and novel view generation.

Conclusions and outlook

Making the most of the available data is an important issue in appearance-based object recognition. It seems simply impractical to collect and store vast quantities of model views to represent an object under all possible, or even all likely, viewing conditions. In this paper we have begun to demonstrate the potential of shape-from-shading, and by extension other shape-from-X modules, to increase the utility of a given image for object recognition. In the case of re-illumination of objects, we have shown that a single input image can be used to construct relatively faithful re-illuminations under radical changes in lighting direction.

References (32)

  • R. Wang, H. Freeman, Object recognition based on characteristic view classes, Proceedings of the International...
  • S. Chen et al., Characteristic view modelling of curved-surface solids, Int. J. Pattern Recognition Artif. Intell. (1996)
  • K.W. Bowyer et al., Aspect graphs: an introduction and survey of recent results, Int. J. Imaging Systems Technol. (1990)
  • D.P. Greenberg, A framework for realistic image synthesis, Commun. ACM (1999)
  • B. Johansson, View synthesis and 3D reconstruction of piecewise planar scenes using intersection lines between the planes, Proc. ICCV (1999)
  • T. Vetter et al., Linear object classes and image synthesis from a single example image, Pattern Anal. Machine Intell. (1997)

Cited by (2)

  • Facial view synthesis from a single image using shape from shading, 2004, Proceedings - 2nd International Symposium on 3D Data Processing, Visualization, and Transmission. 3DPVT 2004

About the Author—PHILIP WORTHINGTON attained BA (Hons) in Engineering and Computer Science from University College, Oxford, in 1996. From 1996–1999 he pursued an EPSRC-funded DPhil in the Computer Vision group at the University of York under the supervision of Professor Edwin Hancock. In 1998, he spent 2 months in Dr Hiroshi Murase's research group at NTT Basic Research Laboratories in Japan, courtesy of JISTEC and the British Council. In 1999 he was awarded the University of York Gibbs-Plessey award to visit academic and commercial vision groups around the US. Philip was awarded his DPhil in 2000 for his thesis on shape-from-shading for view-based object recognition. Following a spell in industry, he was appointed as a Lecturer in Computer Vision in the Department of Computation, UMIST, in March 2001. He has published several papers on shape-from-shading and view-based object recognition in a range of journals and refereed conference proceedings.

About the Author—EDWIN HANCOCK studied Physics as an undergraduate at the University of Durham and graduated with honours in 1977. He remained at Durham to complete a Ph.D. in the area of High Energy Physics in 1981. Following this he worked for ten years as a researcher in the fields of high-energy nuclear physics and pattern recognition at the Rutherford–Appleton Laboratory (now the Central Research Laboratory of the Research Councils). During this period he also held adjunct teaching posts at the University of Surrey and the Open University. In 1991 he moved to the University of York as a lecturer in the Department of Computer Science. He was promoted to Senior Lecturer in 1997 and to Reader in 1998. In 1998 he was appointed to a Chair in Computer Vision.

Professor Hancock now leads a group of some 15 faculty, research staff and Ph.D. students working in the areas of computer vision and pattern recognition. His main research interests are in the use of optimisation and probabilistic methods for high and intermediate level vision. He is also interested in the methodology of structural and statistical pattern recognition. He is currently working on graph-matching, shape-from-X, image data-bases and statistical learning theory. His work has found applications in areas such as radar terrain analysis, seismic section analysis, remote sensing and medical imaging. Professor Hancock has published some 60 journal papers and 200 refereed conference publications. He was awarded the Pattern Recognition Society medal in 1991 for the best paper to be published in the journal Pattern Recognition. The journal also awarded him an outstanding paper award in 1997.

Professor Hancock has been a member of the Editorial Boards of the journals IEEE Transactions on Pattern Analysis and Machine Intelligence, and, Pattern Recognition. He has also been a guest editor for special editions of the journals Image and Vision Computing and Pattern Recognition, and he is currently a guest editor of a special edition of IEEE Transactions on Pattern Analysis and Machine Intelligence devoted to energy minimization methods in computer vision. He has been on the programme committees for numerous national and international meetings. In 1997 he established a new series of international meetings on energy minimization methods in computer vision and pattern recognition. He was awarded a Fellowship of the International Association for Pattern Recognition in 2000.
