Translational photometric alignment of single-view image sequences

https://doi.org/10.1016/j.cviu.2012.01.005

Abstract

Photometric stereo is a well-established method to estimate surface normals of an object. When coupled with depth-map estimation, it can be used to reconstruct an object’s height field. Typically, photometric stereo requires an image sequence of an object under the same viewpoint but with differing illumination directions. One crucial assumption of this configuration is perfect pixel correspondence across images in the sequence. While this assumption is often satisfied, certain setups are susceptible to translational errors or misalignments across images. Current methods to align image sequences were not designed specifically for single-view photometric stereo. Thus, they either struggle to account for changing illumination across images, require training sets, or are overly complex for these conditions. However, the unique nature of single-view photometric stereo allows one to model misaligned image sequences using the underlying image formation model and a set of translational shifts. This paper introduces such a technique, entitled translational photometric alignment, that employs the Lambertian model of image formation. This reduces the alignment problem to minimizing a nonlinear sum-squared error function in order to best reconcile the observed images with the generative model. Thus, the end goal of translational photometric alignment is not only to align image sequences, but also to produce the best surface-normal estimates given the observed images. Controlled experiments on the Yale Face Database B demonstrate the high accuracy of translational photometric alignment. The utility and benefits of the technique are further illustrated by additional experiments on image sequences suffering from uncontrolled real-world misalignments.

Highlights

► Translational mismatch is modelled using the image formation equation.
► Misalignments are corrected using nonlinear least squares.
► The structure of the problem allows efficient gradient and Hessian computation.
► Severely misaligned and challenging image sequences are aligned with high accuracy.
► Tangible benefits are provided to a real-world scenario.

Introduction

Photometric stereo, introduced by Woodham [1], is a well-established method to calculate surface normals of an object. When followed by depth-map estimation, photometric stereo is also a crucial step in reconstructing object surfaces. Typically, photometric stereo is performed using a sequence of images of an object with known and controlled illumination all under the same viewpoint. When employing single-view photometric stereo, it is assumed that every image in the sequence shares perfect pixel correspondence.

Normally, this poses no problems, as photometric stereo is usually applied using a fixed camera setup in environments stable enough to ensure adequate pixel correspondence. However, certain image capture setups may not guarantee pixel correspondence between images. In particular, image frames may be corrupted by relative translational misalignments.

Important examples of such scenarios include performing photometric stereo with reflected light [2] or scanning electron microscopy [3], [4]. Apart from microscopy, it may be difficult in other imaging setups to completely fix the object of interest. For instance, as this paper will demonstrate, when capturing faces, a human subject may move his or her head enough to affect photometric stereo performance.

The changing pixel intensities across illumination directions in an image sequence make it challenging to correct for translational misalignments. However, the distinct nature of single-view photometric stereo image sequences offers a specific means of alignment. Nonetheless, as image alignment under changing pose is a well-researched area, the scope of this problem overlaps with important areas of related work.

In the past decade, published works have introduced powerful Multi-View Photometric Stereo techniques [5], [6], [7], [8]. As these works are designed for image sequences taken with a camera whose position varies in world coordinates over time, they have none of the pixel correspondence requirements of traditional single-view photometric stereo. Nevertheless, while these techniques are very powerful, they are also unavoidably complex as they are designed for highly challenging image-capture scenarios where the viewpoint is intentionally changed. This complexity is not needed for image capture scenarios that attempt to use the same viewpoint, but suffer from translational errors in alignment. Instead, translational misalignments can be modelled in image space, avoiding much of the complexity inherent in these techniques.

Silhouette-Based Alignment is a common approach to image registration. These techniques are often based on global criteria, such as moments. When performing translational alignment, centring images using silhouette centroids is the appropriate moment-based method to use [9]. This is called centroid alignment. However, global silhouette-based methods are sensitive to errors or changes in the silhouette boundary [10]. This poses a problem with photometric-stereo image sequences, as boundaries of extracted silhouettes may differ at every illumination direction due to differences in pixel intensities. Other more robust alternatives use methods that attempt to model noise and occlusions in the silhouette description. These methods can be based on representations describing the boundary or the entire silhouette [11]. Unfortunately, when using non-global silhouette-based methods, it is not clear how to best evaluate the results [11]; thus, there is no agreed-upon way to determine when error-prone silhouettes have been successfully aligned. Finally, both global and non-global silhouette matching require an object mask, which is not always possible to generate.
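As a concrete illustration, centroid alignment reduces to shifting one silhouette so that its first moment coincides with the other's. A minimal sketch, assuming binary silhouette masks are already available and restricting to integer shifts:

```python
import numpy as np

def centroid(mask):
    """Centroid (row, col) of a binary silhouette mask."""
    rows, cols = np.nonzero(mask)
    return rows.mean(), cols.mean()

def centroid_align(mask_src, mask_tgt):
    """Integer translation that moves the target silhouette's
    centroid onto the source silhouette's centroid."""
    r_s, c_s = centroid(mask_src)
    r_t, c_t = centroid(mask_tgt)
    return int(round(r_s - r_t)), int(round(c_s - c_t))

# A 3x3 square silhouette shifted by (2, 1) is recovered exactly.
src = np.zeros((10, 10), dtype=bool)
src[2:5, 2:5] = True
tgt = np.roll(src, shift=(2, 1), axis=(0, 1))
shift = centroid_align(src, tgt)
```

Note that this exactness depends on the silhouette being identical in both frames; as the paragraph above notes, silhouettes extracted under different illumination directions rarely are.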

Intensity-Based Alignment methods involve selecting features between a target and source image, determining correspondences between these features, and finally computing a transform aligning the target image with the source [12]. Typical features include corners, lines, and image patches. Two well-known examples of registration techniques using image patches as features are cross-correlation [13] and optical flow [14]. Regardless of the features used for alignment, intensity-based techniques are all based in some way upon pixel values in the images. As a result, traditional image patch-based methods will struggle when aligning photometric stereo image sequences, and the changing intensity levels across images make consistent and reliable detection of lower-level features, such as lines or corners, a much more difficult problem. These reasons motivate the use of features outside of traditional intensity-based ones.
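For reference, the cross-correlation approach amounts to searching for the translation that maximizes an intensity-similarity score. A brute-force sketch over integer shifts (circular `np.roll` stands in for proper boundary handling); it succeeds here only because source and target share identical intensities, which is exactly the assumption that differing illumination directions break:

```python
import numpy as np

def best_shift(source, target, max_shift=3):
    """Exhaustive search for the integer translation of `target`
    that maximizes its correlation with `source`."""
    best, best_score = (0, 0), -np.inf
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            shifted = np.roll(target, shift=(dr, dc), axis=(0, 1))
            score = np.sum(source * shifted)   # unnormalized correlation
            if score > best_score:
                best, best_score = (dr, dc), score
    return best

# A misaligned copy of the same image is registered exactly.
rng = np.random.default_rng(0)
source = rng.random((32, 32))
target = np.roll(source, shift=(-2, 1), axis=(0, 1))
shift = best_shift(source, target)
```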

Along those lines, Appearance-Varying Alignment enhancements to the optical flow procedure have provided a means to match intensity values under varying illumination [15], [16]. Yet, aside from simple variations such as gain and offset, these techniques require computing basis images using a training set of the object. This is not necessarily a drawback for tracking applications, which these techniques are designed in part to address. In fact, computing basis images beforehand can often be desirable for real-time tracking, as it delegates a significant amount of computing to an offline task. However, training sets are not practical for aligning photometric-stereo image sequences, especially for large datasets of objects.

In the context of photometric stereo, a better option is to use Model-Based Alignment, which employs basis images derived from models of image formation. Armed with such a model, no training set is required. When imaging objects with diffuse reflectance characteristics, the Lambertian model of image formation is well-established for relating pixel intensities to surface normals and illumination direction. Thus, by incorporating the Lambertian model of image formation, one can exploit intensity-based information in the alignment technique. This paper describes such an image alignment technique. Called translational photometric alignment (TPA), this novel alignment technique is designed to align single-view images of an object illuminated from differing directions.
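The Lambertian model relates a pixel's intensity to its albedo ρ, unit surface normal n, and unit illumination direction l via I = ρ · max(0, n · l). A minimal rendering sketch of this relationship:

```python
import numpy as np

def lambertian_image(normals, albedo, light):
    """Pixel intensities under the Lambertian model:
    I = albedo * max(0, n . l), with unit surface normals
    and a unit illumination direction."""
    shading = np.clip(normals @ light, 0.0, None)  # clamp attached shadows
    return albedo * shading

# A single pixel facing the camera (+z), lit head-on vs. at 60 degrees.
n = np.array([[0.0, 0.0, 1.0]])
rho = np.array([0.8])
frontal = lambertian_image(n, rho, np.array([0.0, 0.0, 1.0]))
oblique = lambertian_image(n, rho,
                           np.array([0.0, np.sin(np.pi / 3), np.cos(np.pi / 3)]))
```

Because the model predicts how intensities change with l, it supplies exactly the appearance variation that the intensity-based methods above cannot account for.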

Section snippets

Method

The foundation of TPA is a model explaining the generation of a misaligned image sequence. Section 2.1 develops such a model, whose framework incorporates Lambertian reflectance and translational misalignments. Using this model, the problem of estimating misalignments corrupting image sequences is reduced to minimizing a nonlinear function. The practical steps required to minimize this nonlinear function are detailed in Section 2.2.
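The paper's actual minimization is detailed in Section 2.2; purely as a rough illustration of the idea, the following toy sketch alternates between a classical per-pixel Lambertian photometric-stereo solve and a brute-force integer-shift search per image. It uses circular `np.roll`, ignores attached shadows, and restricts shifts to an integer grid — none of which is claimed to match the paper's algorithm:

```python
import numpy as np

def solve_normals(images, lights):
    """Per-pixel least squares for the scaled normals b = albedo * n,
    given aligned images: classical Lambertian photometric stereo."""
    k, h, w = images.shape
    I = images.reshape(k, -1)                  # k x P intensity matrix
    L = np.asarray(lights)                     # k x 3 light directions
    B, *_ = np.linalg.lstsq(L, I, rcond=None)  # 3 x P scaled normals
    return B.reshape(3, h, w)

def align_sequence(images, lights, max_shift=2, iters=3):
    """Toy alternation: (1) estimate scaled normals from the currently
    shifted images; (2) re-render each image from the model and grid-search
    the integer shift minimizing the squared error against the observation."""
    k = images.shape[0]
    shifts = [(0, 0)] * k
    for _ in range(iters):
        aligned = np.stack([np.roll(im, s, axis=(0, 1))
                            for im, s in zip(images, shifts)])
        B = solve_normals(aligned, lights)
        for i, (im, l) in enumerate(zip(images, lights)):
            model = np.tensordot(l, B, axes=1)  # rendered image i
            errs = {(dr, dc): np.sum(
                        (np.roll(im, (dr, dc), axis=(0, 1)) - model) ** 2)
                    for dr in range(-max_shift, max_shift + 1)
                    for dc in range(-max_shift, max_shift + 1)}
            shifts[i] = min(errs, key=errs.get)
    return shifts, B

# Synthetic check: eight images rendered from random scaled normals,
# with one frame misaligned by a known integer shift.
rng = np.random.default_rng(1)
B_true = rng.standard_normal((3, 24, 24))
lights = rng.standard_normal((8, 3))
clean = np.tensordot(lights, B_true, axes=1)
images = clean.copy()
images[7] = np.roll(clean[7], (1, -1), axis=(0, 1))
shifts, _ = align_sequence(images, lights)
```

The hypothetical `align_sequence` recovers the inverse shift for the misaligned frame and leaves the others untouched, mirroring TPA's end goal of reconciling the observed images with the generative model.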

Results

This section outlines the results of two experiments that explore the performance and benefits of TPA. The first experiment, discussed in Section 3.1, tests the ability of TPA to correct for known translations on the Yale Face Database B [27]. This allows the performance of TPA to be judged under a controlled but challenging scenario. On the other hand, Section 3.2 tests whether TPA can provide tangible benefits to a real-world scenario suffering from uncontrolled translational misalignments.

Discussion

TPA is applicable to any image capture setup susceptible to translational misalignments. In many situations, to automatically obtain images of an object illuminated from differing directions, it is more feasible to rotate the object rather than the light source. This is particularly true when imaging microscopic objects. For instance, rotating the particle is often the only option when executing photometric stereo using scanning electron microscopy (SEM), due to the inherent complexities and

Conclusion

This paper presented a novel alignment routine called TPA that is designed specifically to correct for translational misalignments in a sequence of photometric stereo images. The routine requires that the object in each image is illuminated from a known direction. By incorporating a model of image formation directly into its error term, TPA can align images whose corresponding pixels have varying intensities. This paper detailed a solution using Lambertian reflectance, but other models are possible.

Acknowledgments

The authors gratefully acknowledge feedback provided by anonymous reviewers, as well as the support provided by the Natural Sciences and Engineering Research Council (NSERC) of Canada and Alberta Innovates–Technology Futures.

References (32)

  • L. Zhang, B. Curless, A. Hertzmann, S.M. Seitz, Shape and motion under varying illumination: unifying structure from...
  • R.C. Veltkamp et al., State of the art in shape matching
  • A.A. Goshtasby, 2-D and 3-D Image Registration: for Medical, Remote Sensing, and Industrial Applications (2005)
  • R.C. Gonzalez et al., Digital Image Processing (2008)
  • B. Lucas, T. Kanade, An iterative image registration technique with an application to stereo vision, in: Proceedings of...
  • S. Baker, R. Gross, I. Matthews, Lucas-Kanade 20 Years on: A Unifying Framework: Part 3, Tech. Rep. CMU-RI-TR-03-35,...
    This paper has been recommended for acceptance by Siome Klein Goldenstein.
