
1 Introduction

This paper presents a novel surface reconstruction algorithm that uses polarisation information from two light sources. The method first applies a form of two-source photometric stereo (PS) to estimate part of the surface normal, after which polarisation information is used to fully constrain the normal at each pixel. The motivation for this approach is to overcome some of the weaknesses of related approaches. Firstly, the need for three or more light sources in PS is often prohibitive in real-world applications. Secondly, many polarisation-based methods suffer from a range of ambiguities due to the non-monotonic mapping between the polarisation information and surface normals. The proposed method overcomes this issue using intensity information. Finally, a novel region-growing approach is applied that allows for reconstruction in the presence of both specular and diffuse reflection.

Of the many approaches to 3D vision found in the literature, polarisation vision (i.e. the use of polarised light for computer vision) is one of the less-studied. The principle of most methods of polarisation vision is that light changes its polarisation state on reflection from surfaces [1]. This is typically from unpolarised to partially linearly polarised form.

Most previous computer vision research using polarisation relies on Fresnel theory applied to specular reflection [2]. The specific polarisation properties depend on the relationship between the surface orientation and the viewing direction [1, 3]. Unfortunately, inherent ambiguities are present and the refractive index of the target is typically required. Further, different equations are required to model reflection if a diffuse component is present.

Miyazaki et al. [4], Atkinson and Hancock [5] and Berger et al. [6] all use multiple viewpoints to overcome the ambiguity issue. In [4], specular reflection is used on transparent objects – a class of material that causes difficulty for most computer vision methods. In [5], a patch matching approach is used for diffuse surfaces to find stereo correspondence and, hence, 3D data. In [6], an energy functional for a regularization-based stereo vision is applied. Taamazyan et al. [7] use a combination of multiple views and a physics-based reflection model to simultaneously separate specular and diffuse reflection and estimate shape.

Drbohlav and Šára [8, 9] use PS (as in this paper) but with linearly polarised incident illumination. Atkinson and Hancock [10] also use PS but with more light sources and less resilience to inter-reflections than the method of this paper. Garcia et al. [11] use a circularly polarised light source to overcome the ambiguities while Morel et al. [12] extend the methods of polarisation to metallic surfaces by allowing for a complex index of refraction.

In an effort to overcome the need for the refractive index, Miyazaki et al. constrain the histogram of surface normals [13] while Rahmann and Canterakis [14, 15] use multiple views and the orientation of the plane of polarisation for correspondence. Huynh et al. [16] actually estimate the refractive index using spectral information.

2 Polarisation Vision

This paper is based on the premise that light undergoes a partial polarisation process upon reflection from smooth surfaces. Consider the experimental arrangement shown in Fig. 1, which is used for all of this work. Fresnel reflectance theory [2] is able to quantify the polarisation process that occurs when initially unpolarised incident light is reflected towards the camera. The reflected light can be parametrised by three values. First is the intensity, I, of the light. Second is the phase angle, \(\phi \), which defines the principal angle of the electric field component of the light as shown in the figure. Finally, the degree of polarisation, \(\rho \), indicates the level of polarisation from 0 (unpolarised) to 1 (completely linearly polarised) [17].

Fig. 1. Arrangement and definitions. Two polarisation images are acquired using the camera and rotating polariser. Each image has a different source illuminated. The surface normal, \(\mathbf {N}\), is defined by its zenith angle, \(\theta \), and azimuth angle, \(\alpha \).

As explained elsewhere in the literature [1, 17], for specularly reflected light the phase angle aligns perpendicularly to the projection of the surface normal onto the image plane. Assuming that the Wolff subsurface scattering model [18] applies, diffuse reflection by contrast causes parallel alignment. Since phase angles that differ by \(\pi \) radians are indistinguishable, there are four possible surface azimuth angles, \(\alpha \) (defined in Fig. 1), for an unknown reflectance type:

$$\begin{aligned} \alpha \in \left[ \phi ,\phi +\frac{\pi }{2},\phi +\pi ,\phi +\frac{3\pi }{2}\right] \quad \mathrm {where}\quad 0\le \phi <\pi \end{aligned}$$
(1)

The degree of polarisation contains information principally related to the zenith angle of the surface. Unfortunately, the relationship between zenith angle and degree of polarisation varies substantially between specular and diffuse reflection and depends on the refractive index of the reflecting surface, which is typically unknown [17]. For this reason, its use for surface normal estimation is avoided in this paper. The method does, however, use the degree of polarisation as a measure of reliability of the phase data: any phase data with a corresponding degree of polarisation below a threshold (fixed at \(1\%\)) are deemed unreliable due to their correspondingly high noise levels.

The method described in the next section requires two polarisation images of an object as input (a single polarisation image comprises a separate intensity, phase and degree of polarisation value for each pixel). There are now several commercially available cameras that capture such data directly, such as the Fraunhofer “POLKA” [19]. For this paper, however, a simple method is adopted where a motorised polariser is placed in front of the camera and images are taken at various orientations [17]. The camera was a Dalsa Genie HM1400 fitted with a Schneider KMP-IR Xenoplan 23/1,4-M30,5 lens. Images are taken at polariser angles of \(0^{\circ }\) (horizontal), \(45^{\circ }\), \(90^{\circ }\) and \(135^{\circ }\). If the intensities measured for these angles are called \(I_0\), \(I_{45}\), \(I_{90}\) and \(I_{135}\), then the polarisation data for each pixel is calculated via the Stokes parameters \(S_0\), \(S_1\) and \(S_2\) (assuming no circular polarisation is present) [2]:

Stokes parameters:

$$\begin{aligned} S_0 = \frac{I_0 + I_{45} + I_{90} + I_{135}}{2} \end{aligned}$$
(2)
$$\begin{aligned} S_1 = I_{0} - I_{90} \end{aligned}$$
(3)
$$\begin{aligned} S_2 = I_{45} - I_{135} \end{aligned}$$
(4)

Polarisation image data:

$$\begin{aligned} I = S_0 \end{aligned}$$
(5)
$$\begin{aligned} \phi = \frac{1}{2}\arctan _2\left( S_2,S_1\right) \end{aligned}$$
(6)
$$\begin{aligned} \rho = \frac{\sqrt{S_1^2+S_2^2}}{S_0} \end{aligned}$$
(7)

where \(\arctan _2\) is the four quadrant inverse tangent [20].
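
For concreteness, the following sketch shows how a polarisation image could be computed from the four polariser-angle intensities using (2)-(7). The function name, array layout and use of NumPy are illustrative rather than part of the original method; the final line also builds the reliability mask described above (pixels with \(\rho \) below the \(1\%\) threshold).

```python
import numpy as np

def polarisation_image(I0, I45, I90, I135, rho_min=0.01):
    """Polarisation image from four polariser-angle intensity images,
    following (2)-(7). Names and structure are illustrative."""
    # Stokes parameters (assuming no circular polarisation is present)
    S0 = (I0 + I45 + I90 + I135) / 2.0
    S1 = I0 - I90
    S2 = I45 - I135
    # Polarisation image components
    I = S0
    phi = np.mod(0.5 * np.arctan2(S2, S1), np.pi)          # phase angle in [0, pi)
    rho = np.sqrt(S1**2 + S2**2) / np.maximum(S0, 1e-12)   # degree of polarisation
    reliable = rho >= rho_min        # mask of pixels kept by the 1% threshold
    return I, phi, rho, reliable
```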

Examples of the three components of the polarisation image for a white snooker ball are shown in Figs. 2 and 3. Note that the intensity is normalised to the range [0,1]. For Fig. 2, the capture conditions were ideal, with the ball illuminated by a single small white LED placed close to the camera and blackout curtains surrounding the ball to suppress inter-reflections from the environment. The reflection is therefore of diffuse type throughout the surface and so the first or third solution to (1) must be true for all pixels. For Fig. 3, a plain matte white board was placed behind the ball. A sudden phase shift is present near the occluding contours of the ball due to an inter-reflection from the white backing. Since this region is effectively undergoing a specular reflection, either the second or fourth solution to (1) is true. By contrast, the first or third solutions are true for the rest of the ball, which is exhibiting diffuse reflection. Note also that the degree of polarisation is higher near the inter-reflection. This is also predicted by the reflection theory but is of less significance to this paper [17]. The noise-ridden background of Fig. 3 is due to the very low degree of polarisation for the backing and the shadow cast by the ball, neither of which is of significance here.

Fig. 2. Polarisation image of a white snooker ball with no inter-reflections and only one specularity. (a) Intensity I, (b) phase angle \(\phi \), (c) degree of polarisation, \(\rho \).

Fig. 3. Polarisation image of a white snooker ball with inter-reflections from a white sheet behind the ball. Otherwise, conditions were the same as in Fig. 2. Regions where \(\rho >0.3\) in (c) are shown white here to improve the clarity of the rest of the figure.

3 Method

The approach can be broadly divided into the following steps assuming that the starting point is two full polarisation images corresponding to the two light source locations shown in Fig. 1.

  1. Extract estimates of 2D (y-z) surface normals from two-source PS.

  2. (a) Extract estimates of 2D (x-y) surface normals from polarisation aided by PS.

     (b) Disambiguate spurious data or specular inter-reflections using region growing.

  3. Combine data from each modality to obtain 3D (x-y-z) surface normals.

The first step essentially applies a 2D version of PS to obtain a 2D surface normal at each pixel. This is not merely a projection of the 3D normal onto the y-z plane but rather a vector representing one particular degree of freedom of the orientation of the surface at each point. The second step uses the polarisation phase angle of the incoming light and assumes diffuse reflection (for now) to estimate a 2D version of the surface normal in the x-y plane. PS data is applied to disambiguate most of these vectors but region growing is needed where specularities occur or where previous disambiguation is deemed incorrect. Finally, the information above is combined to form a 3D surface normal map.

3.1 Application of PS

The aim here is to use the method of PS to estimate a 2D normal in the y-z plane at each point. To do this, the original methodology of Woodham [21] is adapted to two images in order to obtain a set of n normals as follows:

$$\begin{aligned} \left\{ \mathbf {N}^\mathrm {(ps)}_i= \left[ \begin{array}{ c } N^\mathrm {(ps)}_{y,i} \\ N^\mathrm {(ps)}_{z,i} \end{array} \right] \right\} _{i=1}^n\leftarrow \left[ \begin{array}{ c c } -\cos (\beta _R) & \cos (\beta _L) \\ \sin (\beta _R) & -\sin (\beta _L) \end{array} \right] \left[ \begin{array}{ c } I_{L,i} \\ I_{R,i} \end{array} \right] \,\forall \, i \end{aligned}$$
(8)

where the angles \(\beta _L\) and \(\beta _R\) are defined in Fig. 1 and the suffix “ps” is a reminder that these estimates are from photometric stereo. The “\(\leftarrow \)” symbol is used throughout this paper to refer to variable assignment.
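
As an illustration of (8), a minimal NumPy sketch is given below. The function name and array layout are assumptions made here; normalisation to unit length, as required later by (9), is included.

```python
import numpy as np

def two_source_ps(I_L, I_R, beta_L, beta_R):
    """2D (y-z) surface normal estimates from two-source PS, following (8).
    I_L, I_R are the intensity images for the two sources and beta_L, beta_R
    the source angles of Fig. 1 (radians). Illustrative sketch only."""
    M = np.array([[-np.cos(beta_R),  np.cos(beta_L)],
                  [ np.sin(beta_R), -np.sin(beta_L)]])
    I = np.stack([I_L.ravel(), I_R.ravel()])        # 2 x n matrix of intensities
    N = M @ I                                        # rows: N_y^(ps), N_z^(ps)
    N_y = N[0].reshape(I_L.shape)
    N_z = N[1].reshape(I_L.shape)
    norm = np.maximum(np.hypot(N_y, N_z), 1e-12)
    return N_y / norm, N_z / norm                    # normalised as in (9)
```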

3.2 Application of Polarisation Vision

This section describes the method to estimate 2D normals in the x-y plane. It assumes that the method for polarisation image acquisition described in Sect. 2 has been applied to arrive at corresponding intensity, phase and degree of polarisation values for all n pixels: \(\left\{ I_i,\phi _i,\rho _i\right\} _{i=1}^n\). The result of PS, \(\left\{ \mathbf {N}_i^\mathrm {(ps)}\right\} _{i=1}^n\), is also used. This part of the algorithm is in two stages (Algorithm 1).

Algorithm 1.

The raw data yields two estimates of the polarisation values for each pixel (one corresponding to each light source direction). In theory, the data should be identical between the two polarisation images, except at the locations of specularities and shadows. The proposed algorithm first forms a new polarisation image based on the polarisation data from the image corresponding to the highest intensity at each pixel.
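
A minimal sketch of this per-pixel selection is shown below, assuming each polarisation image is stored as an (I, \(\phi \), \(\rho \)) triple of arrays; the names are illustrative.

```python
import numpy as np

def merge_polarisation_images(pol_L, pol_R):
    """Form a single polarisation image by taking, at each pixel, the
    (I, phi, rho) triple from whichever source image is brighter there."""
    I_L, phi_L, rho_L = pol_L
    I_R, phi_R, rho_R = pol_R
    use_L = I_L >= I_R                        # brighter source wins at each pixel
    I   = np.where(use_L, I_L,   I_R)
    phi = np.where(use_L, phi_L, phi_R)
    rho = np.where(use_L, rho_L, rho_R)
    return I, phi, rho
```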

The first three lines of Algorithm 1 are designed to obtain an initial azimuth angle estimation. The first line applies a sharpening operator to the phase data. The reason for this is that the subsequent specularity detection/disambiguation algorithm (“localAlign”) is more reliable in the presence of sharp transitions between specular inter-reflections and diffuse regions.

The method makes the initial assumption that all surface normal projections onto the x-y (image) plane are aligned parallel or anti-parallel to the phase angle, as predicted by the theory for diffuse reflection covered in Sect. 2. Line 2 sets the azimuth angle, \(\alpha \), of each point accordingly. Whether a given projection is parallel or anti-parallel depends on the best match to the estimated 2D normal from PS. Line 3 simply generates a set of 2D surface normals on the x-y plane using the calculated azimuth angles. The suffix “po” reminds us that these estimates are primarily from polarisation.
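
The following sketch illustrates lines 2 and 3 of Algorithm 1. The convention \((N_x^\mathrm {(po)}, N_y^\mathrm {(po)}) = (\cos \alpha , \sin \alpha )\) is an assumption made purely for illustration, since the parametrisation of the azimuth is not fixed in the text.

```python
import numpy as np

def initial_azimuth(phi, Ny_ps):
    """Initial azimuth estimate (lines 2-3 of Algorithm 1): assume diffuse
    reflection, so alpha = phi or phi + pi, and pick the branch whose x-y
    normal best matches the PS estimate Ny_ps. Illustrative sketch."""
    alpha = np.mod(phi, np.pi)                     # start with alpha = phi in [0, pi)
    flip = np.sin(alpha) * Ny_ps < 0               # anti-parallel to the PS estimate
    alpha = np.where(flip, alpha + np.pi, alpha)
    Nx_po, Ny_po = np.cos(alpha), np.sin(alpha)    # unit 2D x-y normals
    return alpha, Nx_po, Ny_po
```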

It is expected that most of the normal estimates will be correct at this point. However, it is likely that some regions of the image will be incorrect due to one of the following reasons:

  • Diffusely reflecting regions with incorrect disambiguation. This is typically where \(N_y^\mathrm {(ps)}\approx 0\), meaning the initial disambiguation is not robust. These regions have an azimuth error of \(\pi \) radians.

  • Specular (direct or inter-reflective) regions. In these areas the azimuth angle error is \(\pi /2\) radians, as predicted by the theory described in Sect. 2.

Lines 4 to 7 of Algorithm 1 are intended to address these possibilities while enforcing (1). First, a set of pixels, \(\left\{ R_i\right\} _{i=1}^n\), is determined that should not be considered further by the algorithm. In the first instance, these pixels are the image border pixels and those with very low (<1%) degree of polarisation.

Next, a seed point is chosen. This can easily be done either manually or randomly using a point of high confidence (i.e. where the degree of polarisation is high and either \(N_y^\mathrm {(ps)}\ll 0\) or \(N_y^\mathrm {(ps)}\gg 0\)). One weakness here, however, is that this point must lie in a region of diffuse reflection, so seed selection may benefit from some heuristics in future work. Assume for now that only one seed point is needed, but note that the code permits more if necessary (e.g. for more complicated shapes). The rest of the process involves a recursive call to the region-growing function, which progressively aligns spurious normals according to the constraints of (1).

Algorithm 2.

The region growing works as follows (see Algorithm 2). First, the function checks whether the point in question, j, should be considered by reference to \(\left\{ R_i\right\} \) (lines 1 and 2). Where \(R_j=1\), the function call terminates. Where this is not the case, point j is added to \(\left\{ R_i\right\} \) so that it is not considered again. The remainder of the function is completed for each neighbour of the point (line 4). The neighbourhood is defined as a simple 4-connected region for this paper.

Next, the angle between neighbouring x-y surface normals is calculated, \(\varDelta \alpha \) (line 5). There are four possibilities for \(\varDelta \alpha \):

  • \(\varDelta \alpha \) is close to zero: assume both azimuth angles are correct, move from j to neighbouring point, k, and continue the process (lines 6 and 7).

  • \(\varDelta \alpha \) is close to \(\pi \): as above but assume azimuth disambiguation was incorrect so rotate by \(\pi \) (lines 8 to 10).

  • \(\varDelta \alpha \) is close to \(\pi /2\): rotate azimuth by \(\pi /2\) as this region has specular reflection. The direction of rotation is chosen to minimise the angle between normals at j and k. Again, move from j to k and continue (lines 11 to 17).

  • Otherwise: there may be a genuine orientation boundary, so there is no reason to believe that the azimuth estimate is erroneous and it is left unchanged.

The definition of “close” for this recursive function remains open for now but only minor effects on results were observed as the threshold for closeness was varied between \(5^{\circ }\) and \(10^{\circ }\).
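
A sketch of this region-growing disambiguation is given below. It follows the four cases above but is written iteratively with an explicit stack rather than by recursion (a practical substitution for Python, not part of the original description), and visited or excluded pixels are skipped before any adjustment. The default threshold of \(7.5^{\circ }\) sits in the \(5^{\circ }\)-\(10^{\circ }\) range discussed above.

```python
import numpy as np

def grow_region(seed, alpha, R, tol=np.deg2rad(7.5)):
    """Region-growing disambiguation sketch (Algorithm 2).
    alpha: azimuth map (radians), modified in place;
    R: boolean mask of pixels excluded or already visited;
    seed: (row, col) tuple of a high-confidence diffuse pixel."""
    h, w = alpha.shape
    stack = [seed]
    while stack:
        j = stack.pop()
        if R[j]:
            continue                    # lines 1-2: excluded or already handled
        R[j] = True                     # line 3: mark as considered
        y, x = j
        for k in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):   # 4-connected
            if not (0 <= k[0] < h and 0 <= k[1] < w) or R[k]:
                continue
            d = np.mod(alpha[k] - alpha[j], 2 * np.pi)    # Delta alpha
            d = min(d, 2 * np.pi - d)                     # smallest angle between normals
            if d < tol:                                   # consistent: keep and continue
                stack.append(k)
            elif abs(d - np.pi) < tol:                    # wrong diffuse branch: rotate by pi
                alpha[k] = np.mod(alpha[k] + np.pi, 2 * np.pi)
                stack.append(k)
            elif abs(d - np.pi / 2) < tol:                # specular region: rotate by pi/2
                cand = [np.mod(alpha[k] + np.pi / 2, 2 * np.pi),
                        np.mod(alpha[k] - np.pi / 2, 2 * np.pi)]
                # choose the rotation direction minimising the angle to alpha[j]
                ang = [min(np.mod(c - alpha[j], 2 * np.pi),
                           2 * np.pi - np.mod(c - alpha[j], 2 * np.pi)) for c in cand]
                alpha[k] = cand[int(np.argmin(ang))]
                stack.append(k)
            # otherwise: possible genuine orientation boundary; leave unchanged
    return alpha, R
```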

3.3 Fusion of Data into Full 3D Surface Normals

The methods from the previous two sections give two sets of 2D surface normals: \(\left\{ \mathbf {N}^\mathrm {(ps)}\right\} \) on the y-z plane and \(\left\{ \mathbf {N}^\mathrm {(po)}\right\} \) on the x-y plane. These can be combined into a single 3D vector. First assume that the 2D vectors are normalised such that the following is true for each point:

$$\begin{aligned} {N_x^\mathrm {(po)}}^2+{N_y^\mathrm {(po)}}^2=1= {N_y^\mathrm {(ps)}}^2 + {N_z^\mathrm {(ps)}}^2 \end{aligned}$$
(9)

where the subscripts i are omitted for the sake of brevity.

Denote the 3D surface normal at a particular point by \(\mathbf {N}=[N_x,N_y,N_z]^T\). Further, choose the components estimated by polarisation for x and y so that there is only one unknown, \(N_z\):

$$\begin{aligned} \mathbf {N} = \left[ N_x,N_y,N_z\right] ^T = \left[ N_x^\mathrm {(po)},N_y^\mathrm {(po)},N_z\right] ^T \end{aligned}$$
(10)

Normalising this vector to unit length gives:

$$\begin{aligned} \mathbf {n} = \frac{\left[ N_x^\mathrm {(po)}, N_y^\mathrm {(po)}, N_z\right] ^T}{\sqrt{1+N_z^2}} = \left[ n_x^\mathrm {(po)},n_y^\mathrm {(po)},n_z\right] ^T \end{aligned}$$
(11)

This can then be re-normalised such that \({n_y^\mathrm {(po)}}^2 + n_z^2 = 1\), matching the form of the 2D estimate from PS as in (9). For the z component specifically:

$$\begin{aligned} \frac{n_z}{\sqrt{{n_y^\mathrm {(po)}}^2+n_z^2}}=N_z^\mathrm {(ps)}\end{aligned}$$
(12)

Substituting the components of \(\mathbf {n}\) from (11) and simplifying:

$$\begin{aligned} N_z^\mathrm {(ps)}=\frac{N_z}{\sqrt{{N_y^\mathrm {(po)}}^2+N_z^2}} \end{aligned}$$
(13)

Rearranging:

$$\begin{aligned} N_z = \left| N_y^\mathrm {(po)}\right| \left( \frac{1}{{N_z^\mathrm {(ps)}}^2}-1\right) ^{-1/2} \end{aligned}$$
(14)

where the \(\left| \cdot \right| \) sign is used since \(N_z\) must be positive. This means that the only unknown in (10) is resolved and so the surface normal is fully determined.
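
A compact sketch of this fusion step, following (10)-(14), is given below. The inputs are assumed to be normalised as in (9); the small epsilon guarding the divisions is an implementation detail not discussed in the text.

```python
import numpy as np

def fuse_normals(Nx_po, Ny_po, Nz_ps):
    """Combine the x-y polarisation normals and the z-component of the y-z PS
    normals into full 3D unit normals, following (10)-(14). Illustrative only."""
    eps = 1e-12
    # (14): recover the remaining unknown N_z (taken positive)
    Nz = np.abs(Ny_po) / np.sqrt(np.maximum(1.0 / np.maximum(Nz_ps**2, eps) - 1.0, eps))
    # (10)-(11): assemble and normalise to unit length
    N = np.stack([Nx_po, Ny_po, Nz], axis=-1)
    return N / np.linalg.norm(N, axis=-1, keepdims=True)
```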

As stated earlier, the regions of the images where the degree of polarisation is less than 1% are not used in the algorithm, to ensure robustness. For the results in the next section, bi-cubic interpolation was used to estimate the normals for these areas. It was also found that improvements could be made by interpolating over regions of very low \(N_y\), although this was kept to a minimum. It is acknowledged that more sophisticated methods may be more appropriate for future work.

After completion of the surface normal calculations, the depth can be obtained by integration. For this paper, the well-established Frankot-Chellappa method [22] is used. The method is fast and highly robust to noise, but can suffer from over-smoothing.
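
For reference, a minimal Fourier-domain sketch of Frankot-Chellappa integration is shown below. It assumes the gradient fields \(p=\partial z/\partial x\) and \(q=\partial z/\partial y\) have already been formed from the unit normals (e.g. \(p=-n_x/n_z\), \(q=-n_y/n_z\)); boundary handling and any windowing used in practice are omitted, so this is an illustrative sketch rather than the exact implementation used for the results.

```python
import numpy as np

def frankot_chellappa(p, q):
    """Project the gradient fields p = dz/dx, q = dz/dy onto the nearest
    integrable surface in the Fourier domain [22]. Periodic boundaries are
    implicit in the FFT; the constant of integration is set to zero."""
    h, w = p.shape
    wx = 2 * np.pi * np.fft.fftfreq(w)          # angular frequencies (columns)
    wy = 2 * np.pi * np.fft.fftfreq(h)          # angular frequencies (rows)
    WX, WY = np.meshgrid(wx, wy)
    P, Q = np.fft.fft2(p), np.fft.fft2(q)
    denom = WX**2 + WY**2
    denom[0, 0] = 1.0                           # avoid division by zero at DC
    Z = (-1j * WX * P - 1j * WY * Q) / denom
    Z[0, 0] = 0.0                               # unknown height offset
    return np.real(np.fft.ifft2(Z))
```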

4 Results

Consider Fig. 2, the polarisation image for a white snooker ball captured in ideal conditions as described in Sect. 2. The angle between the camera and light sources was \(\beta _L=\beta _R=19.6^{\circ }\) (experimentally determined to be a reasonable trade-off between reconstruction accuracy and practicality). One hundred images were captured at a frame rate of \(60\,\mathrm {Hz}\) at each of the four polariser angles (\(0^{\circ }\), \(45^{\circ }\), \(90^{\circ }\) and \(135^{\circ }\)) and the mean intensity at each pixel was used to minimise noise. As expected, the phase angle directly relates to the surface azimuth up to a \(180^{\circ }\) ambiguity and the degree of polarisation is highest near the occluding contours [17].

As mentioned earlier in this paper and elsewhere in the literature, one of the issues facing polarisation vision algorithms is the set of complications caused by the simultaneous presence of specular and diffuse reflection. To test the robustness of the method proposed here, a second polarisation image was captured in exactly the same conditions as for Fig. 2 but with a large planar matte white surface placed \(128\,\mathrm {mm}\) behind the target object. The resulting image is shown in Fig. 3, as also discussed in Sect. 2. The inter-reflection can only just be seen in the intensity image, yet the \(90^{\circ }\) phase shift and increased degree of polarisation make it clear in those two components of the polarisation image.

Fig. 4. Surface normals and depth estimated from the polarisation images shown in Fig. 2 (left) and Fig. 3 (right). Surface normals are encoded by colour (azimuth) and saturation (zenith). (Color figure online)

The results of applying the surface normal estimation algorithm to the polarisation images shown in Figs. 2 and 3 are shown in Fig. 4. The results are qualitatively good although the region of interpolation (described in Sect. 3.3) near the top-centre and bottom-centre of the normal maps is clearly apparent. Note that the data has been cropped here so only the object itself is being integrated.

Figure 5 shows a comparison of the profile of the reconstructions from Fig. 4 to ground truth data. To aid comparison, the profiles are aligned such that the tops of each curve are touching. Very similar results were found for pink and yellow balls, while blue and green results were slightly poorer due to their lower brightness causing higher noise levels.

Fig. 5. Comparison between the estimated depth profile of the snooker ball and ground truth for the data represented in Fig. 4. (Color figure online)

One potential weakness of the method is the presence of high noise levels in areas of low degree of polarisation. The results so far in this paper overcame this in a rather expensive manner, by capturing 100 images in very rapid succession for each of the four polariser angles. The effect of using fewer images is illustrated by Fig. 6, which shows the estimated surface normal (z-component only, for compactness) using 1, 5, 20 and 100 images. At first sight, the quality of results is poor when only a few images are used. However, the increased noise has a relatively low impact on surface height reconstruction, as shown in the figure. This is because (1) the Frankot-Chellappa surface integrator smooths over the noise, and (2) the higher-noise regions correspond to areas of low zenith angle, which always have a smaller effect on the integration than those of larger zenith angle.

Fig. 6. Estimated (a) z-component of the surface normals and (b) surface height from the method using a varying number of images per polariser angle.

Tests on objects of more complicated geometry are shown in Fig. 7. The general geometry of the apple has clearly been reproduced. The model dinosaur also has reasonable shape, despite a few areas of incorrect disambiguation on the foot and head. It is hoped that such issues might be resolved in future work by optimising the location and number of seed points used.

Fig. 7. Surface reconstructions of an apple and a porcelain dinosaur model.

5 Conclusion

This paper has demonstrated how PS and polarisation vision can be combined in order to overcome some of the weaknesses of each. The method may be useful, for example, in robotics where only two-source PS is feasible (it is often easy to add one light on each side of a robot but difficult to place a third high above the ground). Results are comparable to or better than baseline methods for PS or polarisation alone in the conditions considered. By way of example, the snooker ball in Fig. 2 yielded an \(\ell _2\)-norm error of 0.066 for the proposed method compared to 0.089 for standard four-source PS. Use of polarisation alone yielded 0.070 but this required knowledge of the refractive index and manual azimuth angle disambiguation [17].

At present, the data is captured using a rotating polariser and switching light sources meaning the data takes several seconds to capture. However, the method can be applied to data captured from a polarisation camera such as the Polka. This means that the only significant limitation in capture time relates to the light source switching. Plans for future work include testing the method at high-speed with a polarisation camera, adapting the method to work with metallic and rougher surfaces, and attempting to recover the geometry of an entire scene rather than a single object.