
Combining Shape from Shading and Stereo: A Joint Variational Method for Estimating Depth, Illumination and Albedo

Published in: International Journal of Computer Vision

Abstract

Shape from shading (SfS) and stereo are two fundamentally different strategies for image-based 3-D reconstruction. While approaches for SfS infer the depth solely from pixel intensities, methods for stereo are based on a matching process that establishes correspondences across images. This difference in approaching the reconstruction problem yields complementary advantages that are worth combining. So far, however, most “joint” approaches are based on an initial stereo mesh that is subsequently refined using shading information. In this paper we follow a completely different approach. We propose a joint variational method that combines both cues within a single minimisation framework. To this end, we fuse a Lambertian SfS approach with a robust stereo model and supplement the resulting energy functional with a detail-preserving anisotropic second-order smoothness term. Moreover, we extend the resulting model in such a way that it jointly estimates depth, albedo and illumination. This in turn makes the approach applicable to objects with non-uniform albedo as well as to scenes with unknown illumination. Experiments on synthetic and real-world images demonstrate the benefits of our combined approach: They not only show that our method is capable of generating very detailed reconstructions, but also that joint approaches are feasible in practice.


Notes

  1. Model by Ben Dansie (www.thingiverse.com/thing:144775).

  2. A 3D computer graphics software (www.blender.org).

References

  • Agarwal, S., Snavely, N., Simon, I., Seitz, S. M., & Szeliski, R. (2009). Building Rome in a day. In Proceedings of IEEE International Conference on Computer Vision (pp. 72–79).

  • Baillard, C., & Maître, H. (1999). 3-D reconstruction of urban scenes from aerial stereo imagery: A focusing strategy. Computer Vision and Image Understanding, 76(3), 244–258.


  • Basha, T., Moses, Y., & Kiryati, N. (2012). Multi-view scene flow estimation: A view centered variational approach. International Journal of Computer Vision, 101(1), 6–21.


  • Black, M. J., & Anandan, P. (1991). Robust dynamic motion estimation over time. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 296–302).

  • Blake, A., Zisserman, A., & Knowles, G. (1985). Surface descriptions from stereo and shading. Image and Vision Computing, 3(4), 183–191.


  • Bleyer, M., Rhemann, C., & Rother, C. (2011). PatchMatch stereo: Stereo matching with slanted support windows. In Proceedings of British Machine Vision Conference (pp. 14:1–14:11).

  • Boulanger, J., Elbau, P., Pontow, C., & Scherzer, O. (2011). Non-local functionals for imaging. In Fixed-point algorithms for inverse problems in science and engineering, Springer Optimization and Its Applications (pp. 131–154). Springer. https://doi.org/10.1007/978-1-4419-9569-8_8


  • Bredies, K., Kunisch, K., & Pock, T. (2010). Total generalized variation. SIAM Journal on Imaging Sciences, 3(3), 492–526.


  • Breuß, M., Cristiani, E., Durou, J.-D., Falcone, M., & Vogel, O. (2012). Perspective shape from shading: Ambiguity analysis and numerical approximations. SIAM Journal on Imaging Sciences, 5(1), 311–342.


  • Brox, T., Bruhn, A., Papenberg, N., & Weickert, J. (2004). High accuracy optical flow estimation based on a theory for warping. In Proceedings of European Conference on Computer Vision (pp. 25–36).


  • Bruhn, A., & Weickert, J. (2005). Towards ultimate motion estimation: Combining highest accuracy with real-time performance. In Proceedings of IEEE International Conference on Computer Vision (pp. 749–755).

  • Camilli, F., & Tozza, S. (2017). A unified approach to the well-posedness of some non-Lambertian models in shape-from-shading theory. SIAM Journal on Imaging Sciences, 10(1), 26–46.


  • Charbonnier, P., Blanc-Féraud, L., Aubert, G., & Barlaud, M. (1997). Deterministic edge-preserving regularization in computed imaging. IEEE Transactions on Image Processing, 6(2), 298–311.


  • Chen, Q., & Koltun, V. (2013). A simple model for intrinsic image decomposition with depth cues. In Proceedings of IEEE International Conference on Computer Vision (pp. 241–248).

  • Cryer, J. E., Tsai, P.-S., & Shah, M. (1995). Integration of shape from shading and stereo. Pattern Recognition, 28(7), 1033–1043.


  • Di Zenzo, S. (1986). A note on the gradient of a multi-image. Computer Vision, Graphics and Image Processing, 33, 116–125.


  • Förstner, W., & Gülch, E. (1987). A fast operator for detection and precise location of distinct points, corners and centres of circular features. In Proceedings of ISPRS Intercommission Conference on Fast Processing of Photogrammetric Data (pp. 281–305).

  • Fua, P., & Leclerc, Y. G. (1995). Object-centered surface reconstruction: Combining multi-image stereo and shading. International Journal of Computer Vision, 16(1), 35–56.


  • Galliani, S., Lasinger, K., & Schindler, K. (2015). Massively parallel multiview stereopsis by surface normal diffusion. In Proceedings of IEEE International Conference on Computer Vision (pp. 873–881).

  • Gelfand, I. M., & Fomin, S. V. (2000). Calculus of variations. New York: Dover.


  • Graber, G., Balzer, J., Soatto, S., & Pock, T. (2015). Efficient minimal-surface regularization of perspective depth maps in variational stereo. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 511–520).

  • Hafner, D., Schroers, C., & Weickert, J. (2015). Introducing maximal anisotropy into second order coupling models. In Proceedings of German Conference on Pattern Recognition (pp. 79–90).


  • Haines, T. S. F., & Wilson, R. C. (2007). Integrating stereo with shape-from-shading derived orientation information. In Proceedings of British Machine Vision Conference (pp. 84:1–84:10).

  • Han, Y., Lee, J.-Y., & Kweon, I. S. (2013). High quality shape from a single RGB-D image under uncalibrated natural illumination. In Proceedings of IEEE International Conference on Computer Vision (pp. 1617–1624).

  • Hirschmüller, H. (2008). Stereo processing by semi-global matching and mutual information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2), 328–341.


  • Horn, B. (1970). Shape from shading: A method for obtaining the shape of a smooth opaque object from one view. Ph.D. thesis, Massachusetts Institute of Technology.

  • Hougen, D. R., & Ahuja, N. (1994). Adaptive polynomial modelling of the reflectance map for shape estimation from stereo and shading. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 991–994).

  • Immel, D. S., Cohen, M. F., & Greenberg, D. P. (1986). A radiosity method for non-diffuse environments. In Proceedings of SIGGRAPH (pp. 133–142).


  • Jin, H., Cremers, D., Wang, D., Prados, E., Yezzi, A., & Soatto, S. (2008). 3-D reconstruction of shaded objects from multiple images under unknown illumination. International Journal of Computer Vision, 76(3), 245–256.


  • Ju, Y. C., Maurer, D., Breuß, M., & Bruhn, A. (2016). Direct variational perspective shape from shading with Cartesian depth parametrisation. In Perspectives in shape analysis, Mathematics and visualization (pp. 43–72). Springer. https://doi.org/10.1007/978-3-319-24726-7_3


  • Kajiya, J. T. (1986). The rendering equation. In Proceedings of SIGGRAPH (pp. 143–150).


  • Langguth, F., Sunkavalli, K., Hadap, S., & Goesele, M. (2016). Shading-aware multi-view stereo. In Proceedings of European Conference on Computer Vision (pp. 469–485).


  • Leclerc, Y. G., & Bobick, A. (1991). The direct computation of height from shading. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 552–558).

  • Liu-Yin, Q., Yu, R., Fitzgibbon, A., Agapito, L., & Russell, C. (2016). Better together: Joint reasoning for non-rigid 3D reconstruction with specularities and shading. In Proceedings of British Machine Vision Conference (pp. 42:1–42:12).

  • Marr, D., & Poggio, T. (1976). Cooperative computation of stereo disparity. Science, 194, 283–287.


  • Maurer, D., Ju, Y. C., Breuß, M., & Bruhn, A. (2015). An efficient linearisation approach for variational perspective shape from shading. In Proceedings of German Conference on Pattern Recognition (pp. 249–261).


  • Maurer, D., Ju, Y. C., Breuß, M., & Bruhn, A. (2016). Combining shape from shading and stereo: A variational approach for the joint estimation of depth, illumination and albedo. In Proceedings of British Machine Vision Conference (pp. 76:1–76:14).

  • Mecca, R., Quéau, Y., Logothetis, F., & Cipolla, R. (2016). A single-lobe photometric stereo approach for heterogeneous material. SIAM Journal on Imaging Sciences, 9(4), 1858–1888.


  • Mémin, E., & Pérez, P. (1998). A multigrid approach for hierarchical motion estimation. In Proceedings of IEEE International Conference on Computer Vision (pp. 933–938).

  • Newcombe, R. A., Lovegrove, S. J., & Davison, A. J. (2011). DTAM: Dense tracking and mapping in real-time. In Proceedings of IEEE International Conference on Computer Vision (pp. 2320–2327).

  • Okatani, T., & Deguchi, K. (1997). Shape reconstruction from an endoscope image by shape from shading technique for a point light source at the projection center. Computer Vision and Image Understanding, 66(2), 119–131.


  • Or-El, R., Hershkovitz, R., Wetzler, A., Rosman, G., Bruckstein, A. M., & Kimmel, R. (2016). Real-time depth refinement for specular objects. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 4378–4386).

  • Perona, P., & Malik, J. (1990). Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(7), 629–639.


  • Phong, B. T. (1975). Illumination for computer generated pictures. Communications of the ACM, 18(6), 311–317.


  • Prados, E., & Faugeras, O. (2005). Shape from shading: A well-posed problem? In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 870–877).

  • Quéau, Y., Lauze, F., & Durou, J.-D. (2015). A \(L^1\)-TV algorithm for robust perspective photometric stereo with spatially-varying lightings. In Proceedings of International Conference on Scale Space and Variational Methods in Computer Vision (pp. 498–510).


  • Ranftl, R., Gehrig, S., Pock, T., & Bischof, H. (2012). Pushing the limits of stereo using variational stereo estimation. In Proceedings of IEEE Intelligent Vehicles Symposium (pp. 401–407).

  • Robert, L., & Deriche, R. (1996). Dense depth map reconstruction: A minimization and regularization approach which preserves discontinuities. In Proceedings of European Conference on Computer Vision (pp. 439–451).


  • Rouy, E., & Tourin, A. (1992). A viscosity solutions approach to shape-from-shading. SIAM Journal on Numerical Analysis, 29, 867–884.


  • Samaras, D., Metaxas, D., Fua, P., & Leclerc, Y. G. (2000). Variable albedo surface reconstruction from stereo and shape from shading. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 480–487).

  • Schroers, C., Hafner, D., & Weickert, J. (2015). Multiview depth parameterisation with second order regularisation. In Proceedings of International Conference on Scale Space and Variational Methods in Computer Vision (pp. 551–562).


  • Semerjian, B. (2014). A new variational framework for multiview surface reconstruction. In Proceedings of European Conference on Computer Vision (pp. 719–734).


  • Simoncelli, E. P., Adelson, E. H., & Heeger, D. J. (1991). Probability distributions of optical flow. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 310–315).

  • Strecha, C., von Hansen, W., Van Gool, L., Fua, P., & Thoennessen, U. (2008). On benchmarking camera calibration and multi-view stereo for high resolution imagery. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 1–8).

  • Valgaerts, L., Wu, C., Bruhn, A., Seidel, H.-P., & Theobalt, C. (2012). Lightweight binocular facial performance capture under uncontrolled lighting. ACM Transactions on Graphics, 31(6), 1–11.


  • van Diggelen, J. (1951). A photometric investigation of the slopes and the heights of the ranges of hills in the Maria of the Moon. Bulletin of the Astronomical Institutes of the Netherlands, 1, 283–289.


  • Vogel, O., Bruhn, A., Weickert, J., & Didas, S. (2007). Direct shape-from-shading with adaptive higher order regularisation. In Proceedings of International Conference on Scale Space and Variational Methods in Computer Vision (pp. 871–882).

  • Volz, S., Bruhn, A., Valgaerts, L., & Zimmer, H. (2011). Modeling temporal coherence for optical flow. In Proceedings of IEEE International Conference on Computer Vision (pp. 1116–1123).

  • Weickert, J. (1998). Anisotropic diffusion in image processing. Stuttgart: Teubner.


  • Weickert, J., Welk, M., & Wickert, M. (2013). \(L^2\)-stable nonstandard finite differences for anisotropic diffusion. In Proceedings of International Conference on Scale Space and Variational Methods in Computer Vision (pp. 380–391).

  • Wu, C., Narasimhan, S., & Jaramaz, B. (2010). A multi-image shape-from-shading framework for near-lighting perspective endoscopes. International Journal of Computer Vision, 86, 211–228.


  • Wu, C., Wilburn, B., Matsushita, Y., & Theobalt, C. (2011). High-quality shape from multi-view stereo and shading under general illumination. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 969–976).

  • Wu, C., Zollhöfer, M., Nießner, M., Stamminger, M., Izadi, S., & Theobalt, C. (2014). Real-time shading-based refinement for consumer depth cameras. ACM Transactions on Graphics, 33(6), 200:1–200:10.


  • Xu, D., Duan, Q., Zheng, J., Zhang, J., Cai, J., & Cham, T.-J. (2014). Recovering surface details under general unknown illumination using shading and coarse multi-view stereo. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 1526–1533).

  • Yoon, K.-J., Prados, E., & Sturm, P. (2010). Joint estimation of shape and reflectance using multiple images with known illumination conditions. International Journal of Computer Vision, 86(2–3), 192–210.


  • Young, D. M. (1971). Iterative solution of large linear systems. New York: Academic Press.


  • Yu, L. F., Yeung, S. K., Tai, Y. W., & Lin, S. (2013). Shading-based shape refinement of RGB-D images. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 1415–1422).

  • Yu, T., Xu, N., & Ahuja, N. (2004). Recovering shape and reflectance model of non-Lambertian objects from multiple views. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 226–233).

  • Yu, T., Xu, N., & Ahuja, N. (2007). Shape and view independent reflectance map from multiple views. International Journal of Computer Vision, 73(2), 123–138.


  • Zimmer, H., Bruhn, A., & Weickert, J. (2011). Optic flow in harmony. International Journal of Computer Vision, 93(3), 368–388.


  • Zollhöfer, M., Dai, A., Innmann, M., Wu, C., Stamminger, M., Theobalt, C., et al. (2015). Shading-based refinement on volumetric signed distance functions. ACM Transactions on Graphics, 34(4), 96:1–96:14.



Acknowledgements

This work has been partly funded by the German Research Foundation (DFG) within the joint Project BR 2245/3-1 and BR 4372/1-1. Moreover, we thank the DFG for financial support within Project B04 of SFB/Transregio 161.

Author information

Correspondence to Daniel Maurer.

Additional information

Communicated by Edwin Hancock, Richard Wilson, Will Smith, Adrian Bors and Nick Pears.

Appendices

Appendix A: Pixel-Based Versus Image-Based Coordinates

1.1 Image Coordinates

In the SfS literature the surface is often parametrised in terms of image coordinates, and it is typically assumed to lie at negative Z-coordinates; see e.g. Prados and Faugeras (2005). If we denote the image coordinates by \(\hat{\mathbf {x}}_0=(\hat{x}_0,\hat{y}_0)^\top \), such a parametrisation is given, for instance, by Ju et al. (2016) and Maurer et al. (2016):

$$\begin{aligned} \mathcal {S} \left( \hat{\mathbf {x}}_0 \right) = \left( \begin{array}{c} \dfrac{z \; \hat{x}_0}{\mathtt {f}_0} \\ \dfrac{z \; \hat{y}_0}{\mathtt {f}_0} \\ - z \end{array} \right) \; . \end{aligned}$$
(58)

The resulting surface normal then reads

$$\begin{aligned} \tilde{\mathbf {n}}(\hat{\mathbf {x}}_0)= & {} \mathcal {S}_{x}(\hat{\mathbf {x}}_0) \times \mathcal {S}_{y}(\hat{\mathbf {x}}_0) = \frac{z}{\mathtt {f}_0^2} \left( \begin{array}{c} \mathtt {f}_0 \; \hat{\nabla } z \\ \big (\hat{\nabla } z^\top \hat{\mathbf {x}}_0\big ) + z \end{array} \right) , \end{aligned}$$
(59)

where \(\hat{\nabla }=(\partial _{\hat{x}_0},\partial _{\hat{y}_0})^\top \) denotes the gradient operator in image coordinates. Please note that this surface normal points outwards, i.e. away from the object.

1.2 Pixel Coordinates

In contrast, in our case, the parametrisation of the surface is given in pixel coordinates and the surface is assumed to lie at positive Z-coordinates; see also Graber et al. (2015). The corresponding surface parametrisation then reads

$$\begin{aligned} \mathcal {S} \left( \mathbf {x}_0 \right) = \left( \begin{array}{c} \dfrac{z \; h_{0,x} \; (x_0 - c_{0,x})}{\mathtt {f}_0} \\ \dfrac{z \; h_{0,y} \; (y_0 - c_{0,y})}{\mathtt {f}_0} \\ z \end{array} \right) \; , \end{aligned}$$
(60)

which yields the surface normal

$$\begin{aligned} \tilde{\mathbf {n}}(\mathbf {x}_0)= & {} \mathcal {S}_{x}(\mathbf {x}_0) \times \mathcal {S}_{y}(\mathbf {x}_0)\\ \nonumber= & {} z \; \frac{h_{0,x} \; h_{0,y}}{\mathtt {f}_0^2} \! \left( \begin{array}{c} - \mathtt {f}_0 \; \dfrac{z_x}{h_{0,x}} \\ - \mathtt {f}_0 \; \dfrac{z_y}{h_{0,y}} \\ \nabla z^\top \left( \mathbf {x}_0 \! - \! \mathbf {c}_0\right) + z \end{array} \right) \\ \nonumber= & {} h_{0,x} \; h_{0,y} \; \frac{z}{\mathtt {f}_0^2} \! \left( \begin{array}{c} - \mathtt {f}_0 \; H^{-1}_{0} \; \nabla z \\ \left( H^{-1}_{0} \nabla z \right) ^\top H_{0} \left( \mathbf {x}_0 \! - \! \mathbf {c}_0\right) + z \end{array} \right) , \end{aligned}$$
(61)

where

$$\begin{aligned} H_{0} = \left( \begin{array}{cc} h_{0,x} &{} 0 \\ 0 &{} h_{0,y} \end{array} \right) \end{aligned}$$
(62)

is a diagonal matrix involving the pixel sizes \(h_{0,x}\) and \(h_{0,y}\). Please note that due to the positive sign in the third component of the parametrisation this normal is now pointing inwards, i.e. into the object. Inverting the direction hence yields the desired outward normal

$$\begin{aligned} - \tilde{\mathbf {n}}(\mathbf {x}_0)= & {} -\mathcal {S}_{x}(\mathbf {x}_0) \times \mathcal {S}_{y}(\mathbf {x}_0)\nonumber \\= & {} h_{0,x} \; h_{0,y} \; \frac{z}{\mathtt {f}_0^2}\nonumber \\&\cdot \,\left( \begin{array}{c} \mathtt {f}_0 \; H^{-1}_{0} \; \nabla z \\ -\left( H^{-1}_{0} \nabla z \right) ^\top H_{0} \left( \mathbf {x}_0 \! - \! \mathbf {c}_0\right) - z \end{array} \right) . \end{aligned}$$
(63)

1.3 Comparison

Let us now compare the surface normals in Eqs. (59) and (63). On the one hand, one can observe a difference in the sign of the third component. Since the sign of the third component of the illumination vector field changes according to the sign of the third component of the surface parametrisation, the result of the inner product between surface normal and light direction in Eq. (11) is the same for both parametrisations. On the other hand, one can rewrite the image coordinates \(\hat{\mathbf {x}}_0\) and the associated depth gradient \(\hat{\nabla } z\) in terms of pixel coordinates. It is easy to verify that the corresponding transforms are given by \(\hat{\mathbf {x}}_0 \! = \! H_{0} \! \left( \mathbf {x}_0 \! - \! \mathbf {c}_0\right) \) and \(\hat{\nabla } z \! = \! H_{0}^{-1} \nabla z\), respectively. Finally, ignoring scale—we are only interested in the direction—one can see that the corresponding unit surface normals are equivalent.
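The equivalence of the two parametrisations can also be checked numerically. The following sketch uses made-up values (the focal length, pixel sizes, principal point, sample pixel, depth and depth gradient are all purely illustrative) to evaluate the normals of Eqs. (59) and (63); after applying the coordinate transforms above and accounting for the flipped Z-axis, both normals agree up to the constant factor \(h_{0,x} h_{0,y}\):

```python
import numpy as np

# Illustrative values only (not from the paper):
f0 = 500.0                          # focal length f_0
h = np.array([0.5, 0.5])            # pixel sizes h_{0,x}, h_{0,y}
c0 = np.array([320.0, 240.0])       # principal point c_0
x0 = np.array([400.0, 300.0])       # pixel coordinates x_0
z = 2.0                             # depth z(x_0) > 0
grad_z = np.array([0.1, -0.05])     # depth gradient in pixel coordinates

H0 = np.diag(h)
H0_inv = np.linalg.inv(H0)

# Outward normal of the pixel-based parametrisation, Eq. (63):
n_pix = h[0] * h[1] * z / f0**2 * np.concatenate([
    f0 * H0_inv @ grad_z,
    [-(H0_inv @ grad_z) @ (H0 @ (x0 - c0)) - z],
])

# Transform to image coordinates and evaluate Eq. (59):
x0_img = H0 @ (x0 - c0)             # \hat{x}_0 = H_0 (x_0 - c_0)
grad_z_img = H0_inv @ grad_z        # \hat{\nabla} z = H_0^{-1} \nabla z
n_img = z / f0**2 * np.concatenate([
    f0 * grad_z_img,
    [grad_z_img @ x0_img + z],
])

# Up to the factor h_{0,x} h_{0,y} and the flipped Z-axis, both coincide:
print(np.allclose(n_pix / (h[0] * h[1]),
                  n_img * np.array([1.0, 1.0, -1.0])))  # prints True
```

Note that only the direction matters for the shading term, which is why the constant positive scale factor is irrelevant.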

Appendix B: Derivation of the Euler–Lagrange Equations

In order to derive the Euler–Lagrange equations (EL 1)–(EL 4) of the differential energy (25) one must compute its first variation, i.e. the Gâteaux differential. Due to the sum rule one may split the derivation into the contributions of the individual terms in Eqs. (35), (37), (38), (39), (40) and (43). While the derivation of most contributions is straightforward (Gelfand and Fomin 2000), the contributions related to the differential SfS data term in Eq. (37) are more complicated due to the fact that it contains non-local contributions. However, in contrast to typical non-local approaches that rely on patch-based strategies, e.g. Boulanger et al. (2011), the non-local contributions in our case originate from the upwind scheme approximation employed in the SfS term.

Fig. 14: Illustration of the rearranging step (a) performed in Eq. (69).

Hence, in the following, we focus on the derivation of those contributions to the Euler–Lagrange equations that are related to the SfS data term. Therefore, we first compute the Gâteaux differential of \(D_{\mathsf {sfs}}^k\) at \(( dz^k, \mathbf {dl}^k, \mathbf {d}\varvec{\rho }^k )\) in the direction \(\mathbf {v} = \left( v_1,\mathbf {v}_2,\mathbf {v}_3\right) ^\top \), as carried out in Eq. (69). While the first steps are straightforward, the rearrangement of the terms in step (a) is slightly more difficult. In order to factor out the directions \(v_1\) evaluated at position \(\mathbf {x}_0\) we must account for the additional non-local contributions of the neighbouring positions, i.e. the non-local contributions of \(\mathbf {x}_0 - \mathbf {h}_x^k\), \(\mathbf {x}_0 + \mathbf {h}_x^k\), \(\mathbf {x}_0 - \mathbf {h}_y^k\) and \(\mathbf {x}_0 + \mathbf {h}_y^k\), as illustrated in Fig. 14. Please note that for reasons of simplicity we neglect the fact that we do not have non-local contributions outside the image domain \(\varOmega \), and hence would have a reduced set of neighbours near the boundary. In the last step (b), we can finally identify the coefficients of the directions \(v_1\), \(\mathbf {v}_2\) and \(\mathbf {v}_3\) with the functional derivatives \(\frac{\delta D_{\mathsf {sfs}}^k}{\delta dz^k}\), \(\frac{\delta D_{\mathsf {sfs}}^k}{\delta \mathbf {dl}^k }\) and \(\frac{\delta D_{\mathsf {sfs}}^k}{\delta \mathbf {d}\varvec{\rho }^k }\) which serve as abbreviations.

Since \(\delta D_{\mathsf {sfs}}^k\) must vanish at extrema for any direction including \(\mathbf {v} = \left( v_1, \mathbf {0}, \mathbf {0}\right) ^\top \), \(\mathbf {v} = \left( 0, \mathbf {v}_2, \mathbf {0}\right) ^\top \), and \(\mathbf {v} = \left( 0, \mathbf {0}, \mathbf {v}_3\right) ^\top \), we obtain the following necessary conditions, i.e. Euler–Lagrange equations:

$$\begin{aligned} \frac{\delta D_{\mathsf {sfs}}^k}{\delta dz^k} = 0 , \quad \frac{\delta D_{\mathsf {sfs}}^k}{\delta \mathbf {dl}^k } = 0 , \quad \frac{\delta D_{\mathsf {sfs}}^k}{\delta \mathbf {d}\varvec{\rho }^k } = 0 . \end{aligned}$$
(64)

Please note that the other terms of the differential energy contribute additional terms to the Euler–Lagrange equations. These contributions can be computed in the same way as before. Finally, we obtain:

$$\begin{aligned} 0 = \frac{\delta E^k}{\delta dz^k} =&\bigg ( \frac{\delta D^k_{\mathsf {stereo}}}{\delta dz^k} + \nu \cdot \frac{\delta D^k_{\mathsf {sfs}} }{\delta dz^k} + \alpha _{z} \cdot \frac{\delta R^k_{\mathsf {depth}} }{\delta dz^k} \nonumber \\&+ \alpha _{\mathbf {l}} \cdot \frac{\delta R^k_{\mathsf {illum}} }{\delta dz^k} + \alpha _{\varvec{\rho }} \cdot \frac{\delta R^k_{\mathsf {albedo}}}{\delta dz^k} + \alpha _{\mathsf {inc}} \cdot \frac{\delta R^k_{\mathsf {inc}} }{\delta dz^k}\bigg ) , \end{aligned}$$
(65)
$$\begin{aligned} 0 = \frac{\delta E^k}{\delta \mathbf {dl}^k } =&\bigg ( \frac{\delta D^k_{\mathsf {stereo}}}{\delta \mathbf {dl}^k } + \nu \cdot \frac{\delta D^k_{\mathsf {sfs}} }{\delta \mathbf {dl}^k } + \alpha _{z} \cdot \frac{\delta R^k_{\mathsf {depth}} }{\delta \mathbf {dl}^k } \nonumber \\&+ \alpha _{\mathbf {l}} \cdot \frac{\delta R^k_{\mathsf {illum}} }{\delta \mathbf {dl}^k } + \alpha _{\varvec{\rho }} \cdot \frac{\delta R^k_{\mathsf {albedo}}}{\delta \mathbf {dl}^k } + \alpha _{\mathsf {inc}} \cdot \frac{\delta R^k_{\mathsf {inc}} }{\delta \mathbf {dl}^k }\bigg ) , \end{aligned}$$
(66)
$$\begin{aligned} 0 = \frac{\delta E^k}{\delta \mathbf {d}\varvec{\rho }^k } =&\bigg ( \frac{\delta D^k_{\mathsf {stereo}}}{\delta \mathbf {d}\varvec{\rho }^k} + \nu \cdot \frac{\delta D^k_{\mathsf {sfs}} }{\delta \mathbf {d}\varvec{\rho }^k} + \alpha _{z} \cdot \frac{\delta R^k_{\mathsf {depth}} }{\delta \mathbf {d}\varvec{\rho }^k} \nonumber \\&+ \alpha _{\mathbf {l}} \cdot \frac{\delta R^k_{\mathsf {illum}} }{\delta \mathbf {d}\varvec{\rho }^k} + \alpha _{\varvec{\rho }} \cdot \frac{\delta R^k_{\mathsf {albedo}}}{\delta \mathbf {d}\varvec{\rho }^k} + \alpha _{\mathsf {inc}} \cdot \frac{\delta R^k_{\mathsf {inc}} }{\delta \mathbf {d}\varvec{\rho }^k} \bigg ) , \end{aligned}$$
(67)
$$\begin{aligned} 0 = \frac{\delta E^k}{\delta \mathbf {du}^k } =&\bigg ( \frac{\delta D^k_{\mathsf {stereo}}}{\delta \mathbf {du}^k} + \nu \cdot \frac{\delta D^k_{\mathsf {sfs}} }{\delta \mathbf {du}^k} + \alpha _{z} \cdot \frac{\delta R^k_{\mathsf {depth}} }{\delta \mathbf {du}^k} \nonumber \\&+ \alpha _{\mathbf {l}} \cdot \frac{\delta R^k_{\mathsf {illum}} }{\delta \mathbf {du}^k} + \alpha _{\varvec{\rho }} \cdot \frac{\delta R^k_{\mathsf {albedo}}}{\delta \mathbf {du}^k} + \alpha _{\mathsf {inc}} \cdot \frac{\delta R^k_{\mathsf {inc}} }{\delta \mathbf {du}^k} \bigg ) . \end{aligned}$$
(68)

These four conditions correspond to the Euler–Lagrange equations (EL 1)–(EL 4).

$$\begin{aligned}&\delta D_{\mathsf {sfs}}^k \left( dz^k, \mathbf {dl}^k, \mathbf {d}\varvec{\rho }^k; v_1,\mathbf {v}_2,\mathbf {v}_3 \right) = \lim \limits _{t \rightarrow 0} \frac{D_{\mathsf {sfs}}^k \left( dz^k + t \cdot v_1, \mathbf {dl}^k + t \cdot \mathbf {v}_2, \mathbf {d}\varvec{\rho }^k + t \cdot \mathbf {v}_3 \right) - D_{\mathsf {sfs}}^k \left( dz^k, \mathbf {dl}^k, \mathbf {d}\varvec{\rho }^k \right) }{t}\\&\quad = \frac{d}{dt} D_{\mathsf {sfs}}^k \left( dz^k + t \cdot v_1, \mathbf {dl}^k + t \cdot \mathbf {v}_2, \mathbf {d}\varvec{\rho }^k + t \cdot \mathbf {v}_3 \right) \biggr |_{t = 0}\\&\quad = 2 \cdot \! \! \int _{\varOmega _{0}} \sum _{c=1}^{3} \Bigg \{ \Bigg [ {\phi }^{k,c}(\mathbf {x}_0) + \sum _{\mathbf {h}^k \in H^k} \frac{\partial {\phi }^{k,c}(\mathbf {x}_0)}{\partial z^k(\mathbf {x}_0 \! + \! \mathbf {h}^k)} \big ( dz^{k}(\mathbf {x}_0 \! + \! \mathbf {h}^k) + t \cdot v_1(\mathbf {x}_0 \! + \! \mathbf {h}^k) \big ))\\&\qquad +\,\left( \frac{\partial {\phi }^{k,c}(\mathbf {x}_0)}{\partial \mathbf {l}^k(\mathbf {x}_0)} \right) ^\top \left( \mathbf {dl}^k(\mathbf {x}_0) + t \cdot \mathbf {v}_2(\mathbf {x}_0) \right) + \left( \frac{\partial {\phi }^{k,c}(\mathbf {x}_0)}{\partial \varvec{\rho }^k(\mathbf {x}_0)}\right) ^\top \left( \mathbf {d}\varvec{\rho }^k(\mathbf {x}_0) + t \cdot \mathbf {v}_3(\mathbf {x}_0) \right) \Bigg ]\\&\qquad \cdot \,\Bigg [ \sum _{\mathbf {n}^k \in H^k} \left( \frac{\partial {\phi }^{k,c}(\mathbf {x}_0)}{\partial z^k(\mathbf {x}_0 \! + \! \mathbf {n}^k)} \cdot v_1(\mathbf {x}_0 + \mathbf {n}^k) \right) + \left( \frac{\partial {\phi }^{k,c}(\mathbf {x}_0)}{\partial \mathbf {l}^k(\mathbf {x}_0)} \right) ^\top \mathbf {v}_2(\mathbf {x}_0) + \left( \frac{\partial {\phi }^{k,c}(\mathbf {x}_0)}{\partial \varvec{\rho }^k(\mathbf {x}_0)}\right) ^\top \mathbf {v}_3(\mathbf {x}_0) \Bigg ] \Bigg \} \, \mathrm {d}\mathbf {x}_0 \Biggr |_{t = 0}\\&\quad = 2 \cdot \! \! 
\int _{\varOmega _{0}} \sum _{c=1}^{3} \Bigg \{ \Bigg [ {\phi }^{k,c}(\mathbf {x}_0) + \sum _{\mathbf {h}^k \in H^k} \frac{\partial {\phi }^{k,c}(\mathbf {x}_0)}{\partial z^k(\mathbf {x}_0 \! + \! \mathbf {h}^k)} \; dz^{k}(\mathbf {x}_0 \! + \! \mathbf {h}^k) + \,\left( \frac{\partial {\phi }^{k,c}(\mathbf {x}_0)}{\partial \mathbf {l}^k(\mathbf {x}_0)} \right) ^\top \mathbf {dl}^k(\mathbf {x}_0) + \left( \frac{\partial {\phi }^{k,c}(\mathbf {x}_0)}{\partial \varvec{\rho }^k(\mathbf {x}_0)}\right) ^\top \mathbf {d}\varvec{\rho }^k(\mathbf {x}_0) \Bigg ] \\&\qquad \cdot \, \Bigg [ \sum _{\mathbf {n}^k \in H^k} \left( \frac{\partial {\phi }^{k,c}(\mathbf {x}_0)}{\partial z^k(\mathbf {x}_0 \! + \! \mathbf {n}^k)} \cdot v_1(\mathbf {x}_0 + \mathbf {n}^k) \right) + \left( \frac{\partial {\phi }^{k,c}(\mathbf {x}_0)}{\partial \mathbf {l}^k(\mathbf {x}_0)} \right) ^\top \mathbf {v}_2(\mathbf {x}_0) + \left( \frac{\partial {\phi }^{k,c}(\mathbf {x}_0)}{\partial \varvec{\rho }^k(\mathbf {x}_0)}\right) ^\top \mathbf {v}_3(\mathbf {x}_0) \Bigg ] \Bigg \} \, \mathrm {d}\mathbf {x}_0 \end{aligned}$$
$$\begin{aligned}&\quad {\mathop {=}\limits ^{(a)}} 2 \cdot \! \! \int _{\varOmega _{0}} \sum _{c=1}^{3} \sum _{\mathbf {m}^k \in H^k} \Bigg \{ \Bigg [ {\phi }^{k,c}(\mathbf {x}_0 \! + \! \mathbf {m}^k) + \sum _{\mathbf {h}^k \in H^k} \frac{\partial {\phi }^{k,c}(\mathbf {x}_0 \! + \! \mathbf {m}^k)}{\partial z^k(\mathbf {x}_0 \! + \! \mathbf {m}^k \! + \! \mathbf {h}^k)} dz^{k}(\mathbf {x}_0 \! + \! \mathbf {m}^k \! + \! \mathbf {h}^k) + \left( \frac{\partial {\phi }^{k,c}(\mathbf {x}_0 \! + \! \mathbf {m}^k)}{\partial \mathbf {l}^k(\mathbf {x}_0 \! + \! \mathbf {m}^k)} \right) ^\top \mathbf {dl}^k(\mathbf {x}_0 \! + \! \mathbf {m}^k)\nonumber \\&\qquad +\, \left( \frac{\partial {\phi }^{k,c}(\mathbf {x}_0 \! + \! \mathbf {m}^k)}{\partial \varvec{\rho }^k(\mathbf {x}_0 \! + \! \mathbf {m}^k)} \right) ^\top \mathbf {d}\varvec{\rho }^k(\mathbf {x}_0 \! + \! \mathbf {m}^k) \Bigg ] \cdot \,\frac{\partial {\phi }^{k,c}(\mathbf {x}_0 + \mathbf {m}^k)}{\partial z^k(\mathbf {x}_0 )} \Bigg \} \cdot v_1(\mathbf {x}_0) \nonumber \\&\quad + \,\sum _{c=1}^{3} \Bigg \{ \Bigg [ {\phi }^{k,c}(\mathbf {x}_0) + \sum _{\mathbf {h}^k \in H^k} \frac{\partial {\phi }^{k,c}(\mathbf {x}_0)}{\partial z^k(\mathbf {x}_0 \! + \! \mathbf {h}^k)} \; dz^{k}(\mathbf {x}_0 \! + \! \mathbf {h}^k) +\, \left( \frac{\partial {\phi }^{k,c}(\mathbf {x}_0)}{\partial \mathbf {l}^k(\mathbf {x}_0)} \right) ^\top \mathbf {dl}^k(\mathbf {x}_0) + \left( \frac{\partial {\phi }^{k,c}(\mathbf {x}_0)}{\partial \varvec{\rho }^k(\mathbf {x}_0)}\right) ^\top \mathbf {d}\varvec{\rho }^k(\mathbf {x}_0) \Bigg ]\nonumber \\&\qquad \cdot \, \left( \frac{\partial {\phi }^{k,c}(\mathbf {x}_0)}{\partial \mathbf {l}^k(\mathbf {x}_0)} \right) \Bigg \}^\top \mathbf {v}_2(\mathbf {x}_0) \nonumber \\&\quad + \,\sum _{c=1}^{3} \Bigg \{ \Bigg [ {\phi }^{k,c}(\mathbf {x}_0) + \sum _{\mathbf {h}^k \in H^k} \frac{\partial {\phi }^{k,c}(\mathbf {x}_0)}{\partial z^k(\mathbf {x}_0 \! + \! \mathbf {h}^k)} \; dz^{k}(\mathbf {x}_0 \! + \! 
\mathbf {h}^k) + \left( \frac{\partial {\phi }^{k,c}(\mathbf {x}_0)}{\partial \mathbf {l}^k(\mathbf {x}_0)} \right) ^\top \mathbf {dl}^k(\mathbf {x}_0) + \left( \frac{\partial {\phi }^{k,c}(\mathbf {x}_0)}{\partial \varvec{\rho }^k(\mathbf {x}_0)}\right) ^\top \mathbf {d}\varvec{\rho }^k(\mathbf {x}_0) \Bigg ] \nonumber \\&\qquad \cdot \, \left( \frac{\partial {\phi }^{k,c}(\mathbf {x}_0)}{\partial \varvec{\rho }^k(\mathbf {x}_0)}\right) \Bigg \}^\top \mathbf {v}_3(\mathbf {x}_0) \; \mathrm {d}\mathbf {x}_0 {\mathop {=}\limits ^{(b)}} 2 \cdot \! \! \int _{\varOmega _{0}} \frac{\delta D_{\mathsf {sfs}}^k}{\delta dz^k}(\mathbf {x}_0) \cdot v_1(\mathbf {x}_0) + \frac{\delta D_{\mathsf {sfs}}^k}{\delta \mathbf {dl}^k }(\mathbf {x}_0) ^\top \mathbf {v}_2(\mathbf {x}_0) + \frac{\delta D_{\mathsf {sfs}}^k}{\delta \mathbf {d}\varvec{\rho }^k }(\mathbf {x}_0) ^\top \mathbf {v}_3(\mathbf {x}_0) \; \mathrm {d}\mathbf {x}_0 \end{aligned}$$
(69)
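The effect of the rearranging step (a) can be sanity-checked on a toy functional. In the sketch below, the residual \(\phi\) is a hypothetical stand-in (a simple forward difference, not the SfS residual of Eq. (37)); it merely mimics the property that each residual references a neighbouring sample, as the upwind scheme does. The check verifies that the functional derivative assembled by collecting neighbour contributions matches a finite-difference approximation of the Gâteaux differential:

```python
import numpy as np

# Toy 1-D analogue of the non-local structure in Eq. (69):
# D(z) = sum_i phi_i(z)^2 with phi_i(z) = z[i+1] - z[i].
rng = np.random.default_rng(0)
n = 16
z = rng.standard_normal(n)

def phi(z):
    return z[1:] - z[:-1]

def energy(z):
    return np.sum(phi(z) ** 2)

def functional_derivative(z):
    # Rearranging step (a): every sample z[i] collects the contributions
    # of all residuals phi_j in which it occurs, with the matching sign.
    p = phi(z)
    g = np.zeros_like(z)
    g[1:] += 2.0 * p      # z[i+1] enters phi_i with factor +1
    g[:-1] -= 2.0 * p     # z[i]   enters phi_i with factor -1
    return g

# Finite-difference Gateaux differential in a random direction v ...
v = rng.standard_normal(n)
t = 1e-6
fd = (energy(z + t * v) - energy(z - t * v)) / (2.0 * t)

# ... agrees with the inner product of the rearranged derivative and v:
analytic = functional_derivative(z) @ v
print(np.isclose(fd, analytic, rtol=1e-5))  # prints True
```

The same bookkeeping, applied per neighbour offset \(\mathbf {m}^k \in H^k\), yields the coefficient of \(v_1(\mathbf {x}_0)\) in step (a) of Eq. (69).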

Appendix C: Pseudocode

Algorithm 1 sketches the employed optimisation strategy in terms of pseudocode.


Appendix D: Model Parameters

In order to obtain the best possible reconstruction, the parameters have to be adjusted properly. While the solver-related parameters mainly affect the trade-off between reconstruction quality and runtime and can thus be fixed in advance (see Sect. 5), the model parameters have a direct impact on the quality and must therefore be selected more carefully. A convenient strategy is to first adjust the parameters of the pure stereo model and then extend the parameter set to those of the combined model. In our experience, the optimal model parameters may vary with the intrinsic camera parameters as well as with the distance of the scene to the camera. Still, some of the model parameters turned out to be suitable for a wide range of images and can hence also be fixed in advance; see Sect. 5. What remains to be adjusted are essentially the hyperparameters of the terms of the differential energy, which we specify in Table 2. The parameter settings for the methods of Galliani et al. (2015) and Graber et al. (2015) are also given in Table 2. Regarding the method of Graber et al., we even optimised the level-dependent parameter scaling function in the Python code from

$$\begin{aligned} \lambda _{l} = \lambda \cdot \left( \frac{1}{\eta }\right) ^l \quad \text{ to } \quad \lambda _{l} = \lambda \cdot \left( \frac{1}{\eta }\right) ^{1.5 \cdot l} , \end{aligned}$$
(70)

which led to a noticeable improvement of the results. In this context l denotes the current pyramid level (the finest resolution is \(l=0\)).
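As a small illustration of Eq. (70), the scaling can be computed as follows; the base values of \(\lambda \) and \(\eta \) are hypothetical placeholders, not values taken from Table 2:

```python
# Level-dependent weight of Eq. (70); lam (base weight) and eta
# (pyramid downsampling factor) are illustrative placeholders.
lam, eta = 1.0, 0.5

def lambda_level(l, exponent=1.5):
    """Weight on pyramid level l (l = 0 is the finest level).

    exponent=1.0 reproduces the original scaling, exponent=1.5
    the modified one of Eq. (70).
    """
    return lam * (1.0 / eta) ** (exponent * l)

# With eta = 0.5 the modified scaling grows faster towards coarse levels:
print(lambda_level(2, exponent=1.0), lambda_level(2))  # prints 4.0 8.0
```

The larger exponent thus emphasises the smoothness term more strongly on coarse levels, which is consistent with the reported improvement.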

Table 2 Parameter settings for the different datasets and approaches


Cite this article

Maurer, D., Ju, Y.C., Breuß, M. et al. Combining Shape from Shading and Stereo: A Joint Variational Method for Estimating Depth, Illumination and Albedo. Int J Comput Vis 126, 1342–1366 (2018). https://doi.org/10.1007/s11263-018-1079-1
