Structural Causal 3D Reconstruction

Liu, Weiyang; Liu, Zhen; Paull, Liam; Weller, Adrian; Schölkopf, Bernhard

doi:10.1007/978-3-031-19769-7_9

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13661)

Included in the following conference series:

European Conference on Computer Vision

Abstract

This paper considers the problem of unsupervised 3D object reconstruction from in-the-wild single-view images. Because of ambiguity and intrinsic ill-posedness, the problem is inherently difficult and requires strong regularization to disentangle the different latent factors. Unlike existing works that add explicit regularization terms to the objective function, we look at a different source of implicit regularization: the structure of the latent space. Specifically, we restrict the latent space to capture a topological causal ordering of the latent factors (i.e., causal dependencies represented as a directed acyclic graph). We first show that the choice of causal ordering matters for 3D reconstruction, and then explore several approaches to finding a task-dependent causal ordering of the factors. Our experiments demonstrate that the latent space structure indeed serves as an implicit regularization and introduces an inductive bias beneficial for reconstruction.
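To make the idea of a "topological causal ordering of latent factors" concrete, here is a minimal sketch, not the authors' architecture. It assumes hypothetical factor names (`viewpoint`, `lighting`, `depth`, `albedo`), placeholder per-factor encoder callables, and a user-supplied `score` function; none of these come from the abstract. It turns an ordering into a DAG in which each factor may depend on every earlier factor, evaluates the factors in topological order with Python's standard `graphlib`, and picks an ordering by brute-force search. Exhaustive search is only one simple stand-in for the "several approaches" the abstract mentions.

```python
import itertools
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, Mapping, Sequence, Set, Tuple

# Hypothetical latent factor names, chosen only for illustration.
FACTORS = ["viewpoint", "lighting", "depth", "albedo"]

# Hypothetical encoder type: maps the values of a factor's parents to its own value.
Encoder = Callable[[Mapping[str, float]], float]


def chain_dag(order: Sequence[str]) -> Dict[str, Set[str]]:
    """Map each factor to its parents: every factor earlier in `order`.

    Restricting parents to earlier factors makes the graph acyclic by
    construction, so `order` is a valid topological (causal) ordering.
    """
    return {f: set(order[:i]) for i, f in enumerate(order)}


def generate(dag: Mapping[str, Set[str]], encoders: Mapping[str, Encoder]) -> Dict[str, float]:
    """Evaluate each factor only after all of its parents, following a topological order."""
    values: Dict[str, float] = {}
    for f in TopologicalSorter(dag).static_order():
        parents = {p: values[p] for p in dag[f]}
        values[f] = encoders[f](parents)
    return values


def best_ordering(factors: Sequence[str], score: Callable[[Tuple[str, ...]], float]) -> Tuple[str, ...]:
    """Return the highest-scoring ordering by trying all n! permutations.

    `score` is a placeholder for something like negative validation
    reconstruction error; exhaustive search is only practical for a few factors.
    """
    return max(itertools.permutations(factors), key=score)


if __name__ == "__main__":
    # Toy encoders: each factor is 1 plus the sum of its parents' values.
    toy = {f: (lambda parents: 1.0 + sum(parents.values())) for f in FACTORS}
    print(generate(chain_dag(FACTORS), toy))
    # Toy score that prefers orderings placing "depth" as early as possible.
    print(best_ordering(FACTORS, lambda order: -order.index("depth")))
```

Connecting every earlier factor to every later one is the densest DAG consistent with a given ordering; a sparser DAG would drop some of these edges while keeping the same topological order.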


Author information

Authors and Affiliations

Max Planck Institute for Intelligent Systems, Tübingen, Germany
Weiyang Liu & Bernhard Schölkopf
University of Cambridge, Cambridge, UK
Weiyang Liu & Adrian Weller
Mila, Université de Montréal, Montreal, Canada
Zhen Liu & Liam Paull
Alan Turing Institute, London, UK
Adrian Weller

Corresponding author

Correspondence to Weiyang Liu.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

Electronic supplementary material

Supplementary material 1 (PDF, 3305 KB)

About this paper

Cite this paper

Liu, W., Liu, Z., Paull, L., Weller, A., Schölkopf, B. (2022). Structural Causal 3D Reconstruction. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13661. Springer, Cham. https://doi.org/10.1007/978-3-031-19769-7_9

DOI: https://doi.org/10.1007/978-3-031-19769-7_9
Published: 23 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19768-0
Online ISBN: 978-3-031-19769-7
eBook Packages: Computer Science, Computer Science (R0)