Structural Causal 3D Reconstruction

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13661)

Abstract

This paper considers the problem of unsupervised 3D object reconstruction from in-the-wild single-view images. Due to ambiguity and intrinsic ill-posedness, this problem is inherently difficult to solve and therefore requires strong regularization to achieve disentanglement of different latent factors. Unlike existing works that introduce explicit regularizations into objective functions, we look into a different space for implicit regularization – the structure of latent space. Specifically, we restrict the structure of latent space to capture a topological causal ordering of latent factors (i.e., representing causal dependency as a directed acyclic graph). We first show that different causal orderings matter for 3D reconstruction, and then explore several approaches to find a task-dependent causal factor ordering. Our experiments demonstrate that the latent space structure indeed serves as an implicit regularization and introduces an inductive bias beneficial for reconstruction.
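The notion of a topological causal ordering over a directed acyclic graph can be illustrated with a minimal sketch. The factor names and the dependencies between them below are hypothetical and chosen only for illustration; they are not taken from the abstract, and the snippet is not the authors' method. It uses Python's standard `graphlib` module (Python 3.9+) to turn a parent map into an ordering in which every factor appears after the factors it depends on.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical latent factors (illustrative names only).
# Each key maps to the set of factors it causally depends on (its parents).
parents = {
    "viewpoint": set(),
    "depth": {"viewpoint"},
    "albedo": {"depth"},
    "lighting": {"depth", "albedo"},
}


def causal_ordering(parent_map):
    """Return one topological ordering of the factors.

    Raises graphlib.CycleError if the dependencies are not acyclic,
    i.e. if they do not form a DAG.
    """
    return list(TopologicalSorter(parent_map).static_order())


def respects_ordering(parent_map, order):
    """Check that every factor comes after all of its parents in `order`."""
    position = {name: i for i, name in enumerate(order)}
    return all(
        position[p] < position[child]
        for child, ps in parent_map.items()
        for p in ps
    )


order = causal_ordering(parents)
print(order)
print(respects_ordering(parents, order))
```

Because the example dependencies form a chain, only one ordering is valid here; in general a DAG admits several orderings, which is why the choice of ordering can be treated as something to search over.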



Author information

Corresponding author

Correspondence to Weiyang Liu.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 3305 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, W., Liu, Z., Paull, L., Weller, A., Schölkopf, B. (2022). Structural Causal 3D Reconstruction. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13661. Springer, Cham. https://doi.org/10.1007/978-3-031-19769-7_9

  • DOI: https://doi.org/10.1007/978-3-031-19769-7_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19768-0

  • Online ISBN: 978-3-031-19769-7

  • eBook Packages: Computer Science (R0)
