Weakly-Supervised Reconstruction of 3D Objects with Large Shape Variation from Single In-the-Wild Images

  • Conference paper
Computer Vision – ACCV 2020 (ACCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12622)

Abstract

Existing unsupervised 3D object reconstruction methods do not work well when the shape of objects varies substantially across images or when the images have distracting backgrounds. This paper proposes a novel learning framework for reconstructing 3D objects with large shape variation from single in-the-wild images. Because shape variation changes the appearance of objects at various scales, we propose a fusion module that combines multi-scale image features for 3D reconstruction. To deal with the ambiguity caused by shape variation, we propose a side-output mask constraint to supervise the feature extraction, and an adaptive edge constraint and an initial shape constraint to supervise the shape reconstruction. Moreover, we propose background manipulation to augment the training images so that the resulting model is robust to background distraction. Extensive experiments on both non-rigid objects (birds) and rigid objects (planes and vehicles) demonstrate the superiority of the proposed method.
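
The abstract does not spell out how background manipulation is carried out. As a rough illustration only, one common way to augment training images against background distraction is to composite the masked object onto a different background. The sketch below assumes this compositing approach; the function name `replace_background` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def replace_background(image, mask, background):
    """Composite the masked object onto a new background.

    Hypothetical helper, not the authors' implementation.
    image, background: float arrays of shape (H, W, 3) with values in [0, 1].
    mask: float array of shape (H, W) with values in [0, 1], where 1 marks the object.
    """
    if image.shape != background.shape:
        raise ValueError("image and background must have the same shape")
    if mask.shape != image.shape[:2]:
        raise ValueError("mask must match the image height and width")
    # Add a channel axis so the mask broadcasts over the colour channels.
    alpha = mask[..., None]
    # Keep object pixels from the image and take the rest from the background.
    return alpha * image + (1.0 - alpha) * background


# Usage with synthetic data: a square "object" pasted onto random noise.
rng = np.random.default_rng(0)
image = np.ones((8, 8, 3))
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0
augmented = replace_background(image, mask, rng.random((8, 8, 3)))
```

A soft (non-binary) mask blends the object boundary with the new background, which may reduce hard compositing edges; whether the paper uses soft or binary masks is not stated in the abstract.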

Acknowledgments

This work is supported by the National Natural Science Foundation of China (61773270, 61971005).

Author information

Corresponding author

Correspondence to Qijun Zhao.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1597 KB)

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Sun, S., Zhu, Z., Dai, X., Zhao, Q., Li, J. (2021). Weakly-Supervised Reconstruction of 3D Objects with Large Shape Variation from Single In-the-Wild Images. In: Ishikawa, H., Liu, CL., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science, vol 12622. Springer, Cham. https://doi.org/10.1007/978-3-030-69525-5_1

  • DOI: https://doi.org/10.1007/978-3-030-69525-5_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-69524-8

  • Online ISBN: 978-3-030-69525-5

  • eBook Packages: Computer Science, Computer Science (R0)
