Weakly-Supervised Reconstruction of 3D Objects with Large Shape Variation from Single In-the-Wild Images

Sun, Shichen; Zhu, Zhengbang; Dai, Xiaowei; Zhao, Qijun; Li, Jing

doi:10.1007/978-3-030-69525-5_1


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12622)

Included in the following conference series:

Asian Conference on Computer Vision


Abstract

Existing unsupervised 3D object reconstruction methods do not work well if the shape of objects varies substantially across images or if the images contain distracting backgrounds. This paper proposes a novel learning framework for reconstructing 3D objects with large shape variation from single in-the-wild images. Considering that shape variation changes the appearance of objects at various scales, we propose a fusion module that combines multi-scale image features for 3D reconstruction. To deal with the ambiguity caused by shape variation, we propose a side-output mask constraint to supervise the feature extraction, and an adaptive edge constraint and an initial shape constraint to supervise the shape reconstruction. Moreover, we propose background manipulation to augment the training images so that the resulting model is robust to background distraction. Extensive experiments on both non-rigid objects (birds) and rigid objects (planes and vehicles) demonstrate the superiority of the proposed method.
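
The abstract does not say how background manipulation is implemented. As a minimal sketch only, assuming it amounts to compositing the segmented object over a different background, the augmentation could look like the following. The function name `replace_background`, the array shapes, and the value ranges are all hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np


def replace_background(image, mask, background):
    """Composite the masked object over a new background.

    Assumed (hypothetical) conventions:
      image, background: float arrays of shape (H, W, 3) with values in [0, 1]
      mask: float array of shape (H, W) with values in [0, 1], 1 = object
    """
    if image.shape != background.shape or image.shape[:2] != mask.shape:
        raise ValueError("image, mask, and background must share H and W")
    alpha = mask[..., None]  # add a channel axis so the mask broadcasts over RGB
    return alpha * image + (1.0 - alpha) * background


# Toy example: a 4x4 random image whose central 2x2 block is the object,
# pasted onto an all-black background.
rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0
background = np.zeros((4, 4, 3))
augmented = replace_background(image, mask, background)
```

In this sketch, pixels inside the mask keep their original colours and all others come from the replacement background, so a model trained on such images sees the same object against varied surroundings.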

Acknowledgments

This work is supported by the National Natural Science Foundation of China (61773270, 61971005).

Author information

Authors and Affiliations

College of Computer Science, Sichuan University, Chengdu, China
Shichen Sun, Zhengbang Zhu, Xiaowei Dai, Qijun Zhao & Jing Li
School of Information Science and Technology, Tibet University, Lhasa, China
Qijun Zhao

Authors

Shichen Sun
Zhengbang Zhu
Xiaowei Dai
Qijun Zhao
Jing Li

Corresponding author

Correspondence to Qijun Zhao.

Editor information

Editors and Affiliations

Waseda University, Tokyo, Japan
Hiroshi Ishikawa
Institute of Automation of Chinese Academy of Sciences, Beijing, China
Cheng-Lin Liu
Czech Technical University in Prague, Prague, Czech Republic
Tomas Pajdla
University of Pennsylvania, Philadelphia, PA, USA
Jianbo Shi

Electronic supplementary material

Supplementary material 1 (PDF, 1597 KB)

About this paper

Cite this paper

Sun, S., Zhu, Z., Dai, X., Zhao, Q., Li, J. (2021). Weakly-Supervised Reconstruction of 3D Objects with Large Shape Variation from Single In-the-Wild Images. In: Ishikawa, H., Liu, CL., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science, vol 12622. Springer, Cham. https://doi.org/10.1007/978-3-030-69525-5_1

DOI: https://doi.org/10.1007/978-3-030-69525-5_1
Published: 27 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69524-8
Online ISBN: 978-3-030-69525-5
eBook Packages: Computer Science, Computer Science (R0)