Abstract
Existing unsupervised 3D object reconstruction methods do not work well when object shape varies substantially across images or when the images contain distracting backgrounds. This paper proposes a novel learning framework for reconstructing 3D objects with large shape variation from single in-the-wild images. Because shape variation changes the appearance of objects at various scales, we propose a fusion module that combines multi-scale image features for 3D reconstruction. To deal with the ambiguity caused by shape variation, we propose a side-output mask constraint to supervise feature extraction, and an adaptive edge constraint and an initial shape constraint to supervise shape reconstruction. Moreover, we propose background manipulation to augment the training images so that the resulting model is robust to background distraction. Extensive experiments on both non-rigid objects (birds) and rigid objects (planes and vehicles) demonstrate the superiority of the proposed method.
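The abstract does not spell out how background manipulation is carried out. As a rough illustration only, the sketch below assumes it amounts to compositing the segmented object onto a different background using its foreground mask. The function names `replace_background` and `random_background`, the flat-colour background source, and the array layout are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def replace_background(image, mask, background):
    """Composite the masked object onto a new background.

    image, background: float arrays of shape (H, W, 3) with values in [0, 1].
    mask: array of shape (H, W), nonzero on the object and zero elsewhere.
    """
    if image.shape != background.shape:
        raise ValueError("image and background must have the same shape")
    # Add a trailing axis so the mask broadcasts over the colour channels.
    alpha = mask.astype(image.dtype)[..., None]
    return alpha * image + (1.0 - alpha) * background


def random_background(shape, rng):
    """Hypothetical background source: one random flat colour."""
    colour = rng.random(3)
    return np.broadcast_to(colour, shape).copy()


# Toy example: a 4x4 image whose central 2x2 block is the "object".
rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
augmented = replace_background(image, mask, random_background(image.shape, rng))
```

In this sketch the object pixels are left untouched and only the surroundings change, so a model trained on many such variants would see the same object against different backgrounds. A real pipeline would more plausibly draw backgrounds from other images than from flat colours.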
Acknowledgments
This work is supported by the National Natural Science Foundation of China (61773270, 61971005).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Sun, S., Zhu, Z., Dai, X., Zhao, Q., Li, J. (2021). Weakly-Supervised Reconstruction of 3D Objects with Large Shape Variation from Single In-the-Wild Images. In: Ishikawa, H., Liu, CL., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science, vol 12622. Springer, Cham. https://doi.org/10.1007/978-3-030-69525-5_1
DOI: https://doi.org/10.1007/978-3-030-69525-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69524-8
Online ISBN: 978-3-030-69525-5
eBook Packages: Computer Science (R0)