Disentangling Deep Network for Reconstructing 3D Object Shapes from Single 2D Images

Yang, Yang; Han, Junwei; Zhang, Dingwen; Cheng, De

doi:10.1007/978-3-030-88007-1_13

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 13020)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Recovering 3D shapes of deformable objects from single 2D images is an extremely challenging and ill-posed problem. Most existing approaches are based on structure-from-motion or graph inference, where a 3D shape is solved by fitting 2D keypoints or masks instead of directly using the vital cues in the original 2D image. These methods usually require multiple views of an object instance and rely on accurate labeling, detection, and matching of 2D keypoints or masks across multiple images. To overcome these limitations, we aim to reconstruct 3D deformable object shapes directly from the given unconstrained 2D images. In training, instead of using multiple images per object instance, our approach relaxes the constraint and uses images from the same object category (with one 2D image per object instance). The key is to disentangle the category-specific representation of the 3D shape identity and the instance-specific representation of the 3D shape displacement from the 2D training images. In testing, the 3D shape of an object can be reconstructed from the given image by deforming the 3D shape identity according to the 3D shape displacement. To achieve this goal, we propose a novel convolutional encoder-decoder network, the Disentangling Deep Network (DisDN). To demonstrate the effectiveness of the proposed approach, we conduct comprehensive experiments on the challenging PASCAL VOC benchmark and use different 3D shape ground truth in training and testing to avoid overfitting. The experimental results show that DisDN outperforms other state-of-the-art and baseline methods.
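The test-time step described in the abstract, deforming a category-level shape identity by an instance-level displacement, can be illustrated with a minimal sketch. This is not the authors' implementation: the point-list representation, the purely additive deformation, and the names `deform_shape`, `category_identity`, and `instance_displacement` are all assumptions made for illustration, and the paper's actual shape representation and deformation operator may differ.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def deform_shape(identity: List[Point], displacement: List[Point]) -> List[Point]:
    """Apply a per-point displacement to a category-level shape.

    Assumption: both inputs are point lists of equal length and the
    deformation is a simple per-point addition. The paper's actual
    representation and deformation operator may differ.
    """
    if len(identity) != len(displacement):
        raise ValueError("identity and displacement must have the same number of points")
    return [
        (px + dx, py + dy, pz + dz)
        for (px, py, pz), (dx, dy, dz) in zip(identity, displacement)
    ]


# Hypothetical category-level shape, shared by all instances of a category.
category_identity = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
# Hypothetical instance-specific displacement, as if predicted from one image.
instance_displacement = [(0.0, 0.0, 0.5), (0.25, 0.0, 0.0), (0.0, -0.5, 0.0)]

reconstructed = deform_shape(category_identity, instance_displacement)
```

In this toy setting the identity is fixed per category while the displacement changes per image, which mirrors the category-specific versus instance-specific split the abstract describes.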

Acknowledgements

This work was supported by the Key-Area Research and Development Program of Guangdong Province (2019B010110001), the National Science Foundation of China (Grant Nos. 61876140, 61806167, 61936007 and U1801265), and the research funds for interdisciplinary subject, NWPU.

Author information

Authors and Affiliations

School of Automation, Northwestern Polytechnical University, Xi’an, 710072, Shaanxi, People’s Republic of China
Yang Yang, Junwei Han & Dingwen Zhang
The School of Telecommunication Engineering, Xidian University, Xi’an, 710071, Shaanxi, People’s Republic of China
De Cheng

Editor information

Editors and Affiliations

University of Science and Technology Beijing, Beijing, China
Huimin Ma
Chinese Academy of Sciences, Beijing, China
Liang Wang
Tsinghua University, Beijing, China
Changshui Zhang
Zhejiang University, Hangzhou, China
Fei Wu
Chinese Academy of Sciences, Beijing, China
Tieniu Tan
Hunan University, Changsha, China
Yaonan Wang
Sun Yat-Sen University, Guangzhou, Guangdong, China
Jianhuang Lai
Beijing Jiaotong University, Beijing, China
Yao Zhao

About this paper

Cite this paper

Yang, Y., Han, J., Zhang, D., Cheng, D. (2021). Disentangling Deep Network for Reconstructing 3D Object Shapes from Single 2D Images. In: Ma, H., et al. Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science, vol 13020. Springer, Cham. https://doi.org/10.1007/978-3-030-88007-1_13

DOI: https://doi.org/10.1007/978-3-030-88007-1_13
Published: 22 October 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88006-4
Online ISBN: 978-3-030-88007-1
eBook Packages: Computer Science, Computer Science (R0)
