
Disentangling Deep Network for Reconstructing 3D Object Shapes from Single 2D Images

  • Conference paper
  • First Online:
Pattern Recognition and Computer Vision (PRCV 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 13020)

Abstract

Recovering the 3D shapes of deformable objects from single 2D images is an extremely challenging and ill-posed problem. Most existing approaches are based on structure-from-motion or graph inference. In these approaches, a 3D shape is solved by fitting 2D keypoints or masks, rather than by directly using the vital cues in the original 2D image. Such methods usually require multiple views of an object instance and rely on accurate labeling, detection, and matching of 2D keypoints or masks across multiple images. To overcome these limitations, we aim to reconstruct 3D deformable object shapes directly from unconstrained 2D images. In training, our approach relaxes the multiple-images-per-instance constraint: it uses images from the same object category, with one 2D image per object instance. The key is to disentangle two representations from the 2D training images: a category-specific representation of the 3D shape identity and an instance-specific representation of the 3D shape displacement. In testing, the 3D shape of an object is reconstructed from the given image by deforming the 3D shape identity according to the 3D shape displacement. To achieve this goal, we propose a novel convolutional encoder-decoder network, the Disentangling Deep Network (DisDN). To demonstrate the effectiveness of the proposed approach, we conduct comprehensive experiments on the challenging PASCAL VOC benchmark. We use different 3D shape ground truth in training and testing to avoid overfitting. The experimental results show that DisDN outperforms other state-of-the-art and baseline methods.
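The abstract describes reconstruction as deforming a category-level "shape identity" by an instance-level "shape displacement". The following is a minimal, hypothetical sketch of that composition only, under several assumptions that are not taken from the paper:

- shapes are lists of corresponding 3D vertices;
- deformation is a per-vertex additive offset;
- the identity is simply the per-vertex mean of known 3D shapes.

DisDN instead learns both representations with a convolutional encoder-decoder from 2D images, so the helper names `category_identity`, `instance_displacement`, and `deform` are illustrative only.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]
Shape = List[Vertex]  # a 3D shape as an ordered list of corresponding vertices


def category_identity(shapes: Sequence[Shape]) -> Shape:
    """Hypothetical stand-in for the category-specific shape identity:
    the per-vertex mean of all instance shapes in one category.
    Assumes every shape has the same vertices in the same order."""
    n = len(shapes)
    return [
        tuple(sum(s[v][k] for s in shapes) / n for k in range(3))
        for v in range(len(shapes[0]))
    ]


def instance_displacement(shape: Shape, identity: Shape) -> Shape:
    """Hypothetical instance-specific displacement: per-vertex offset
    of one instance from the category identity."""
    return [tuple(p[k] - q[k] for k in range(3)) for p, q in zip(shape, identity)]


def deform(identity: Shape, displacement: Shape) -> Shape:
    """Reconstruct an instance by deforming the identity with its
    displacement (assumed additive here)."""
    return [tuple(q[k] + d[k] for k in range(3)) for q, d in zip(identity, displacement)]


# Toy category with two 2-vertex "shapes" (made-up data, not from the paper).
chairs = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 2.0), (1.0, 2.0, 0.0)],
]
identity = category_identity(chairs)               # [(0.0, 0.0, 1.0), (1.0, 1.0, 0.0)]
disp = instance_displacement(chairs[1], identity)  # [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
print(deform(identity, disp) == chairs[1])         # True
```

The additive split makes the idea concrete. The identity carries what all instances of the category share, and the displacement carries only what distinguishes one instance. At test time, DisDN predicts the displacement from a single image rather than computing it from a known 3D shape.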

Acknowledgements

This work was supported by the Key-Area Research and Development Program of Guangdong Province (2019B010110001), the National Science Foundation of China (Grant Nos. 61876140, 61806167, 61936007, and U1801265), and the Research Funds for Interdisciplinary Subject, NWPU.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Yang, Y., Han, J., Zhang, D., Cheng, D. (2021). Disentangling Deep Network for Reconstructing 3D Object Shapes from Single 2D Images. In: Ma, H., et al. Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science, vol 13020. Springer, Cham. https://doi.org/10.1007/978-3-030-88007-1_13

  • DOI: https://doi.org/10.1007/978-3-030-88007-1_13

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88006-4

  • Online ISBN: 978-3-030-88007-1

  • eBook Packages: Computer Science, Computer Science (R0)
