Abstract
Recovering the 3D shapes of deformable objects from single 2D images is an extremely challenging and ill-posed problem. Most existing approaches are based on structure-from-motion or graph inference, in which a 3D shape is recovered by fitting 2D keypoints or masks rather than by directly using the cues in the original 2D image. These methods usually require multiple views of an object instance and rely on accurate labeling, detection, and matching of 2D keypoints or masks across multiple images. To overcome these limitations, we aim to reconstruct 3D deformable object shapes directly from the given unconstrained 2D images. In training, instead of using multiple images per object instance, our approach relaxes the constraint and uses images from the same object category (with one 2D image per object instance). The key is to disentangle the category-specific representation of the 3D shape identity and the instance-specific representation of the 3D shape displacement from the 2D training images. In testing, the 3D shape of an object can be reconstructed from the given image by deforming the 3D shape identity according to the 3D shape displacement. To achieve this goal, we propose a novel convolutional encoder-decoder network, the Disentangling Deep Network (DisDN). To demonstrate the effectiveness of the proposed approach, we conduct comprehensive experiments on the challenging PASCAL VOC benchmark and use different 3D shape ground truth in training and testing to avoid overfitting. The experimental results show that DisDN outperforms other state-of-the-art and baseline methods.
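As a rough illustration of the test-time step described in the abstract (deforming a category-level shape identity by an instance-level displacement), the toy sketch below may help. It is not the DisDN implementation: the point-list shape representation, the additive deformation rule, and the names `deform`, `shape_identity`, and `shape_displacement` are all assumptions made for this example, and the numeric values are made up.

```python
# Hypothetical sketch: the representation (a list of 3D points) and the
# additive deformation rule are assumptions, not taken from the paper.
from typing import List, Tuple

Point = Tuple[float, float, float]


def deform(identity: List[Point], displacement: List[Point]) -> List[Point]:
    """Apply a per-point offset to a category-level template shape."""
    if len(identity) != len(displacement):
        raise ValueError("identity and displacement must have the same length")
    return [
        (x + dx, y + dy, z + dz)
        for (x, y, z), (dx, dy, dz) in zip(identity, displacement)
    ]


# Category-specific shape identity: one template shared by all instances
# of the category (toy values).
shape_identity = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

# Instance-specific displacement: in DisDN this representation is inferred
# from the input image; here it is hard-coded for illustration.
shape_displacement = [(0.0, 0.0, 0.5), (0.25, 0.0, 0.0), (0.0, -0.5, 0.0)]

reconstructed = deform(shape_identity, shape_displacement)
```

Under these assumptions, every instance of a category shares `shape_identity`, and only `shape_displacement` changes from image to image.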
Acknowledgements
This work was supported by the Key-Area Research and Development Program of Guangdong Province (2019B010110001), the National Science Foundation of China (Grant Nos. 61876140, 61806167, 61936007 and U1801265), and the research funds for interdisciplinary subject, NWPU.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Yang, Y., Han, J., Zhang, D., Cheng, D. (2021). Disentangling Deep Network for Reconstructing 3D Object Shapes from Single 2D Images. In: Ma, H., et al. Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science, vol 13020. Springer, Cham. https://doi.org/10.1007/978-3-030-88007-1_13
DOI: https://doi.org/10.1007/978-3-030-88007-1_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88006-4
Online ISBN: 978-3-030-88007-1
eBook Packages: Computer Science, Computer Science (R0)