
DEGAN: Detail-Enhanced Generative Adversarial Network for Monocular Depth-Based 3D Reconstruction

Published: 18 November 2024

Abstract

Although deep network-based 3D reconstruction methods can recover 3D geometry from few inputs, they may produce unfaithful reconstructions when predicting the occluded parts of 3D objects. To address this issue, we propose the Detail-Enhanced Generative Adversarial Network (DEGAN), which consists of an Encoder–Decoder-Based Generator (EDGen) and a Voxel-Point Embedding Network-Based Discriminator (VPDis), for 3D reconstruction from a monocular depth image of an object. First, EDGen decodes features from the 2.5D voxel grid representation of an input depth image and generates a 3D occupancy grid under GAN losses and a sampling point loss; the sampling point loss improves the accuracy of predicted points with high uncertainty. VPDis then helps reconstruct details under voxel and point adversarial losses, respectively. Experimental results show that DEGAN not only outperforms several state-of-the-art methods on the public ModelNet and ShapeNet datasets but also predicts the occluded/missing parts of 3D objects more reliably.
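The generator objective described above — two adversarial terms (from the voxel- and point-based branches of the discriminator) plus a sampling point loss on high-uncertainty predictions — can be sketched roughly as follows. This is a hypothetical illustration, not the authors' implementation: the weights `w_voxel`, `w_point`, `w_sample`, the non-saturating GAN formulation, and the helper `bce` are all assumptions introduced for clarity.

```python
import numpy as np


def bce(pred, target, eps=1e-7):
    # Binary cross-entropy over occupancy probabilities in [0, 1].
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))


def generator_loss(recon, gt, d_voxel_score, d_point_score, sample_idx,
                   w_voxel=1.0, w_point=1.0, w_sample=0.5):
    """Toy composition of the generator objective outlined in the abstract:
    voxel and point adversarial terms plus a sampling point loss restricted
    to sampled high-uncertainty locations. Weights are illustrative
    assumptions, not values from the paper."""
    # Non-saturating GAN losses: the generator is rewarded when the
    # discriminator scores its output close to 1 ("real").
    adv_voxel = -np.log(np.clip(d_voxel_score, 1e-7, 1.0))
    adv_point = -np.log(np.clip(d_point_score, 1e-7, 1.0))
    # Sampling point loss: reconstruction error only at the sampled
    # indices into the flattened occupancy grid.
    sample_loss = bce(recon.ravel()[sample_idx], gt.ravel()[sample_idx])
    return float(w_voxel * adv_voxel + w_point * adv_point + w_sample * sample_loss)
```

Under this sketch, a reconstruction that fools both discriminator branches (scores near 1) and matches the ground truth at the sampled voxels yields a loss near zero, while low discriminator scores or errors at high-uncertainty voxels drive the loss up.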


    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 20, Issue 12
    December 2024
    721 pages
    EISSN:1551-6865
    DOI:10.1145/3618076

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 November 2024
    Online AM: 30 August 2024
    Accepted: 15 August 2024
    Revised: 16 April 2024
    Received: 18 December 2023
    Published in TOMM Volume 20, Issue 12


    Author Tags

    1. 3D shape reconstruction
    2. Generative adversarial network
    3. Voxel-point embedding network
    4. Sampling point loss
    5. Monocular depth image

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
    • Beijing Natural Science Foundation
    • R&D Program of Beijing Municipal Education Commission
